
Introduction to Financial Econometrics

Sebastiano Manzan


©2015 Sebastiano Manzan


Contents

1. Getting Started with R
   1.1 Working with data in R
   1.2 Plotting the data
   1.3 From prices to returns
   1.4 Distribution of the data
   1.5 Creating functions in R
   1.6 Loops in R

2. Linear Regression Model
   2.1 LRM with one independent variable
   2.2 Functional forms
   2.3 The role of outliers
   2.4 LRM with multiple independent variables
   2.5 Omitted variable bias

3. Time Series Models
   3.1 The Auto-Correlation Function (ACF)
   3.2 The Auto-Regressive (AR) Model
   3.3 Model selection for AR models
   3.4 Forecasting with AR models
   3.5 Seasonality
   3.6 Trends in time series
   3.7 Deterministic and Stochastic Trend
   3.8 Why non-stationarity is a problem for OLS
   3.9 Testing for non-stationarity
   3.10 What to do when a time series is non-stationary

4. Volatility Models
   4.1 Moving Average (MA) and Exponential Moving Average (EMA)
   4.2 Auto-Regressive Conditional Heteroskedasticity (ARCH) models

5. Measuring Financial Risk
   5.1 Value-at-Risk (VaR)
   5.2 Historical Simulation (HS)
   5.3 Simulation Methods
   5.4 VaR for portfolios
   5.5 Backtesting VaR


1. Getting Started with R

The aim of the first chapter is to get you started with the basic steps of using R to analyze data, such as obtaining economic or financial data, loading the data in R, and conducting a preliminary analysis by estimating the mean and the standard deviation and plotting a histogram. We consider several ways of loading data in R, either from a text file or by downloading the data directly from the internet. R provides an extensive set of functions to perform basic as well as sophisticated analysis. One of the most important functions that you will use in R is help(): by typing the name of a function within the brackets you obtain detailed information about the function's arguments and its expected output. With hundreds of functions used in a typical R session, each having several arguments, even the most experienced R programmers ask for help().

1.1 Working with data in R

The first task is to load some data to analyze in R, and there are two ways to do this. One is to load a file that is stored on your computer as a text file (e.g., comma-separated values, or csv) using the read.csv() command. The file sp500_1970.csv contains the monthly time series of the S&P 500 Index from January 1970 until December 2013 and can be loaded in R with the command sp500 = read.csv('sp500_1970.csv'). The sp500 object is an R data frame which does not have any time series connotation. However, since the data file represents a time series, it is convenient to define sp500 as a time series object, which makes some tasks easier (e.g., plotting). I will use the zoo package to define sp500 as a time series, and we need to provide the function with the starting and/or ending date in addition to the frequency of the data. We first need to load the package into the R environment, which can be done with the command require(zoo). We can now redefine the object from data frame to zoo as follows: sp500 = zooreg(sp500, start=c(1970,1), frequency=12), and sp500 is now a zoo object that starts in January 1970 with frequency 12, since there are 12 monthly observations in a year.
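Putting these steps together, a minimal sketch (assuming the file sp500_1970.csv is in the current working directory):

require(zoo)                                           # load the zoo package
sp500 <- read.csv('sp500_1970.csv')                    # read the csv file into a data frame
sp500 <- zooreg(sp500, start=c(1970,1), frequency=12)  # turn it into a monthly zoo time series
head(sp500)                                            # inspect the first observations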

If we do not have the data stored on our computer, we can simply download the data from a data provider, such as Yahoo Finance for financial data or the Federal Reserve Economic Database (FRED) for US and international macroeconomic data. There are several packages that provide functions to do this:

• package tseries has function get.hist.quote() to download data from Yahoo
• package quantmod has function getSymbols() to download data from Yahoo and some other providers
• package fImport has functions yahooSeries() and fredSeries() to download from Yahoo and FRED

First, you need to load the package with the require() command (in case the package is not installed on your machine, you need to install it first with the command install.packages()). For example, to download data for the S&P 500 Index (ticker: ^GSPC) at the monthly frequency using the get.hist.quote() function you would type the following command:


require(tseries)

sp500 <- get.hist.quote("^GSPC", start="1970-01-01",end="2015-08-31", quote="AdjClose",

provider="yahoo", compression="m", quiet=TRUE)

where sp500 is defined as a zoo object and the arguments of the function are (see help(get.hist.quote) for more details):

1. the ticker of the stock or index that you want to download
2. start and end dates in the format "yyyy-mm-dd"
3. quote can be "Open", "High", "Low", "Close", or "AdjClose"; if left unspecified the function downloads all of them
4. compression can be "d" for daily, "w" for weekly, or "m" for monthly

An alternative to using get.hist.quote() is the getSymbols() function from the quantmod package, which has the advantage of allowing you to download data for several symbols simultaneously. For example, if we are interested in obtaining data for Apple Inc. (ticker: AAPL) and the S&P 500 Index from January 1990 to August 2015 we could run the following lines of code:

require(quantmod)

getSymbols(c("AAPL","^GSPC"), src="yahoo", from='1990-01-02',to='2015-08-31')

[1] "AAPL" "GSPC"

Notice that the getSymbols() function does not require you to specify the frequency of the data, but it uses the daily frequency by default. To subsample to lower frequencies, the package provides the functions to.weekly() and to.monthly() that convert the series (from daily) to weekly or monthly. Notice also that the getSymbols() function creates as many objects as tickers, and each contains the open, high, low, close, adjusted close price, and the volume as zoo objects. The quantmod package provides functions to extract information from these objects, such as Ad(AAPL), which returns the adjusted closing price of the AAPL object (similarly, the functions Op(x), Hi(x), Lo(x), Cl(x), Vo(x) extract the open, high, low, close price and volume). The package also has functions such as LoHi(x), which creates a series with the difference between the highest and lowest intra-day price. Another useful function is ClCl(x), which calculates the daily percentage change of the closing price in day t relative to the closing price in day t-1.
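For instance, a short sketch (assuming the AAPL object created by the getSymbols() call above; the object names on the left are just illustrative) that extracts the adjusted closing price, converts it to the monthly frequency, and computes the close-to-close change:

require(quantmod)
aapl_adj   <- Ad(AAPL)                          # adjusted closing price of AAPL (daily)
aapl_adj_m <- to.monthly(aapl_adj, OHLC=FALSE)  # keep the last adjusted close of each month
aapl_clcl  <- ClCl(AAPL)                        # close-to-close change of the closing price
head(aapl_adj_m)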

The third possibility is to use the fImport package, whose yahooSeries() function can download several tickers for a specified period of time and frequency. An example is provided below:

require(fImport)

data <- yahooSeries(c("AAPL", "^GSPC"), from="2003-01-01",

to="2015-08-31", frequency="monthly")

In addition, the fImport package also provides the function fredSeries(), which allows you to download data from FRED, a database that includes thousands of macroeconomic and financial series for the U.S. and other countries. Similarly to the ticker of a stock, you need to know the FRED symbol for the variable that you are interested in downloading by visiting the FRED webpage. For example, UNRATE is the civilian unemployment rate for the US, CPIAUCSL is the Consumer Price Index (CPI) for all urban consumers, and GDPC1 is the real Gross Domestic Product (GDP) in billions of chained 2009 dollars. We can download the three macroeconomic variables together as follows:


require(fImport)

options(download.file.method="libcurl")

macrodata = fredSeries(c('UNRATE','CPIAUCSL','GDPC1'), from="1950-01-02", to="2015-08-31")

1.2 Plotting the data

Once we have loaded a dataset in R, we might want to find the length of the time series or the number of variables, or simply look at the data. If the object contains one time series and we want to find its length, we use the command length(sp500), which returns 548, the number of monthly observations in the object. In the case of the macrodata object created above, it has length equal to 2358 since it is composed of a data frame with 3 columns and 786 rows. This can be easily verified by inquiring about the dimension of the object with the following command:

dim(macrodata)

[1] 786 3

Next, we might want to look at the data using the head() and tail() commands, which provide the first and last six observations:

head(sp500)

AdjClose

1970-01-02 85.0

1970-02-02 89.5

1970-03-02 89.6

1970-04-01 81.5

1970-05-01 76.6

1970-06-01 72.7

tail(macrodata)


GMT

UNRATE CPIAUCSL GDPC1

2015-02-01 5.5 235 NA

2015-03-01 5.5 236 NA

2015-04-01 5.4 236 16324

2015-05-01 5.5 237 NA

2015-06-01 5.3 238 NA

2015-07-01 5.3 238 NA

Since these objects are defined as time series, the function also prints the date of each observation. Notice that in the macro dataset we have many NAs, which represent missing values for the GDP variable. This is due to the fact that the unemployment rate and the CPI Index are available monthly, whilst real GDP is only available at the quarterly frequency. The object can be converted to the quarterly frequency using the to.quarterly() function in the quantmod package as follows: to.quarterly(as.zooreg(macrodata), OHLC=FALSE).
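A sketch of that conversion (assuming the macrodata object created with fredSeries() above; since FRED stamps the quarterly GDP value on the first month of each quarter, carrying observations forward with na.locf() before converting avoids ending up with NAs for GDP):

require(quantmod)                                                    # provides to.quarterly()
macro_q <- to.quarterly(na.locf(as.zooreg(macrodata)), OHLC=FALSE)   # keep the last (filled) value of each quarter
tail(macro_q)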

Plotting the data is a useful way to learn about the behavior of a series over time, and this can be done very easily in R using the function plot(). For example, a graph of the S&P 500 Index from 1970 to 2013 can be generated with the command plot(sp500). Since we defined sp500 as a time series object (more specifically, as a zoo object), the x-axis represents time without any further intervention by the user.

For many economic and financial variables that display exponential growth over time, it is often convenient to plot the log of the variable rather than its level. This has the additional advantage that differences between the values at two points in time represent an approximate percentage change of the variable in that period of time. This can be achieved by plotting the natural logarithm of the variable with the command plot(log(sp500), xlab="Time", ylab="S&P 500 Index").


Notice that we added two more arguments to the plot() function to customize the labels on the axes, instead of using the default values (compare the two graphs).

1.3 From prices to returns

Financial analysis is typically interested in analyzing the return rather than the price of an asset. Let's define the price of the S&P 500 in month t as $P_t$. There are two types of returns that we are interested in calculating:

1. Simple return: $R_t = (P_t - P_{t-1})/P_{t-1}$

2. Logarithmic return: $r_t = \log(P_t) - \log(P_{t-1})$ (or continuously compounded return)

Creating returns from a price series in R is simple, thanks to the fact that we defined the data as time series objects (e.g., zoo), which have functions such as lag() that provide lead/lag values of a time series (i.e., $P_{t-1}$). We can thus create the simple and logarithmic returns of the S&P 500 Index with the following commands:

simpleR = 100 * (sp500-lag(sp500, -1))/lag(sp500,-1)

logR = 100 * (log(sp500) - lag(log(sp500), -1))

where we multiply by 100 to obtain percentage returns. An alternative way to calculate these returns is by using the diff() function, which takes the first difference of the time series, that is, $\Delta P_t = P_t - P_{t-1}$. Using the diff() function we would calculate returns as follows:

simpleR = 100 * diff(sp500) / lag(sp500, -1)

logR = 100 * diff(log(sp500))

A first step in data analysis is to calculate descriptive statistics that summarize the main features of the distribution of the data, such as the average/median returns and their dispersion. One way to do this is by using the summary() function, which provides the output shown below:


summary(simpleR)

Index AdjClose

Min. :1970-02-02 Min. :-21.76

1st Qu.:1981-06-16 1st Qu.: -1.85

Median :1992-11-02 Median : 0.94

Mean :1992-10-31 Mean : 0.68

3rd Qu.:2004-03-16 3rd Qu.: 3.60

Max. :2015-08-03 Max. : 16.30

summary(logR)

Index AdjClose

Min. :1970-02-02 Min. :-24.54

1st Qu.:1981-06-16 1st Qu.: -1.87

Median :1992-11-02 Median : 0.93

Mean :1992-10-31 Mean : 0.58

3rd Qu.:2004-03-16 3rd Qu.: 3.53

Max. :2015-08-03 Max. : 15.10

You may notice that the mean, median, and 1st and 3rd quartiles (25% and 75%) are quite close across the two return measures, but the minimum (and the maximum) are quite different: for the simple return the maximum drop is -21.76%, while for the logarithmic return it is -24.54%. The reason for this is that the logarithmic return is an approximation to the simple return that works well when returns are small but becomes increasingly unreliable for large (positive or negative) returns.
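A quick numerical check of this approximation, using the relation $r = 100\log(1 + R/100)$ between percentage simple and logarithmic returns:

R <- c(0.5, 1, 5, -10, -21.76)        # simple returns in percent
r <- 100 * log(1 + R/100)             # corresponding logarithmic returns
round(cbind(simple=R, log=r), 2)      # the two are close for small returns; the gap widens as |R| grows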

An advantage of using logarithmic returns is that it simplifies the calculation of multiperiod returns. This is due to the fact that the (continuously compounded) return over k periods is given by $r_t^k = \log(P_t) - \log(P_{t-k})$, which can be expressed as the sum of the one-period logarithmic returns, that is

$$r_t^k = \log(P_t) - \log(P_{t-k}) = \sum_{j=1}^{k} r_{t-j+1}$$

Instead, for simple returns the multi-period return would be calculated as $R_t^k = \prod_{j=1}^{k}(1 + R_{t-j+1}) - 1$. One reason to prefer logarithmic to simple returns is that it is easier to derive the properties of a sum of random variables than of their product. The disadvantage of using the continuously compounded return is that when calculating the return of a portfolio the weighted average of the log returns of the individual assets is only an approximation of the log portfolio return. However, at the daily and monthly horizons returns are very small and thus the approximation error is relatively minor.
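A short sketch verifying the two multi-period formulas on the monthly series created above (a 12-month horizon, using the logR and simpleR objects and the rollapply() function from zoo):

k <- 12
r12  <- 100 * (log(sp500) - lag(log(sp500), -k))                        # 12-month log return
r12b <- rollapply(logR, k, sum, align="right")                          # same thing as the rolling sum of monthly log returns
R12  <- 100 * (rollapply(1 + simpleR/100, k, prod, align="right") - 1)  # 12-month simple return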

Descriptive statistics can also be obtained with individual commands that calculate the mean(), sd() (standard deviation), median(), and empirical quantiles (quantile(x, tau), with tau a value between 0 and 1). The package fBasics provides additional functions such as skewness() and kurtosis(), which are particularly relevant in the analysis of financial data. This package also has a function basicStats() that provides a table of descriptive statistics as follows (see help(basicStats) for details):


require(fBasics)

basicStats(logR)

AdjClose

nobs 547.000

NAs 0.000

Minimum -24.543

Maximum 15.104

1. Quartile -1.872

3. Quartile 3.534

Mean 0.576

Median 0.932

Sum 315.183

SE Mean 0.190

LCL Mean 0.204

UCL Mean 0.948

Variance 19.644

Stdev 4.432

Skewness -0.714

Kurtosis 2.628
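Individual quantiles can also be obtained directly; for example, a sketch of a few empirical tail quantiles of the logarithmic returns:

quantile(coredata(logR), probs=c(0.01, 0.05, 0.95, 0.99), na.rm=TRUE)   # empirical quantiles (in %)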

In addition, in the presence of several assets we might be interested in calculating the covariance and correlation among these assets. Let's define Ret to have two columns: the return of Apple in the first column and the return of the S&P 500 Index in the second column. We can use the functions cov() and cor() to estimate the covariance and correlation as follows:

cov(Ret, use='complete.obs')

aapl sp500

aapl 193.8 24.1

sp500 24.1 18.5

where the elements on the diagonal are the variances of the Apple and S&P 500 returns and the off-diagonal element is the covariance between the two series (the two off-diagonal elements are the same because the covariance between X and Y is the same as the covariance between Y and X). The correlation matrix is calculated as:

cor(Ret, use='complete.obs')

aapl sp500

aapl 1.000 0.402

sp500 0.402 1.000

where the diagonal elements are equal to 1 because they represent the correlation of each variable with itself, and the off-diagonal element is the correlation between the two returns.

We can now plot the monthly simple and logarithmic return as follows:


plot(simpleR)

plot(logR, col=2, xlab="Time", ylab="S&P 500 Return")

abline(h=0, col=4, lty=2, lwd=2)

where the abline() command produces a horizontal line at 0 with a certain color (col=4 is blue), type of line (lty), and width (lwd).

1.4 Distribution of the data

Another useful tool to analyze data is to look at frequency plots, which can be interpreted as estimators of the probability density function. Histograms are a popular way to visualize the sample frequency with which a variable takes values in a certain bin/interval. The function hist() serves this purpose and can be used as follows:

hist(logR, breaks=50, main="S&P 500")

This is a basic histogram in which we set the number of bins to 50 and the title of the graph to "S&P 500". We can also add a nonparametric estimate of the density that smooths out the roughness of the histogram and makes the density estimate continuous (N.B.: the prob=TRUE option puts probabilities instead of frequencies on the y-axis):

hist(logR, breaks=50,main="S&P 500",xlab="Return",ylab="",prob=TRUE)

lines(density(logR,na.rm=TRUE),col=2,lwd=2)

box()


1.5 Creating functions in R

So far we have discussed functions that are already available in R, but the advantage of using a programming language is that you are able to write functions that are specifically tailored to the analysis you are planning to conduct. We will illustrate this with a simple example. Earlier, we calculated the average monthly return of the S&P 500 Index using the command mean(logR), which is equal to 0.576. We can write a function that calculates the mean and compare the result with that of the function mean(). Since the sample mean is obtained as $\bar{R} = \sum_{t=1}^{T} R_t / T$, we can write a function that takes a time series as input and gives the sample mean as output. We can call this function mymean, and the syntax for defining a function is as follows:

mymean <- function(Y)

{

Ybar <- sum(Y) / length(Y)

return(Ybar)

}

mymean(logR)

## [1] 0.576

Not surprisingly, the result is the same as the one obtained using the mean() function. More generally, a function can take several arguments, but it has to return only one outcome, which could be a list of items. The function we defined above is quite simple and has several limitations: 1) it does not take into account that the series might have NAs, and 2) it does not calculate the mean of each column in case there are several. As an exercise, modify the mymean function to accommodate these issues; one possible approach is sketched below.
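One possible sketch of such an extension, which drops NAs and returns one mean per column when the input has several columns:

mymean2 <- function(Y)
{
  Y <- as.matrix(Y)                             # works for a single series or a multi-column object
  colSums(Y, na.rm=TRUE) / colSums(!is.na(Y))   # sum of non-missing values divided by their number
}

mymean2(logR)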

1.6 Loops in R

A loop consists of a set of commands that we want to repeat a pre-specified number of times, storing the results for further analysis. There are several types of loops, with the for loop probably the most popular. The syntax in R to implement a for loop is as follows:

for (i in 1:N)

{

## write your commands here

}

where i is an index and N is the number of times the loop is repeated. As an example, we can write a function that contains a loop to calculate the sum of a variable and compare the result to the sum() function provided in R. This function could be written as follows:


mysum <- function(Y)

{

N = length(Y) # define N as the number of elements of Y

sumY = 0 # initialize the variable that will store the sum of Y

for (i in 1:N)

{

sumY = sumY + as.numeric(Y[i]) # add the i-th value of Y to the running sum;
                               # as.numeric() converts Y[i] from other classes (e.g., zoo) to a plain number
}

return(sumY)

}

c(sum(logR), mysum(logR))

## [1] 315 315

Notice that to define the mysum() function we only use the basic + operator and the for loop. This is just a simple illustration of how the for loop can be used to build functions that perform a certain operation on the data. Let's consider another example of the use of the for loop that demonstrates the validity of the Central Limit Theorem (CLT). We are going to do this by simulation, which means that we simulate data, calculate some statistic of interest, and repeat these operations a large number of times. In particular, we want to demonstrate that, no matter how the data are distributed, the sample mean is (approximately) normally distributed with mean equal to the population mean and variance given by $\sigma^2/N$, where $\sigma^2$ is the population variance of the data and N is the sample size. We assume that the population distribution is N(0, 4) and we want to repeat the following operations a large number of times:

1. Generate a sample of length N

2. Calculate the sample mean
3. Repeat steps 1-2 S times

Every statistical package provides functions to simulate data from a certain distribution. The function rnorm(N, mu, sigma) simulates N observations from the normal distribution with mean mu and standard deviation sigma, whilst rt(N, df, ncp) generates a sample of length N from the t distribution with df degrees of freedom and non-centrality parameter ncp. The code to perform this simulation is as follows:

S = 1000 # set the number of simulations

N = 1000 # set the length of the sample

mu = 0 # population mean

sigma = 2 # population standard deviation

Ybar = vector('numeric', S) # create an empty vector of S elements

# to store the sample mean of each simulation

for (i in 1:S)

{

Y = rnorm(N, mu, sigma) # Generate a sample of length N

Ybar[i] = mean(Y) # store the sample mean

}

c(mean(Ybar), sd(Ybar))


[1] -0.000337 0.062712

The object Ybar contains 1000 elements, each representing the sample mean of a random sample of length 1000 drawn from a certain distribution. We expect these values to be distributed as a normal distribution with mean equal to 0 (the population mean) and standard deviation 2/31.623 = 0.063. We can assess this by plotting the histogram of Ybar and overlaying it with the asymptotic distribution of the sample mean: the two distributions are very close to each other. This is confirmed by the fact that the mean of Ybar and its standard deviation are both very close to their expected values. To evaluate the normality of the distribution, we can estimate the skewness and (excess) kurtosis of Ybar, which we expect to be close to zero if the distribution is normal. These values are -0.042 and -0.199, which can be considered close enough to zero to conclude that Ybar is (approximately) normally distributed.

What would happen if we generated samples from a t instead of a normal distribution? For a small number of degrees of freedom the t distribution has fatter tails than the normal, but the CLT is still valid and we should expect results similar to the previous ones. We can run the same code as above, but replace the line Y = rnorm(N, mu, sigma) with Y = rt(N, df), setting df=4. The plot of the histogram against the normal distribution (with population variance $\sigma^2 = df/(df-2)$) shows that the empirical distribution of Ybar again closely tracks the asymptotic distribution of the sample mean.
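A sketch of the modified loop (reusing S, N, and Ybar from the code above; only the data-generating line changes, and the theoretical standard deviation of the sample mean becomes sqrt(df/(df-2))/sqrt(N), about 0.045 here):

df <- 4                                            # degrees of freedom of the t distribution
for (i in 1:S)
{
  Y <- rt(N, df)                                   # generate a sample from the t distribution
  Ybar[i] <- mean(Y)                               # store the sample mean
}
c(mean(Ybar), sd(Ybar), sqrt(df/(df-2))/sqrt(N))   # compare with the theoretical values 0 and 0.045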


2. Linear Regression Model

In this chapter we review the Linear Regression Model (LRM) in the context of some relevant financial applications and discuss the R functions that are used to estimate the model. A more detailed discussion of the LRM and the methods to conduct inference can be found in the introductory textbooks of @sw and @wool.

2.1 LRM with one independent variable

The LRM assumes that $Y_t = \beta_0 + \beta_1 X_t + \epsilon_t$, where $Y_t$ is called the dependent variable, $X_t$ is the independent variable, $\beta_0$ and $\beta_1$ are parameters, and $\epsilon_t$ is the error term. Typically, we use subscript t when we have time series data and subscript i for cross-sectional data (e.g., different individuals, firms, countries). The aim of the LRM is to explain the variation over time/units of the dependent variable Y based on the variation of the independent variable X: high values of Y are explained by high or low values of X, depending on the parameter $\beta_1$. We estimate the LRM by Ordinary Least Squares (OLS), which is a method that chooses the parameter estimates that minimize the sum of squared residuals. For the LRM we have analytical formulas for the OLS coefficient estimates. The estimate of the slope coefficient is given by

$$\hat{\beta}_1 = \frac{\sigma_{X_t,Y_t}}{\sigma^2_{X_t}} = \rho_{X_t,Y_t}\,\frac{\sigma_{Y_t}}{\sigma_{X_t}}$$

where $\sigma^2_{X_t}$ and $\sigma^2_{Y_t}$ represent the sample variances of the two variables, and $\sigma_{X_t,Y_t}$ and $\rho_{X_t,Y_t}$ are the sample covariance and the sample correlation of $X_t$ and $Y_t$, respectively. The intercept is given by $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ represent the sample means of $X_t$ and $Y_t$. Let's assume that aaplret represents the monthly excess return of Apple, which we consider our dependent variable, and that sp500ret is the monthly excess return of the S&P 500. To calculate the estimate of the slope coefficient $\beta_1$ we first need to estimate the covariance between the stock and the index returns, and the variance of the index return. The R commands to estimate these quantities were introduced in the previous chapter, so $\hat{\beta}_1$ can be calculated as follows:

cov(aaplret, sp500ret) / var(sp500ret)

[1] 1.301034

The interpretation of the coefficient estimate is that if the S&P 500 changes by 1% then we expect the Apple stock price to change by about 1.30%. The alternative way to calculate $\hat{\beta}_1$ is to use the formula with the correlation coefficient and the ratio of the standard deviations of the two assets:

cor(aaplret, sp500ret) * sd(aaplret) / sd(sp500ret)

[1] 1.301034

Once the slope parameter is estimated, we can then estimate the intercept as follows:


beta1 <- cov(aaplret, sp500ret) / var(sp500ret)   # store the slope estimate computed above

mean(aaplret) - beta1 * mean(sp500ret)

[1] 0.7095597

Naturally, R has functions that estimate the LRM automatically and provide a wealth of information. The function is called lm(), for linear model, and below is the way it is used to estimate the LRM:

fit <- lm(aaplret ~ sp500ret)

fit

Call:

lm(formula = aaplret ~ sp500ret)

Coefficients:

(Intercept) sp500ret

0.7096 1.3010

The fit object provides the coefficient estimates and the function that has been used; a richer set of statisticsis provided by the function summary(fit) as shown below:

Call:

lm(formula = aaplret ~ sp500ret)

Residuals:

Min 1Q Median 3Q Max

-79.660 -5.796 0.777 7.725 32.215

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.7096 0.7598 0.934 0.351

sp500ret 1.3010 0.1753 7.420 1.34e-12 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.77 on 286 degrees of freedom

Multiple R-squared: 0.1614, Adjusted R-squared: 0.1585

F-statistic: 55.06 on 1 and 286 DF, p-value: 1.34e-12

The table provided by the summary() function reports the standard errors of the coefficient estimates, the t-statistics and p-values for the null hypothesis that each coefficient is equal to zero (with stars indicating significance at the 1, 5, and 10% levels), the $R^2$ and its adjusted version, and the F statistic for the overall significance of the regression. Interpreting the magnitude of the $R^2$ is always related to what can be expected in the specific context being analyzed. In this case, an $R^2$ of about 16% is not a very high number, and there is a lot of variability in Apple returns that is not explained by a linear relationship with the market. This becomes evident when we plot the time series of the residuals:


plot(fit$residuals)

abline(h=0, col=2, lty=2)

It is clear from this graph that the residuals are large in absolute magnitude (see the scale of the y-axis), with some large negative residuals that are probably due to company-specific news hitting the market. We can also evaluate the normality of the residuals graphically using a Quantile-Quantile (QQ) plot, which is a scatter plot of the empirical quantiles of the time series against the quantiles of a normal distribution with mean and variance estimated from the time series.

qqnorm(fit$residuals) # Quantile-Quantile function

qqline(fit$residuals,col=2) # diagonal line

The continuous line in the graph represents the diagonal, and we expect the QQ points to lie along or close to the line if the residuals are normally distributed. Instead, in this case we observe large deviations from the diagonal in the left tail and some smaller deviations in the right tail of the residuals' distribution. The dot in the bottom-left corner has coordinates approximately equal to -3 and -80%: under normality there is only about a 0.1% probability of observing a value that far (about three standard deviations) below the mean, yet for Apple that same quantile of the residuals is -80%, which indicates that the residuals of the model have fatter tails relative to the normal distribution. This is because the monthly returns of Apple include some extreme events that are not explained by the exposure of the stock to market risk ($\beta_1 X_t$). This produces large unexplained returns that might have occurred because of the release of firm-specific (rather than market-wide) news, which depressed the stock price in that specific month. In a later section we discuss the effect of outliers on OLS coefficient estimates.

2.2 Functional forms

In the example above we considered the case of a linear relationship between the independent variable X and the dependent variable Y. However, there are situations in which the relationship between X and Y might not be well explained by a linear model. This can happen when the effect on Y of changes in X depends on the level of X. In this section, we discuss two functional forms that are relevant in some financial applications.

One functional form that can be used to account for nonlinearities in the relationship between X and Y is the quadratic model, which simply consists of adding the square of the independent variable as an additional regressor. The Quadratic Regression Model is given by $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_t^2 + \epsilon_t$, which, relative to the linear model, adds some curvature to the relationship through the quadratic term. The model can still be estimated by OLS, and the expected effect of a one-unit increase in X is now given by $\beta_1 + 2\beta_2 X_t$. In this case, the effect on Y of changes in X is thus a function of the level of the independent variable X, while in the linear model the effect is $\beta_1$ whatever the value of X is.

Another simple variation of the LRM which introduces nonlinearities consists of assuming that the slope coefficient of X is different below and above a certain threshold value of X. For example, if $X_t$ is the market return we might want to evaluate whether there is a different (linear) relationship between the stock and the market return when the market return is, e.g., below the mean/median as opposed to above it. We can define two new variables as follows: $X_t^{UP} = X_t \cdot I(X_t \geq m)$ and $X_t^{DOWN} = X_t \cdot I(X_t < m)$, where $I(A)$ is the indicator function, which takes value 1 if the event A is true and zero otherwise, and $m$ is the threshold value that splits the variable into the parts above and below it.

An interesting application of nonlinear functional forms is to model hedge fund returns. As opposed to mutual funds, which mostly buy and sell stocks and make limited use of financial derivatives and leverage, hedge funds make extensive use of these instruments to hedge downside risk and boost their returns. This produces a nonlinear response to changes in market returns which might not be well captured by a linear model. As a simple example of why this could be the case, assume that a hedge fund holds a portfolio composed of one call option on a certain stock. The performance/payoff is clearly nonlinearly related to the price of the underlying asset and will be better approximated by either of the functional forms discussed above. We can empirically investigate this issue using the Credit Suisse Hedge Fund Indexes, which provide the overall and strategy-specific performance of a portfolio of hedge funds. Indexes are available for the following strategies, among others: convertible arbitrage, dedicated short bias, equity market neutral, event driven, global macro, and long-short equity. Since individual hedge fund return data are proprietary, we will use these Indexes, which can be interpreted as a sort of fund-of-funds of the overall universe or of specific strategies of hedge funds.

The file hedgefunds.csv contains the returns of 14 Credit Suisse Hedge Fund Indexes (the HF index and 13 strategy indexes) from January 1993 until December 2013 at the monthly frequency. We start the analysis by considering the HF Index, which is the first column of the file. Below we show a scatter plot of the HF Index return against the S&P 500. There seems to be a positive correlation between the two indexes, although the most striking feature of the plot is the difference in scale between the x- and y-axes: the HF returns range between ±6% while the equity index ranges between −20% and 12%. The standard deviation of the HF index is 2.111% compared to 4.444% for the S&P 500 Index, which shows that hedge funds, in general, provide a hedge against large movements in markets (the S&P 500 in this case).

plot(sp500, hfindex, main="Credit Suisse Hedge Fund Index", cex.main=0.75)

abline(v=0, col=2)

abline(h=0, col=2)

Before introducing nonlinearities, we estimate a linear model in which the HF index return is explained by the S&P 500 return. The results below show the existence of a statistically significant relationship between the two returns, with a 0.274 exposure of the HF return to the market return (if the market return changes by ±1% then we expect the fund return to change by ±0.274%). The $R^2$ of the regression is 0.331, which is not very high and might indicate that a nonlinear model would be more successful at explaining the time variation of the hedge fund returns. We can add the fitted linear relationship 0.564 + 0.274 * sp500 to the previous scatter plot to get a graphical understanding of the LRM.

fitlin <- lm(hfindex ~ sp500)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.564 0.113 5.008 0

sp500 0.274 0.025 10.864 0


Let’s consider now the quadratic model in which we add the square of the market return:

sp500sq <- sp500^2

fitquad <- lm(hfindex ~ sp500 + sp500sq)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.715 0.133 5.357 0.000

sp500 0.254 0.027 9.536 0.000

sp500sq -0.007 0.003 -2.072 0.039

If the coefficient of the squared term is statistically significant then we conclude that a nonlinear form is a better model for the relationship between market and fund returns. In this case, we find that the p-value for the null hypothesis that the coefficient of the squared market return is equal to zero is 0.039 (or 3.9%), which is smaller than 0.10 (or 10%), and we thus find evidence of nonlinearity in the relationship between the HF and the market return. In the plot below we see that the contribution of the quadratic term (dashed line) is more apparent at the extremes, while for small returns it overlaps with the fitted values of the linear model. In terms of goodness-of-fit, the adjusted $R^2$ of the quadratic regression is 0.338 compared to 0.329 for the linear model, a modest but significant increase that makes the quadratic model preferable.
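To make the curvature concrete, a short sketch that evaluates the estimated marginal effect $\hat{\beta}_1 + 2\hat{\beta}_2 X_t$ at a few values of the market return, using the fitquad object estimated above:

b <- coef(fitquad)                   # (Intercept), sp500, sp500sq
x <- c(-10, 0, 10)                   # selected market returns (in %)
b["sp500"] + 2 * b["sp500sq"] * x    # marginal effect of a 1% market move at each value of x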


The second type of nonlinearity we discussed earlier assumes the relationship is linear but with different slopes below and above a certain threshold. In the example below we consider as threshold the median value of the independent variable, which in this case is the market return. We then create two variables that represent the market return below the median (sp500down) and above the median (sp500up). These two variables can then enter the lm() command, and the OLS estimation results are reported below.

m <- median(sp500)

sp500up <- sp500 * (sp500 >= m)

sp500down <- sp500 * (sp500 < m)

fitupdown <- lm(hfindex ~ sp500up + sp500down)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.718 0.171 4.209 0

sp500up 0.222 0.050 4.486 0

sp500down 0.313 0.042 7.544 0

The slope coefficient (or market exposure) above the median is 0.222 and below it is 0.313, which suggests that the HF Index is more sensitive to downward movements of the market than to upward movements. We conclude that there is some evidence of a difference in the exposure of HF returns to positive/negative market conditions, although it is not very strong. The adjusted $R^2$ of 0.330 is slightly higher than for the linear model but lower relative to the quadratic model. The graph below shows the fitted regression line for this model.
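Whether the two slopes are statistically different can also be tested formally. A sketch using the linearHypothesis() function from the car package (an assumption here, since this package is not used elsewhere in these notes):

require(car)
linearHypothesis(fitupdown, "sp500up = sp500down")   # F test of equal exposure above and below the median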


2.3 The role of outliers

One of the indexes provided by Credit Suisse is for the equity market neutral strategy. The aim of this strategy is to provide a positive expected return, with low volatility and no correlation with the equity market. Below is a scatter plot of the market return (proxied by the S&P 500) and the HF equity market neutral return.

The problem with this graph is that the scale of the y-axis ranges between -40% and 10%, while all of the HF returns, except for one month, fall in the much smaller range between ±10%. The extreme observation that skews the graph corresponds to a month in which the market index lost 7.5% and the equity market neutral index lost over 40%. To find out when the extreme observation occurred, we can use the command which(hfneutral < -30), which indicates that it is the 179th observation and corresponds to November 2008. What happened in November 2008 to create a loss of 40% in an aggregate index of market neutral hedge funds? In a press release, Credit Suisse explained that they marked down to zero the assets of the Kingate Global Fund, a hedge fund based in the British Virgin Islands that acted as a feeder for the Madoff funds and had all of its assets wiped out. Since the circumstances were so exceptional and unlikely to happen again to such a large extent, it is probably warranted to simply drop that observation from the sample when estimating the model parameters.

The most important reason for excluding extreme observations from the sample is that they bias the coefficient estimates away from their true values. We can use the equity market neutral returns as an example to evaluate the effect of one extreme event on the parameters. Below, we first estimate an LRM of the fund returns on the market return and then run the same regression dropping observation 179 from the sample.

fit0 <- lm(hfneutral ~ sp500 )

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.352 0.179 1.971 0.05

sp500 0.199 0.040 4.971 0.00

fit1 <- lm(hfneutral[-179] ~ sp500[-179])

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.560 0.063 8.818 0

sp500[-179] 0.128 0.014 8.971 0

These results indicate that by dropping the November 2008 return the estimate of $\beta_0$ increases from 0.352 to 0.56, while the market exposure of the fund declines from 0.199 to 0.128, indicating a smaller exposure to market returns. However, even after removing the outlier the exposure is statistically significant at the 1% level, which suggests that the aggregate index is not market neutral. In terms of goodness-of-fit, the $R^2$ increases from 0.094 to 0.253 as a result of dropping the large error experienced in November 2008.


2.4 LRM with multiple independent variables

In practice, we might want to include more than one variable to explain the dependent variable Y. If we have K independent variables that we want to include in the regression, we can denote the observations of each variable by $X_{k,t}$ for $k = 1, \ldots, K$. The linear regression with multiple regressors is given by:

$$Y_t = \beta_0 + \beta_1 X_{1,t} + \cdots + \beta_K X_{K,t} + \epsilon_t$$

Also in this case we can use OLS to estimate the parameters $\beta_k$ (for $k = 1, \ldots, K$), choosing the set of parameters that minimizes the sum of squared residuals. Care should be given to the correlation among the independent variables to avoid cases of extremely high dependence. The extreme case of a correlation equal to 1 between two independent variables is called perfect collinearity, and the model cannot be estimated because it is impossible to separately identify the coefficients of the two variables when the two variables are actually the same (apart from scaling). A similar situation arises when an independent variable has correlation 1 with a linear combination of some other independent variables. The solution in this case is to drop one of the variables, as discussed earlier. In practice, it is more likely that the regressors have very high correlation, although not equal to 1. In this case the model can be estimated, but the coefficient estimates are highly variable and thus unreliable. For example, equity markets move together in response to news that affects economies worldwide, so there are significant co-movements among these markets and thus high correlations, which can become problematic in some situations. Care should be taken to test the robustness of the results to the inclusion of variables with high correlation.
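A tiny sketch of what perfect collinearity looks like in practice, using the Apple/S&P 500 returns from Section 2.1 and a hypothetical duplicate regressor:

x2 <- 2 * sp500ret                     # an exact linear function of an included regressor
coef(lm(aaplret ~ sp500ret + x2))      # lm() cannot identify the coefficient of x2 and reports it as NA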

In R, the LRM with multiple regressors is estimated using the lm() command discussed before. As an example, we extend the earlier 1-factor model to a 3-factor model in which we use 3 independent variables or factors to explain the variation over time of the returns of a mutual fund. The factors are called the Fama-French factors after the two economists who first proposed them to model risk in financial returns. In addition to the market return, they construct two additional risk factors:

• Small minus Big (SMB), which is defined as the difference between the return of a portfolio of small capitalization stocks and a portfolio of large capitalization stocks. The aim of this factor is to capture the premium from investing in small cap stocks.

• High minus Low (HML), which is obtained as the difference between the return of a portfolio of high Book-to-Market (B-M) ratio stocks and a portfolio of low B-M stocks. High B-M ratio stocks are also called value stocks, while low B-M ratio ones are referred to as growth stocks. The factor provides a proxy for the premium from investing in value relative to growth stocks.

These factors are freely available on Ken French's webpage. The file F-F_Research_Data_Factors.txt contains the market return, SMB, HML, and the risk-free rate (RF). To load the data we can use the following command:

FF <- read.table('F-F_Research_Data_Factors.txt')

head(FF, 3)


Mkt.RF SMB HML RF

192607 2.95 -2.50 -2.67 0.22

192608 2.63 -1.20 4.50 0.25

192609 0.38 -1.33 -0.30 0.23

tail(FF, 3)

Mkt.RF SMB HML RF

201310 4.17 -1.53 1.39 0

201311 3.12 1.31 -0.38 0

201312 2.81 -0.44 -0.17 0

The dataset starts in July 1926 and ends in December 2013, and the columns represent the factors discussed above (plus the risk-free rate), with the returns expressed in percentage. The file is imported as a data.frame, and it is useful to give it a time series characterization, as we did in the previous chapter, using the command FF <- zooreg(FF, start=c(1926,7), end=c(2013,12), frequency=12), which defines the dataset as a zoo object.


Let's look at some descriptive statistics of these factors, in particular the mean and standard deviation. Since we have a matrix with 4 columns (market returns, SMB, HML, and the risk-free rate), using the mean() function would calculate just one number that represents the average over all columns. Instead, we want to apply the mean() function to each column, and there is an easy way to do this using the function apply(). This function applies a function (e.g., mean or sd) to all the rows or columns of a matrix (second argument of the function equal to 1 applies it to rows and 2 to columns). In our case:

apply(FF, 2, mean)

Mkt.RF SMB HML RF

0.6487429 0.2342095 0.3938190 0.2871429

The results show that the average monthly market return from 1926 to 2013 has been 0.649% (approx. 7.79% per year) in excess of the risk-free rate. The monthly average of SMB is 0.234% (approx. 2.81% per year), which measures the monthly extra return from investing in a portfolio of small caps relative to investing in large capitalization stocks. The average of HML provides the extra return from investing in value stocks relative to growth stocks, which annualized corresponds to about 4.73%.

apply(FF, 2, sd)

Mkt.RF SMB HML RF

5.4137611 3.2317907 3.5117457 0.2539519

The standard deviation of monthly market returns is 5.414%, which can be annualized by multiplying by $\sqrt{12}$. The SMB and HML factors also show significant volatility, with monthly standard deviations of 3.232 and 3.512, respectively. Another quantity that we need to calculate to better understand the behavior of these risk factors is their correlation. We can calculate the correlation matrix using the cor() function that was introduced earlier:

cor(FF)

Mkt.RF SMB HML RF

Mkt.RF 1.00000000 0.33429226 0.21574891 -0.06622179

SMB 0.33429226 1.00000000 0.11992280 -0.05650984

HML 0.21574891 0.11992280 1.00000000 0.01528333

RF -0.06622179 -0.05650984 0.01528333 1.00000000

where we find that both SMB and HML are weakly correlated with the market return (0.334 and 0.216, respectively) and also with each other (0.12). In this sense, it seems that the factors capture relatively uncorrelated sources of risk, which is valuable from a diversification standpoint.
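Tying together the annualization remarks above, a short sketch that converts the monthly means and standard deviations of the factors to an annual scale:

12 * apply(FF, 2, mean)        # approximate annualized mean returns (in %)
sqrt(12) * apply(FF, 2, sd)    # approximate annualized standard deviations (in %)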

Let's consider an application to mutual fund returns and their relationship to the FF factors. We download from Yahoo Finance data for the DFA Small Cap Value mutual fund (ticker: DFSVX). The fund invests in companies with small capitalization that are considered undervalued according to a valuation ratio (e.g., the Book-to-Market ratio), holding the investment until these ratios are considered fair. In the long run this strategy outperforms the market, although it comes with the risk of significant underperformance over shorter evaluation periods. We estimate a three-factor model with the monthly excess return of DFSVX as the dependent variable and with MKT, SMB, and HML as regressors. The three-factor model can be estimated using the lm() command. However, when dealing with time series the package dyn provides a wrapper for the lm() function which synchronizes the variables in case they span different time periods, and it also allows the use of time series commands (like diff() and lag()) in the equation definition. Notice that in the regression below we are regressing ex_dfsvx on the first three columns of FF (FF[,1:3]), with the dependent variable starting in March 1993 but the independent variables starting in 1926. In this case dyn$lm() will automatically adjust the series so that they all span the same time period:

fit <- dyn$lm(ex_dfsvx ~ FF[,1:3]) # regress the excess fund returns on the 3 F-F factors

summary(fit)

Call:

lm(formula = dyn(ex_dfsvx ~ FF[, 1:3]))

Residuals:

Min 1Q Median 3Q Max

-4.3860 -0.6704 -0.0041 0.7024 4.6372

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.001708 0.072859 0.023 0.981

FF[, 1:3]1 1.057903 0.016828 62.865 <2e-16 ***

FF[, 1:3]2 0.806633 0.023013 35.051 <2e-16 ***

FF[, 1:3]3 0.673636 0.024075 27.981 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.129 on 245 degrees of freedom

(801 observations deleted due to missingness)

Multiple R-squared: 0.9618, Adjusted R-squared: 0.9613

F-statistic: 2056 on 3 and 245 DF, p-value: < 2.2e-16

As is clear from the regression results, the fund has exposure not only to the market factor but also to SMB and HML, with positive coefficients. These results indicate that the fund overweights value and small stocks (relative to growth and large stocks) and thus benefits, in the long run, from the premium from investing in those stocks. We expected this result since the fund clearly states its strategy of buying undervalued small cap stocks. However, keep in mind that even if we had ignored the investment strategy followed by the fund, the regression results would have indicated that the manager overweights small cap value stocks in the portfolio. In this model, positive coefficients for SMB and HML indicate that the fund manager overweights (relative to the market) small caps and/or value stocks, while a negative value suggests that the manager gears the investments toward large caps and/or growth stocks. The intercept estimate is typically referred to in the finance literature as alpha and is interpreted as the risk-adjusted return, with the risk being measured by $\beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t$. The example above shows how to use the LRM to analyze the investing style of a mutual fund or an investment strategy. In conjunction with nonlinear functional forms, this model can also be useful to investigate the style and the risk factors of hedge fund returns. In this case the role of nonlinearities is essential given the time variation in the exposures (βs) of this type of fund, as well as the extensive use of derivatives, which implies nonlinear exposure to the risk factors.
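For instance, a sketch of reading the monthly alpha off the fitted three-factor model for DFSVX and expressing it at the annual horizon, using the fit object estimated above:

alpha <- coef(fit)["(Intercept)"]   # monthly risk-adjusted return (alpha), in %
12 * alpha                          # approximate annualized alpha, in %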

2.5 Omitted variable bias

An important assumption that is introduced when deriving the properties of the OLS estimator is that the set of regressors included in the model represents all those that are relevant to explain the dependent variable. However, in practice it is difficult to make sure that this assumption is satisfied. In some situations we might not observe a variable that we believe is relevant to explain the dependent variable, while in others we might not know which variables are actually relevant. What is the effect of omitting relevant variables on the OLS coefficient estimates? The answer depends on the correlation between the omitted variable and the included variable(s). If the omitted and included variables are correlated, then the estimate of the slope coefficient of the included variable will be biased, that is, systematically different from its true value. However, if we omit a relevant variable that is not correlated with any of the included variables, then we do not expect any bias in the coefficient estimates. To illustrate this concept, we will first discuss an example based on real data and then show the estimation bias in a simulation exercise.

Assume that we are asked to analyze a mutual fund's returns with the aim of understanding the risk factors underlying the performance of the fund. Since we are not told the investment strategy of the fund, we could start the empirical analysis by assuming that the relevant variables to include in the regression are the Fama-French factors. The 3-factor model is thus given by $R^{fund}_t = \beta_0 + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \epsilon_t$, and its estimation by OLS provides the following results:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.320 0.298 1.073 0.285

FF[, 1:3]1 1.077 0.065 16.678 0.000

FF[, 1:3]2 0.395 0.089 4.459 0.000

FF[, 1:3]3 0.110 0.092 1.196 0.233

The fund has an exposure of 1.077 to the US equity market, which is highly significant. In addition, there appears to be a significant exposure to SMB, with a coefficient of 0.395 and a t-statistic of 4.459, but not to HML, which has a t-statistic of 1.196. The $R^2$ of the regression is equal to 0.65, which indicates a reasonable fit for this type of regression. Based on these results, we would conclude that the fund invests in US equity with a focus on small cap stocks. However, it turns out that the fund is the Oppenheimer Developing Markets fund (ticker: ODMAX), which invests exclusively in stocks from emerging markets and does not hold any US stock. The results above appear inconsistent with the declared investment strategy of the fund: how is it possible that the exposures to MKT and SMB are large and significant in the regression above even though the fund does not hold any US stock? It is possible that, despite these factors not being directly relevant to explain the performance of the fund, they indirectly proxy for the effect of an omitted risk factor that is correlated with MKT and SMB. Given the investment objective of the fund, we could consider including as an additional risk factor the MSCI Emerging Markets (EM) Index, which seems a more appropriate choice of benchmark for this fund. In terms of correlation between the EM returns and the FF factors, the results below indicate that there is a strong positive correlation with the US equity market, as demonstrated by a correlation of 0.699, and much lower correlations with SMB (positive) and HML (negative).


EM Mkt.RF SMB HML RF

1.000 0.699 0.290 -0.184 -0.001

It thus seems reasonable to include the EM Index returns as an additional risk factor to explain the performance of ODMAX. The table below shows the estimation results of a regression of ODMAX excess monthly returns on 4 factors, the EM factor in addition to the FF factors.

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.639 0.137 4.651 0.000

EM 0.803 0.030 27.201 0.000

FF[, 1:3]1 0.127 0.046 2.764 0.006

FF[, 1:3]2 0.218 0.041 5.294 0.000

FF[, 1:3]3 0.103 0.042 2.430 0.016

Not surprisingly, the estimated exposure to EM is 0.803 and highly significant, whilst the exposures to the FF factors decline markedly. In particular, adding EM to the regression has the effect of reducing the coefficient of MKT from 1.077 to 0.127. This large change (relative to its standard error) in the coefficient can be attributed to the effect of omitting a relevant variable (i.e., EM), which produces bias in the coefficient estimates of the FF factors. The estimate of 1.077 from the first regression is biased because it does not represent (only) the effect of MKT on ODMAX, but also acts as a proxy for the omitted source of risk of the EM Index, given the large and positive correlation between MKT and EM. The effect of omitted variables and the resulting bias in the coefficient estimates is not only an econometric issue; it has important practical implications. If we use the LRM for performance attribution, that is, disentangling the systematic component of the fund return (beta) from the risk-adjusted part (alpha), then omitting some relevant risk factors produces bias in the remaining coefficients and thus changes our conclusion about the contribution of each component to the performance of the fund.

To further illustrate the effect of omitted variables in producing biased coefficient estimates, we can perform a simulation study of the problem. The steps of the simulation are as follows:

1. We assume that the dependent variable Yt (for t = 1, · · · , T) is generated by the following model: Yt = 0.5 ∗ X1,t + 0.5 ∗ X2,t + ϵt, where X1,t and X2,t are simulated from the (multivariate) normal distribution with mean 0 and variance 1 for both variables, with their correlation set equal to ρ. The error term ϵt is also normally distributed, with mean 0 and standard deviation 0.5. In the context of this simulation exercise, the model above for Yt represents the true model for which we know the population values of the parameters (i.e., β0 = 0 and β1 = β2 = 0.5).

2. We then estimate by OLS the following model: Yt = β0 + β1 ∗ X1,t + ηt, where we intentionally omit X2,t from the regression. Notice that X2,t is both a relevant variable to explain Yt (since β2 = 0.5) and correlated with X1,t if we set ρ ≠ 0.

3. We repeat steps 1-2 S times and store the OLS estimate of β1.

We can then analyze the properties of the OLS estimate of β1 by, for example, plotting a histogram of the S values obtained in the simulation. If omitting X2,t does not introduce bias in the estimate of β1, then we would expect the histogram to be centered at the true value of the parameter, 0.5. Instead, the histogram will be shifted away from the true value of the parameter if the omission introduces estimation bias. The code below starts by setting the values of the parameters, such as the number of simulations, the length of the time series, and the parameters of the distributions. Then the for loop iterates S times over steps 1 and 2 described above, while the bottom part of the program plots the histogram.


require(MASS) # this package is needed for function `mvrnorm()` to simulate from the

# multivariate normal distribution

S <- 1000 # set the number of simulations

T <- 300 # set the number of periods

mu <- c(0,0) # mean of variables X1 and X2

cor <- 0.7 # correlation coefficient between X1 & X2

Sigma <- matrix(c(1,cor,cor,1), 2, 2) # covariance matrix of X = [X1, X2]

beta <- c(0.5, 0.5) # slope coefficient of X = [X1, X2]

betahat <- matrix(NA, S, 1) # vector to store the estimates of beta1

for (i in 1:S) # loop starts here

{
X <- mvrnorm(T, mu, Sigma)                 # simulate the indep. variables X = [X1, X2]; `mvrnorm`
                                           # from package MASS simulates the multivariate normal (dim: Tx2)
eps <- rnorm(T, 0, 0.5)                    # simulate the errors (re-drawn in each replication, per step 1)
Y <- beta[1]*X[,1] + beta[2]*X[,2] + eps   # simulate the dep. variable Y (dim: Tx1)

fit <- lm(Y ~ X[,1]) # fit the linear model of Y on X1 but not X2

betahat[i] <- coef(fit)[2] # store the estimated coefficient of X1

} # loop ends here

# set the limits of the x-axis

xmin <- min(min(betahat), 0.45)

xmax <- max(betahat)

# plot the histogram of betahat together with a vertical line at the true value of beta

hist(betahat, breaks=50, xlim=c(xmin, xmax), freq=FALSE,main="",xlab="",ylab="")

box()

abline(v=beta[1], col=2, lwd=4 )

text(beta[1], 1, "TRUE VALUE",pos=4, font=2)

In the simulation exercise we set the correlation between X1,t and X2,t equal to 0.7 (line cor = 0.7) and


it is clear from the histogram above that the distribution of the OLS estimate is shifted away from the true value of 0.5. This illustrates quite well the problem of omitted variable bias: we expect the estimates of β1 to be close to the true value of 0.5, but we find that these estimates range from 0.75 to 0.95, consistent with an expected bias of β2 ∗ ρ = 0.5 ∗ 0.7 = 0.35 when both variables have unit variance. The bias that arises from omitting a relevant variable does not disappear by using longer samples, because it depends not on the sample size but on the fact that we omitted a relevant variable which is highly correlated with an included variable. If the omitted variable were relevant but uncorrelated with the included variable, then the histogram of the OLS estimates would look like the following plot, which is produced with the earlier code by setting cor = 0.


3. Time Series Models

In this chapter we introduce time series models that are very often used in economics and finance. The LRM in the previous chapter aims at relating the variation (over time or in the cross-section) of a variable Y to another variable X. In many cases we cannot make a statement about causality between X and Y, but rather about correlation, which could go either way. The approach is different in time series models: we assume that the independent variable is represented by past values of the dependent variable Y. We denote by Yt the value of a variable in time period t (e.g., minutes, hours, days, weeks, months, quarters, years). When we write Yt−1 we refer to the value of the variable in the previous time period relative to today (i.e., t) and, more generally, Yt−k (for k ≥ 1) indicates the value of the variable k periods before t. k is referred to as the lag and Yt−k as the lagged value. The aim of time series models can be stated as follows: without considering information about any other variable, is there information in past values of a variable to predict the variable today? In addition to providing insights on the properties of the time series, these models are useful in forecasting since they use the past to model the present and the present to forecast the future.

3.1 The Auto-Correlation Function (ACF)

A first exploratory tool used in time series analysis is the Auto-Correlation Function (ACF). The ACF at lag k represents the correlation between the variable Yt and the lagged value Yt−k. It is estimated from a data sample using the usual formula of the correlation coefficient, that is, dividing the sample covariance between Yt and Yt−k by the sample variance of Yt. The R function to calculate the ACF is acf(); in the example further below we apply it to the monthly returns of the S&P 500 Index from February 1990 until December 2013.
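Written out explicitly (this is the standard estimator, added here for reference), the sample autocorrelation at lag k is

ρ̂(k) = ∑_{t=k+1}^{T} (Yt − Ȳ)(Yt−k − Ȳ) / ∑_{t=1}^{T} (Yt − Ȳ)²

where Ȳ denotes the sample mean of the series; this is the quantity that acf() reports for each lag k.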

require(tseries)

spm <- get.hist.quote("^GSPC", start="1990-01-01", end="2014-01-01", quote="AdjClose",

compression="m", retclass="zoo", quiet=TRUE)

spm.ret <- 100 * diff(log(spm))

# spm.ret is defined as a zoo time series object but not as
# regularly spaced data (monthly); the acf() function

# applied to irregularly spaced objects gives an error, unless

# you add as argument 'na.action=na.exclude'

acf(spm.ret, lag.max=12, plot=TRUE)

Error in na.fail.default(as.ts(x)): missing values in object

acf(spm.ret, lag.max=12, plot=TRUE, na.action=na.exclude)


# An alternative is to define spm.ret to be a zooreg ('reg' for regular) object;
# in the rest of the chapter I will use spm.ret defined as a zooreg

spm.ret <- zooreg(spm.ret, start=c(1990,1), frequency=12)

acf(spm.ret, lag.max=12, plot=TRUE, main="")

The option plot=TRUE implies that the function produces a graph where the horizontal axis represents the lag k (starting at 0 and expressed as a fraction of a year) and the vertical axis represents the autocorrelation, which is a value between -1 and 1. The horizontal dashed lines represent the 95% confidence interval equal to ±1.96/√T for the null hypothesis that the population auto-correlation at lag k is equal to 0. If the auto-correlation at lag k is within the interval we conclude that we do not reject the null that the correlation coefficient for that lag is equal to zero (at the 5% level). If the option plot=FALSE, then the function prints the estimates of the auto-correlation up to lag.max (equal to 12 in this example):

Autocorrelations of series 'spm.ret', by lag

0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500

1.000 0.071 -0.002 0.071 0.048 0.022 -0.073 0.037 0.058 -0.006

0.8333 0.9167 1.0000

0.011 0.028 0.078
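As a quick check, the width of these 95% bands can be computed directly from the number of non-missing observations (a small sketch using the spm.ret series defined above):

T <- sum(!is.na(spm.ret))   # number of monthly return observations
c(-1.96, 1.96) / sqrt(T)    # approximate 95% confidence band for the sample autocorrelations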

This analysis considers monthly returns of the S&P 500 Index and shows that there is very small and statistically insignificant serial correlation across months. We can use the ACF to investigate if there is evidence of serial correlation at the daily frequency for the same sample period. The average daily percentage return in this sample period is 0.027% and the daily standard deviation is 1.156% (the total number of days is 6049). The ACF plot up to 25 lags is reported below:


Since the sample size (T) is quite large at the daily frequency, these bands are tight around zero. However, we find that some of these auto-correlations are statistically significant at 5% (lags 1, 2, 5, 7 and 10), although from an economic standpoint they are too small to provide predictive power for the direction of future returns.

Although the S&P 500 returns do not show any significant correlation at the daily and monthly frequency, it is common to find for several asset classes that their absolute or squared returns show significant and long-lasting serial correlation. This is a property of financial returns in general, mostly at the daily frequency and higher. The next plot shows the ACF for the absolute and squared daily S&P 500 returns up to lag 100 days:
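A sketch of code that would produce the daily series and this plot, following the same steps used for the monthly returns (the object name spd.ret is the one used later in this chapter; the exact arguments are illustrative):

spd <- get.hist.quote("^GSPC", start="1990-01-01", end="2014-01-01", quote="AdjClose",
                      compression="d", retclass="zoo", quiet=TRUE)
spd.ret <- 100 * diff(log(spd))                                    # daily percentage returns
par(mfrow=c(1,2))
acf(abs(spd.ret), lag.max=100, na.action=na.exclude, main="absolute returns")
acf(spd.ret^2, lag.max=100, na.action=na.exclude, main="squared returns")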


It is clear from these graphs that the absolute and squared returns are significantly positively correlated and that the auto-correlation decays very slowly. This shows that large (small) absolute returns are likely to be followed by large (small) absolute returns, that is, the magnitude of returns is correlated rather than their direction. This is associated with the evidence that returns display volatility clusters, that is, periods (that can last several months) of high volatility followed by periods of low volatility. This suggests that volatility is persistent (and thus predictable), while returns are unpredictable.

3.2 The Auto-Regressive (AR) Model

The first model that we consider is a linear regression that relates Yt to the previous value of the variable (hence, auto-regressive), that is, Yt = β0 + β1 ∗ Yt−1 + ϵt, where β0 and β1 are coefficients to be estimated and ϵt is an error term with mean zero and variance σ²_ϵ. This model is typically referred to as AR(1) since we are only using the first lag of the variable to explain the time variation in the current period. Similarly to the LRM, this model represents the E(Yt|Yt−1 = y) line in the scatter plot of Yt−1 and Yt. If the last period realization of the variable was y, then we expect the realization in the current period to be E(Yt|Yt−1 = y) = β0 + β1 ∗ y. The parameter β1 can be interpreted as in the LRM as the expected change in Yt for a unit change in Yt−1. Since the dependent and independent variables represent the same variable observed at different points in time, the slope β1 is also interpreted as a measure of persistence of the time series. By persistence we mean the extent to which past values above/below the mean are likely to persist above/below the mean in the current period. A large positive value of β1 represents a situation in which the series is persistently above/below the mean for several periods, while a small positive value means that the series is likely to oscillate close to the mean. The concept of persistence is also related to that of mean-reversion, which measures the speed at which a time series reverts back to its long-run mean. The higher the persistence of a time series, the longer it will take for deviations from the long-run mean to be absorbed. Persistence is associated with positive values of β1, while a negative value of the parameter implies that the series fluctuates around the mean, because a value above the mean in the previous period is expected to be below the mean in the following period and vice versa. In economics and finance we typically find positive coefficients due to the persistent nature of economic shocks. To estimate the AR(1) model we can use OLS and, asymptotically (for large sample sizes), we expect the estimates to be consistent.


Let’s work with an example. We estimate an AR(1) model on the S&P 500 returns at the monthly frequency. I will discuss two (of several) ways to estimate an AR model in R. One way is to use the lm() function in conjunction with the dyn package, which gives lm() the capability to handle time series data and operations such as the lag() operator. The estimation is implemented below:

require(dyn)

fit <- dyn$lm(spm.ret ~ lag(spm.ret, -1))

summary(fit)

Call:

lm(formula = dyn(spm.ret ~ lag(spm.ret, -1)))

Residuals:

Min 1Q Median 3Q Max

-18.450 -2.391 0.538 2.730 10.338

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.5584 0.2573 2.17 0.031 *

lag(spm.ret, -1) 0.0707 0.0592 1.19 0.234

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.31 on 284 degrees of freedom

(2 observations deleted due to missingness)

Multiple R-squared: 0.00499, Adjusted R-squared: 0.00149

F-statistic: 1.42 on 1 and 284 DF, p-value: 0.234

where 0.558 is the estimate of β0 and 0.071 is the estimate of β1, the coefficient of the lagged monthly return. The estimate of β1 is quite close to zero, which suggests that monthly returns have little persistence and that it is difficult to predict the return next month knowing the return for the current month. In addition, we can test the null hypothesis that β1 = 0: the t-statistic is 1.193 with a p-value of 0.234, suggesting that the coefficient is not significant even at the 10% level. The lack of persistence in financial returns at high frequencies (intra-daily or daily) but also at lower frequencies (weekly, monthly, quarterly) is well documented and is one of the stylized facts of returns common across asset classes.

Another function that can be used to estimate AR models is ar(), from the stats package that is part of base R. The inputs of the function are the series Yt, the maximum order/lag of the model, and the estimation method. An additional argument of the function is aic, which can be TRUE/FALSE. This option refers to the Akaike Information Criterion (AIC), a method to select the optimal number of lags in the AR model. It is calculated as a penalized goodness-of-fit measure, where the penalization is a function of the order of the AR model. In the case of AR(1) the search is over lags 0 and 1, as you can see by comparing the two regression outputs below:


# use 'na.action=na.exclude' if spm.ret is not a zooreg object

ar(spm.ret, aic=FALSE, order.max=1, method="ols",

demean = FALSE, intercept = TRUE)

Call:

ar(x = spm.ret, aic = FALSE, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)

Coefficients:

1

0.071

Intercept: 0.558 (0.256)

Order selected 1 sigma^2 estimated as 18.4

ar(spm.ret, aic=TRUE, order.max=1, method="ols",

demean = FALSE, intercept = TRUE)

Call:

ar(x = spm.ret, aic = TRUE, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)

Intercept: 0.601 (0.254)

Order selected 0 sigma^2 estimated as 18.5

By setting aic=TRUE, the results show that the selected order is 0 since the first lag is (statistically) irrelevant (i.e., the best model is Yt = β0 + ϵt). The option demean controls whether the ar() function subtracts the mean from the series before estimating the regression. In that case, instead of estimating the model Yt = β0 + β1 ∗ Yt−1 + ϵt, the function estimates Yt − Ȳ = β1 ∗ (Yt−1 − Ȳ) + ϵt without an intercept. Estimating the two models provides the same estimate of β1 because E(Yt) = β0 + β1 ∗ E(Yt−1); if we denote the expected value of Yt by µ, then using the previous equation we can express the intercept as β0 = (1 − β1)µ. By replacing this value for β0 in the AR(1) model we obtain Yt = (1 − β1)µ + β1Yt−1 + ϵt, which can be rearranged as Yt − µ = β1(Yt−1 − µ) + ϵt. So, estimating the model in deviations from the mean or with an intercept leads to the same estimate of β1.

More generally, we do not have to restrict ourselves to explaining Yt using only Yt−1, but can also use Yt−2, Yt−3 and further lags in the past. The reason why more lags might be needed to model a time series relates to the stickiness in wages, prices, and expectations, which might delay the effect of shocks on economic and financial variables. A generalization of the AR(1) model is the AR(p), which includes p lags of the variable: Yt = β0 + β1 ∗ Yt−1 + β2 ∗ Yt−2 + · · · + βp ∗ Yt−p + ϵt, which can be estimated with the dyn$lm() or ar() commands. Since we are using monthly data, we can set p (order.max in the ar() function) equal to 12 and the estimation results are provided below:


Call:

ar(x = spm.ret, aic = FALSE, order.max = 12, method = "ols", demean = FALSE, intercept = TRUE)

Coefficients:

1 2 3 4 5 6 7 8 9

0.063 -0.017 0.107 0.037 0.034 -0.091 0.038 0.035 -0.003

10 11 12

0.011 0.015 0.065

Intercept: 0.436 (0.273)

Order selected 12 sigma^2 estimated as 17.6

The coefficient estimates are quite close to zero, but to evaluate their statistical significance we need to obtain the standard errors of the estimates and calculate the t-statistics for the null hypothesis that the coefficients are equal to zero:

# fit is assumed to be the object returned by the ar(spm.ret, aic=FALSE, order.max=12, ...) call above
coef <- fit$ar

se_coef <- fit$asy.se.coef$ar

t <- as.numeric(coef / se_coef)

t

[1] 1.0564 -0.2862 1.7760 0.6109 0.5652 -1.5191 0.6445 0.5844

[9] -0.0561 0.1854 0.2541 1.1048

where we can see that only the third lag is significant at 10%, although not at 5%. Since most of the lags are not significant, we prefer to reduce the number of lags and thus the number of parameters that we are estimating. We can select the optimal number of lags in the AR(p) model by setting the option aic=TRUE:

Call:

ar(x = spm.ret, aic = TRUE, order.max = 12, method = "ols", demean = FALSE, intercept = TRUE)

Intercept: 0.601 (0.254)

Order selected 0 sigma^2 estimated as 18.5

which shows that the model that minimizes the AIC criterion is a model with no lags. This confirms the evidence from the ACF that past values of the series have little relevance in explaining the dynamics of the current level of the variable Yt.

A similar analysis can be conducted on daily S&P 500 returns and their absolute or squared transformations. Below we present the results for an AR(5) model estimated on the absolute returns of the S&P 500. It might be preferable to use dyn$lm() over ar() because it provides the typical regression output with a column for the coefficient estimates, the standard errors, the t-statistics, and the p-values. The results are as follows:


fit <- dyn$lm(abs(spd.ret) ~ lag(abs(spd.ret), (-1):(-5)))

round(coef(summary(fit)), 3)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.241 0.017 14.21 0

lag(abs(spd.ret), (-1):(-5))1 0.057 0.013 4.50 0

lag(abs(spd.ret), (-1):(-5))2 0.180 0.013 14.41 0

lag(abs(spd.ret), (-1):(-5))3 0.117 0.013 9.26 0

lag(abs(spd.ret), (-1):(-5))4 0.143 0.013 11.45 0

lag(abs(spd.ret), (-1):(-5))5 0.193 0.013 15.32 0

The table shows that all lags are significant (even using a 1% significance level). The largest coefficient is 0.193; taken together with the other coefficients, which are also positive and statistically significant, the overall persistence of absolute daily returns implies that the series is quite predictable. Using the ar() function to select the order by AIC results in the choice of 12 lags.
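This order selection can be replicated along the following lines (a sketch; the maximum order of 25 is an arbitrary choice, not taken from the original code):

ar(abs(spd.ret), aic=TRUE, order.max=25, method="ols", na.action=na.exclude)$order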

3.3 Model selection for AR models

In practice, how do we choose p? One approach to the selection of the order of AR models consists of optimizing selection criteria that account for the goodness-of-fit of the model but penalize for the number of lags included. The reason for the penalization is to avoid over-fitting: measures like R2 do not decrease when we increase the lag order, so relying on them would lead to the selection of large lag orders. Penalizing the goodness-of-fit measure means that lags are added only when the improvement in fit is larger than the penalization. The most often used criterion is the Akaike Information Criterion (AIC), given by log(RSS/T) + 2 ∗ (1 + p)/T, where RSS is the Residual Sum of Squares of the model, T is the sample size, and p is the lag order of the AR model. The AIC is calculated for models with different p and the selected lag order is the one that minimizes the criterion. An alternative criterion is the Bayesian Information Criterion (BIC), which is given by log(RSS/T) + log(T) ∗ (1 + p)/T. The BIC applies a higher penalization relative to the AIC (log(T) instead of 2) and thus leads to smaller lag orders.
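For a model estimated with lm() or dyn$lm(), the two criteria can be computed directly from the residuals (a minimal sketch; fit and p denote a fitted AR(p) model and its lag order):

RSS <- sum(residuals(fit)^2)          # residual sum of squares
T   <- length(residuals(fit))         # effective sample size
aic <- log(RSS/T) + 2*(1 + p)/T       # Akaike Information Criterion
bic <- log(RSS/T) + log(T)*(1 + p)/T  # Bayesian Information Criterion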

3.4 Forecasting with AR models

One of the advantages of AR models is that they are very suitable to produce forecasts about the future value of the series. In the simple case of an AR(1) model, at time t we can make a forecast about t + 1 as E(Yt+1|Yt) = β0 + β1 ∗ Yt, where Yt is known and the coefficient estimates are obtained by OLS. If the interest is to forecast two steps ahead, then the forecast is equal to E(Yt+2|Yt) = β0 + β1 ∗ E(Yt+1|Yt) = β0 + β1 ∗ (β0 + β1 ∗ Yt) = β0 ∗ (1 + β1) + β1² ∗ Yt, and similarly for longer horizon forecasts.
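In R these recursions can be computed directly from the OLS estimates (a sketch, assuming fit is the AR(1) model for spm.ret estimated with dyn$lm() earlier in this chapter):

b   <- coef(fit)                      # b[1] = intercept, b[2] = AR(1) coefficient
y.T <- as.numeric(tail(spm.ret, 1))   # last available observation
f1  <- b[1] + b[2]*y.T                # one-step ahead forecast
f2  <- b[1]*(1 + b[2]) + b[2]^2*y.T   # two-step ahead forecast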

As an example, assume that we are interested in forecasting next period's GDP. The results of the ADF test discussed later in this chapter suggest that log(GDP) is non-stationary, so that we need to transform the series by differencing. Taking differences of the logarithm of GDP produces growth rates or percentage changes (if multiplied by 100). The plot of the percentage growth rate of GDP together with the NBER recession periods is shown below.


Assume that we are at the beginning of 2014 and the GDP data for the fourth quarter of 2013 have been released, which allows us to calculate the growth rate in quarter 4 of 2013. The aim is to forecast the percentage growth of GDP in the first quarter of 2014 based on the information available at that time. The first step is to estimate an AR(p) model on data up to 2013Q4. We can first use the R command ar() to select the order p of the AR(p) model, and the order selected is 1. We can then estimate an AR(1) model for the log-difference of real GDP:

fit <- dyn$lm(dlGDP ~ lag(dlGDP,-1))

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.482 0.071 6.81 0

lag(dlGDP, -1) 0.386 0.057 6.72 0

which has an R2 of 0.152. The forecast for 2014Q1 based on the information available in 2013Q4 is obtained as 0.482 + 0.386 * 0.648, which is equal to an expected GDP growth of 0.732%. The realized GDP growth in that quarter happened to be -0.744%, so the forecast error is -1.476%, which is quite large relative to the standard deviation of the time series of 0.939%. In addition, qualitatively the AR(1) model predicts persistence of growth rates, whilst in this case the realization was a temporary contraction of output in 2014Q1. R has of course functions that produce forecasts automatically; for example:

fit <- ar(dlGDP, method="ols", order=1, demean=FALSE, intercept=TRUE)

predict(fit, n.ahead=4)


$pred

Qtr1 Qtr2 Qtr3 Qtr4

2014 0.732 0.764 0.776 0.781

$se

Qtr1 Qtr2 Qtr3 Qtr4

2014 0.855 0.917 0.926 0.927

where the variable dlGDP represents the percentage growth rate of GDP and the function predict() produces forecasts up to n.ahead periods. The $se output of the predict() function represents the standard error of the forecast and provides a measure of uncertainty around the forecast.
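Under normality of the forecast errors, the point forecasts and standard errors returned by predict() can be combined into approximate 95% forecast intervals (a small sketch):

fc    <- predict(fit, n.ahead=4)
lower <- fc$pred - 1.96 * fc$se   # approximate 95% lower bound
upper <- fc$pred + 1.96 * fc$se   # approximate 95% upper bound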

3.5 Seasonality

A seasonal pattern in a time series represents the regular occurrence of higher/lower realizations of the variable in certain periods of the year. The seasonal pattern is related to the frequency at which the time series is observed. For daily data it could be across the 7 days of the week, for monthly data across the 12 months in a year, and for quarterly data across the 4 quarters in a year. For example, electricity consumption spikes during the summer months while being lower in the rest of the year. Of course there are many other factors that determine the consumption of electricity, which might grow over time, but seasonality captures the characteristic of systematically higher/lower values at certain times of the year. As an example, let's assume that we want to investigate if there is seasonality in the S&P 500 returns at the monthly frequency. To capture the seasonal pattern we use dummy variables that take value 1 in a certain month and zero in all other months. For example, we define the dummy variable JANt to be equal to 1 if month t is January and 0 otherwise, FEBt takes value 1 every February and is 0 otherwise, and similarly for the remaining months. We can then include the dummy variables in a regression model, for example,

Yt = β0 + β1Xt + γ2FEBt + γ3MARt + γ4APRt + γ5MAYt+

γ6JUNt + γ7JULt + γ8AUGt + γ9SEPt + γ10OCTt + γ11NOVt + γ12DECt + ϵt

where Xt can be lagged values of Yt and/or some other relevant independent variable. Notice that in this regression we excluded one dummy variable (in this case JANt) to avoid perfect collinearity among the regressors (the dummy variable trap). The coefficients of the other dummy variables should then be interpreted as the expected value of Y in a certain month relative to the expectation in January, once we control for the independent variable Xt. There is an alternative way to specify this regression, which consists of including all 12 dummy variables and excluding the intercept: Yt = β1Xt + γ1JANt + γ2FEBt + γ3MARt + γ4APRt + γ5MAYt + γ6JUNt + γ7JULt + γ8AUGt + γ9SEPt + γ10OCTt + γ11NOVt + γ12DECt + ϵt. The first specification is typically preferred because a test of the significance of the coefficients of the 11 monthly dummy variables provides evidence in favor of or against seasonality. The same hypothesis could be evaluated in the second specification by testing that the 12 parameters of the dummy variables are equal to each other.

We can create the seasonal dummy variables by first defining a variable month using the cycle() function, which identifies the month of each observation in a time series object. For example, for the monthly S&P 500 returns we have:


month <- cycle(spm.ret)

head(month, 12)

1990(1) 1990(2) 1990(3) 1990(4) 1990(5) 1990(6) 1990(7) 1990(8)

1 2 3 4 5 6 7 8

1990(9) 1990(10) 1990(11) 1990(12)

9 10 11 12

This shows that the month variable provides the month of each observation from 1 to 12. We can then use a logical statement to define the monthly dummy variables, such as JAN = as.numeric(month == 1), which returns (printing the first 12 observations of JAN):

[1] 1 0 0 0 0 0 0 0 0 0 0 0
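The remaining eleven dummies can be created in the same way; one compact (illustrative) way to create all of them at once is the following sketch, which assumes the month variable defined above:

mnames <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
for (m in 1:12) assign(mnames[m], as.numeric(month == m))   # creates JAN, ..., DEC as 0/1 vectors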

To illustrate the use of seasonal dummy variables, I consider a simple example in which I regress the monthly return of the S&P 500 on the 12 monthly dummy variables (no Xt variable for now). As discussed before, to avoid the dummy variable trap, I opt for excluding the intercept/constant from the regression and including the 12 dummy variables. The model is implemented as follows:

fit <- dyn$lm(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +

JUN + JUL + AUG + SEP + OCT + NOV + DEC)

summary(fit)

Call:

lm(formula = dyn(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +

JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:

Min 1Q Median 3Q Max

-19.890 -2.087 0.488 2.887 8.905

Coefficients:

Estimate Std. Error t value Pr(>|t|)

JAN -0.171 0.874 -0.19 0.846

FEB 1.364 0.874 1.56 0.120

MAR 1.642 0.874 1.88 0.061 .

APR 0.905 0.874 1.03 0.302

MAY -0.625 0.874 -0.72 0.475

JUN 0.688 0.874 0.79 0.432

JUL -1.133 0.874 -1.30 0.196

AUG -0.480 0.874 -0.55 0.584

SEP 1.326 0.874 1.52 0.131

OCT 1.322 0.874 1.51 0.132

NOV 1.859 0.874 2.13 0.034 *

DEC 0.514 0.893 0.58 0.565

---


Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.28 on 275 degrees of freedom

Multiple R-squared: 0.0666, Adjusted R-squared: 0.0258

F-statistic: 1.63 on 12 and 275 DF, p-value: 0.0818

The results indicate that, in most months, the expected return is not significantly different from zero, except for March and November. In both cases the coefficient is positive, which indicates that in those months returns are expected to be higher relative to the other months. To develop a more intuitive understanding of the role of the seasonal dummies, the graph below shows the fitted or predicted returns from the model above. In particular, the expected return of the S&P 500 in January is -0.171%, that is, E(Rt|JANt = 1) = -0.171, while in February the expected return is E(Rt|FEBt = 1) = 1.364%, and so on. These coefficients are plotted in the graph below and create a regular pattern that is expected to repeat every year:
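The plot of the monthly pattern can be produced directly from the estimated coefficients (a sketch; fit is the dummy-variable regression estimated above):

plot(1:12, coef(fit), type="b", xaxt="n", xlab="", ylab="expected return (%)")
axis(1, at=1:12, labels=c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
abline(h=0, lty=2)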

Returns seem to be positive in the first part of the year, then go into negative territory during the summer months, only to turn positive again toward the end of the year. However, keep in mind that only March and November are significant at 10%. We can also add the lag of the S&P 500 return to the model above to see if the significance of the monthly dummy variables changes and to evaluate if the goodness of fit of the regression increases:

fit <- dyn$lm(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB + MAR + APR +

MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC)

summary(fit)


Call:

lm(formula = dyn(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB +

MAR + APR + MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:

Min 1Q Median 3Q Max

-19.321 -2.330 0.367 2.821 9.343

Coefficients:

Estimate Std. Error t value Pr(>|t|)

lag(spm.ret, -1) 0.0629 0.0604 1.04 0.299

JAN -0.2472 0.8952 -0.28 0.783

FEB 1.3751 0.8759 1.57 0.118

MAR 1.5563 0.8797 1.77 0.078 .

APR 0.8016 0.8814 0.91 0.364

MAY -0.6822 0.8775 -0.78 0.438

JUN 0.7273 0.8766 0.83 0.407

JUL -1.1764 0.8768 -1.34 0.181

AUG -0.4084 0.8785 -0.46 0.642

SEP 1.3562 0.8763 1.55 0.123

OCT 1.2387 0.8795 1.41 0.160

NOV 1.7756 0.8795 2.02 0.044 *

DEC 0.3987 0.9015 0.44 0.659

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.29 on 273 degrees of freedom

(2 observations deleted due to missingness)

Multiple R-squared: 0.0703, Adjusted R-squared: 0.0261

F-statistic: 1.59 on 13 and 273 DF, p-value: 0.0877

The results show that in July and August the expected return of the S&P 500 is negative, by 1.176% and 0.408% respectively, while in December it is positive and equal to 0.399%, although none of these three coefficients is statistically significant at conventional levels.

Seasonality is a common characteristic of macroeconomic variables. Typically, we analyze these variables on a seasonally-adjusted basis, which means that the statistical agencies have already removed the seasonal component from the variable. However, they also provide the variables before the adjustment, and here we consider the Department Stores Retail Trade series (FRED ticker RSDSELDN) at the monthly frequency. The time series graph for this variable from January 1995 is shown below.


The seasonal pattern is quite clear in the data and it seems to occur toward the end of the year. We can conjecture that the spike in sales is probably associated with the holiday season in December, but we can quantitatively test this hypothesis by estimating a linear regression model in which we include monthly dummy variables as explanatory variables to account for this pattern. The regression model for the sales (in $) of department stores, denoted by Yt, is

Yt = β0 + β1Xt + γ2FEBt + γ3MARt + γ4APRt + γ5MAYt+ γ6JUNt + γ7JULt + γ8AUGt +γ9SEPt + γ10OCTt + γ11NOVt + γ12DECt + ϵt

where, in addition to the monthly dummy variables, we include the first lag of the variable. The regression results are shown below:

fit <- dyn$lm(Y ~ lag(Y, -1) + FEB + MAR + APR + MAY +

JUN + JUL + AUG + SEP + OCT + NOV + DEC)

Estimate Std. Error t value Pr(>|t|)

(Intercept) -13544.583 879.715 -15.397 0

lag(Y, -1) 0.902 0.029 31.033 0

FEB 15560.687 542.250 28.697 0

MAR 16863.826 523.734 32.199 0

APR 14961.717 474.782 31.513 0

MAY 16019.665 478.189 33.501 0

JUN 14490.353 455.418 31.818 0

JUL 14573.837 472.162 30.866 0

AUG 16449.176 483.557 34.017 0

SEP 13444.096 450.119 29.868 0

OCT 16245.844 493.089 32.947 0

NOV 19172.537 463.364 41.377 0

DEC 24815.757 371.732 66.757 0


The results indicate that retail sales at department stores are highly persistent, with an AR(1) coefficient of 0.902. To interpret the estimates of the seasonal coefficients we need to notice that the dummy for the month of January was left out, so that all other seasonal dummy coefficients should be interpreted as the difference in sales relative to the first month of the year. The results indicate that all coefficients are positive and thus retail sales are higher than in January. From the low levels of January, sales seem to increase toward the summer, remain relatively stable during the summer months, and then increase significantly in November and December. Below is a graph of the variable and the model fit (red dashed line).

plot(Y, col="gray",lwd=5)

lines(fitted(fit),col=2,lty=2,lwd=2)

3.6 Trends in time series

A trend in a time series is the tendency of an economic or financial variable to grow over time. Examples are the real Gross Domestic Product (FRED ticker: GDPC1), the Consumer Price Index (FRED ticker: CPIAUCSL), and the S&P 500 Index (YAHOO ticker: ^GSPC).


The three series have in common the feature of growing over time with no tendency to revert back to the mean. In fact, the trending behavior of the variables implies that the mean of the series is also increasing over time rather than being approximately constant as time progresses. For example, if we were to estimate the mean of GDP, CPI, and the S&P 500 in 1985, it would not have been a good predictor of the future value of the mean because the mean of a variable with a trend keeps growing over time. This type of series is called non-stationary because the mean and the variance of the distribution are changing over time. On the other hand, series are defined as stationary when their long-run distribution is constant as time progresses. For these variables, we can thus conclude that their time series graphs clearly indicate that the variables are non-stationary.

Very often in economics and finance we prefer to take the natural logarithm since it makes the exponential growth of some variables approximately linear. The same variables discussed above are shown below in natural logarithm:

Taking the log of the variables is particularly relevant for the S&P 500 Index, which shows considerable exponential behavior, at least until the end of the 1990s. Also for GDP the time series plot becomes more linear, while for CPI we observe different phases of a linear trend up to the end of the 1960s, the 1970s,


and again since the 1980s.

3.7 Deterministic and Stochastic Trend

A simple approach to model this type of time series is to assume that they follow a deterministic (non-random) trend, and to start we can assume that this trend is linear. In this case we are assuming that the series Yt evolves according to the model Yt = β0 + β1 ∗ t + dt, where the deviation dt is a zero-mean stationary random variable (e.g., an AR(1) process). The independent variable denoted t is the time trend and takes value 1 for the first observation, value 2 for the second observation, and value T for the last observation. This type of trend is referred to as deterministic since it implies that every time period the series Yt is expected to increase by β1. The model is often referred to as the trend-stationary model since the deterministic trend accounts for the non-stationary behavior of the time series and the deviation from the trend is assumed to be stationary. Estimating this model by OLS in R requires first creating the trend variable; then we can use the lm() or dyn$lm() function as before:

trend <- zooreg((1:length(gdp)), start=c(1950,1), frequency=4)

fit <- dyn$lm(log(gdp) ~ trend)

coef(fit)

(Intercept) trend

7.761 0.008

where the estimate of β1 is 0.008, which indicates that real GDP is expected to grow by about 0.8% every quarter. The application of the linear trend model to the three series shown above gives as a result the dashed trend line shown in the graph below:

The deviations of the series from the fitted trend line are small for GDP, but for the CPI and S&P 500 indices they are persistent and last for long periods of time (i.e., 10 years or even longer). Such persistent deviations might be due to the inadequacy of the linear trend model and the need to consider a nonlinear trend. This


can be accommodated by adding a quadratic and a cubic term to the linear trend model, which becomes Yt = β0 + β1 ∗ t + β2 ∗ t² + β3 ∗ t³ + dt, where t² and t³ represent the square and cube of the trend variable. The implementation in R requires only the additional step of creating the quadratic and cubic terms, as shown below:

trend2 <- trend^2

trend3 <- trend^3

fitsq <- dyn$lm(log(gdp) ~ trend + trend2)

round(summary(fitsq)$coefficients, 4)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.6741 0.0063 1211.5 0

trend 0.0034 0.0000 89.1 0

trend2 0.0000 0.0000 -19.5 0

fitcb <- dyn$lm(log(gdp) ~ trend + trend2 + trend3)

round(summary(fitcb)$coefficients, 4)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.6944 0.0082 936.245 0.0000

trend 0.0031 0.0001 33.161 0.0000

trend2 0.0000 0.0000 0.362 0.7173

trend3 0.0000 0.0000 -3.749 0.0002

The linear (dashed line) and cubic (dash-dot line) deterministic trends are shown in the figure below. For the case of GDP the difference between the two lines is not visually large, although the quadratic and/or cubic regression coefficients might be statistically significant at conventional levels. In addition, the AIC of the linear model is 281.471, while for the quadratic and cubic trend models it is -1004.701 and -1016.595, respectively. Since lower values of the AIC are preferred, in this case we would select the cubic model, which does slightly better relative to the quadratic and significantly better relative to the linear trend model. For CPI, the cubic trend seems to capture the slow increase in the log CPI index at the beginning of the sample, followed by a rapid increase and then again a slower growth of the index. The period of rapid growth of the CPI index happened in the 1970s, when the surge in oil prices led to an increase of the rate of inflation in the US and globally. However, it could be argued that the deterministic trend model might not represent well the behavior of the CPI and S&P 500 indices since these series depart from the trend (even the cubic one) for long periods of time.
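The AIC values reported above can be obtained with the AIC() function applied to the three fitted trend models (using the lm objects estimated earlier):

AIC(fit)     # linear trend model
AIC(fitsq)   # quadratic trend model
AIC(fitcb)   # cubic trend model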


Another way to assess the adequacy of the trend-stationary model is to plot dt, the residuals or deviations from the cubic trend, and investigate their time series properties. Below we show the time series graph of the deviations and their ACF with lags up to 20 (quarters for GDP and months for CPI and S&P 500):


The ACF shows clearly that the deviation of log GDP from the cubic trend is persistent but with rapidly decaying autocorrelations. On the contrary, for CPI and the S&P 500 we observe that the serial correlation decays very slowly, which is typical of non-stationary time series. In other words, we find that the deviations exhibit non-stationary behavior even after accounting for a deterministic trend.

An alternative model that is often used in asset and option pricing is the random walk with drift model. The model takes the following form: Yt = µ + Yt−1 + ϵt, where µ is a constant and ϵt is an error term with mean zero and variance σ². The random walk model assumes that the expected value of Yt is equal to the previous value of the series (Yt−1) plus a constant term µ (which can be positive or negative). In formula we can write this as E(Yt|Yt−1) = µ + Yt−1. The model can also be reformulated by substituting backwards the value of Yt−1 which, based on the model, is µ + Yt−2 + ϵt−1, so that we obtain Yt = 2 ∗ µ + Yt−2 + ϵt + ϵt−1. Then we can substitute Yt−2, Yt−3, and so on until we reach Y0, and the model can be written as


Yt = µ + Yt−1 + ϵt
   = µ + µ + Yt−2 + ϵt−1 + ϵt
   = µ + µ + µ + Yt−3 + ϵt−2 + ϵt−1 + ϵt
   = · · ·
   = Y0 + µ ∗ t + ∑_{j=1}^{t} ϵt−j+1

This shows that a random walk with drift can be expressed as the sum of a deterministic trend (µ ∗ t) and a term which is the sum of all past errors/shocks to the series. In case the drift term µ is set equal to zero, the model reduces to Yt = Yt−1 + ϵt = Y0 + ∑_{j=1}^{t} ϵt−j+1, which is called the random walk model (without drift, since µ is equal to zero). Hence, another way to think of the random walk model with drift is as the sum of a deterministic linear trend and a random walk process.

The relationship between the trend-stationary and the random walk with drift models becomes clear if we assume that the deviation from the trend dt follows an AR(1) process, that is, dt = ϕdt−1 + ϵt, where ϕ is the coefficient of the first lag and ϵt is a mean-zero random variable. Similarly to above, we can do backward substitution of the AR term in the trend-stationary model, that is,

Yt = β0 + β1 ∗ t + dt
   = β0 + β1 ∗ t + ϕdt−1 + ϵt
   = β0 + β1 ∗ t + ϕ²dt−2 + ϕ ∗ ϵt−1 + ϵt
   = · · ·
   = β0 + β1 ∗ t + ϵt + ϕ ∗ ϵt−1 + ϕ² ∗ ϵt−2 + · · · + ϕ^(t−1)ϵ1
   = β0 + β1 ∗ t + ∑_{j=1}^{t} ϕ^(j−1) ϵt−j+1

Comparing this expression with the one obtained above for the random walk with drift, we see that the random walk with drift is the special case obtained for ϕ = 1. We have thus related the two models, which differ only in the persistence of the deviation from the trend. If the coefficient ϕ is less than 1, the deviation is stationary and the trend-stationary model can be used to de-trend the series and then conduct the analysis on the deviation. However, when ϕ = 1 the deviation is non-stationary (i.e., a random walk), the approach just described is not valid anymore, and we will discuss later what to do in this case. A more practical way to understand the issue of the (non-)stationarity of the deviation from the trend is to think in terms of the speed at which the series is likely to revert back to the trend line. Series that oscillate often around the trend are stationary, while persistent deviations from the trend (slow reversion) are an indication of non-stationarity. How do we know if a series (e.g., the deviation from the trend) is stationary or not? In the following sections we will discuss a test that evaluates this hypothesis and thus provides guidance as to what modeling approach to take.

The previous analysis of GDP, CPI, and the S&P 500 index shows that the deviations of GDP from its trend seem to revert to the mean faster relative to the other two series: this can be seen both in the time series plot


and also from the quickly decaying ACF. The estimate of the trend-stationary model shows that we expect GDP to grow around 0.8% per quarter (or 3.2% annualized), although GDP alternates periods above trend (expansions) and periods below trend (recessions). The alternation between expansions and recessions thus captures the mean-reverting nature of the GDP deviations from the long-run trend and their stationarity. We can evaluate the ability of the trend-stationary model to capture the features of the business cycle by comparing the periods of positive and negative deviations with the peak and trough dates of the business cycle decided by the NBER dating committee. In the graph below we plot the deviation from the cubic trend estimated earlier together with gray areas that indicate the periods of recession.

# dates from the NBER business cycle dating committee

xleft = c(1953.25, 1957.5, 1960.25, 1969.75, 1973.75, 1980,

1981.5, 1990.5, 2001, 2007.917) # beginning

xright = c(1954.25, 1958.25, 1961, 1970.75, 1975, 1980.5, 1982.75,

1991, 2001.75, 2009.417) # end

#fitgdp is the lm() object for the cubic trend model

plot(residuals(fitgdp), ylim=c(-0.10,0.10), xlab="", ylab="")

abline(h=0, col=2, lwd=2, lty=2)

rect(xleft, rep(-0.10,10), xright, rep(0.10,10), col="gray90", border=NA)

Overall, there is a tendency for the deviation to decline sharply during recessions (gray areas), and then increase during the recovery and the expansion, which seem to have lasted longer since the mid-1980s.

Earlier we discussed that the distribution of a non-stationary variable changes over time. We can now derive the mean and variance of Yt when it follows a trend-stationary model and when it follows a random walk with drift. Under the trend-stationary model the dynamics follow Yt = β0 + β1 ∗ t + dt, and we can make the simplifying assumption that dt = ϕ ∗ dt−1 + ϵt, with ϵt a mean-zero error term with variance σ²_ϵ. Based on these assumptions, we obtain that E(dt) = 0 and Var(dt) = σ²_ϵ/(1 − ϕ²), so that Et(Yt) = β0 + β1 ∗ t and Vart(Yt) = Var(dt) = σ²_ϵ/(1 − ϕ²). This shows that the mean of a trend-stationary variable is a function of time and not constant, while the variance is constant. On the other hand, for the random walk with drift model we have that Et(Yt) = E(Y0 + µ ∗ t + ∑_{j=1}^{t} ϵt−j+1) = Y0 + µ ∗ t and Vart(Yt) = t ∗ σ²_ϵ.


From these results we see that for the random walk with drift model both the mean and the variance are time varying, while for the trend-stationary model only the mean varies with time.
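These results can be checked with a small simulation: generate many paths from each model and compare how the variance across paths evolves with t (a sketch with illustrative parameter values, not taken from the original code):

set.seed(42)
T <- 200; S <- 2000; phi <- 0.8; mu <- 0.1; sigma <- 1
rw  <- replicate(S, cumsum(rnorm(T, mean=mu, sd=sigma)))                  # random walk with drift paths
dev <- replicate(S, as.numeric(arima.sim(list(ar=phi), n=T, sd=sigma)))   # stationary AR(1) deviations
plot(apply(rw, 1, var), type="l", xlab="t", ylab="variance across paths") # grows roughly linearly in t
lines(apply(dev, 1, var), col=2)                                          # roughly constant at sigma^2/(1-phi^2)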

The main difference between the trend-stationary and random walk with drift models thus consists in the (non-)stationarity properties of the deviations from the deterministic trend. For the trend-stationary model the deviations are stationary and they can be analyzed using regression models estimated by OLS to investigate their dynamics. However, for the random walk with drift the deviations are non-stationary and the series cannot be used in regression models because of several statistical issues that will be discussed in the next section, followed by a discussion of an approach to statistically test whether a series is non-stationary and whether it is non-stationary around a (deterministic) trend.

3.8 Why non-stationarity is a problem for OLS?

The estimation of time series models can be conducted by standard OLS techniques, with the only additional requirement that the series is stationary. If this is the case, the OLS estimator is consistent and the t-statistics are distributed according to the Student t distribution. However, these properties fail to hold when the series is non-stationary, and three problems arise:

1. the OLS estimates of the AR coefficients are biased in small samples
2. the t test statistic is not normally distributed (even in large samples)
3. the regression of a non-stationary variable on an independent non-stationary variable leads to spurious results of dependence between the two series

We will discuss and illustrate these problems in the context of a simulation study using R. To show the first fact, we simulate B time series from a random walk with drift model Yt = µ + Yt−1 + ϵt for t = 1, · · · , T, then estimate an AR(1) model Yt = β0 + β1 ∗ Yt−1 + ϵt and store the B OLS estimates of β1. We perform this simulation for a short and a long time series (T = 25 and T = 500, respectively) and compare the mean, median, and histogram of the estimates of β1 across the B simulations. Below is the code for T = 25:

require(dyn)

set.seed(1234)

T = 25 # length of the time series

B = 1000 # number of simulation

mu = 0.1 # value of the drift term

sigma = 1 # standard deviation of the error

beta = matrix(NA, B,1) # object to store the OLS estimates of beta1

for (b in 1:B)

{

Y <- ts(cumsum(rnorm(T, mean=mu, sd=sigma))) # simulates the series

fit <- dyn$lm(Y ~ lag(Y,-1)) # OLS estimation of the AR(1) regression

beta[b] <- summary(fit)$coef[2,1] # stores the results

}

# plotting

hist(beta, breaks=50, freq=FALSE, main="")

abline(v=1, col=2,lwd=2)

box()


The histogram shows that the empirical distribution of the estimates of β1 over 1000 simulations ranges between 0.158 and 1.154, with a mean of 0.812 and a median of 0.842. Both the mean and the median are significantly smaller than the true value of 1. This demonstrates the bias of the OLS estimates in small samples when the variable is non-stationary. However, this bias has a tendency to decline for larger sample sizes, as shown in the histogram below, which is produced by the same code as above but for T = 500:

In this case the minimum and maximum estimates are 0.948 and 1.005, with a mean and median of 0.997 and 0.998, respectively. For the larger sample of 500 periods, even though the series is non-stationary, the coefficient estimates of β1 are close to the theoretical value of 1 and the bias essentially disappears.


The second fact that arises when estimating AR models by OLS with non-stationary variables is that the t statistic does not follow the normal distribution even when samples are large. This can be seen clearly in the histogram below of the test statistic for the null hypothesis that β1 = 1, for T = 500: the distribution is skewed to the left relative to the standard normal distribution.

The third problem with non-stationary variables occurs when the interest is in the relationship between X and Y and both variables are non-stationary. This can lead to spurious evidence of a significant relationship between the two series when they are in fact independent of each other. An intuitive explanation for this result can be provided by considering, e.g., two independent random walks with drift: estimating a LRM finds co-movement between the series due to the existence of a trend in both variables that makes the series move in the same or opposite direction. The simulation below illustrates this result for two independent processes X and Y with the same drift parameter µ, but independent of each other (i.e., Y is not a function of X). The histogram of the t statistics for the significance of β1 in Yt = β0 + β1Xt + ϵt is shown in the left plot, while the R2 of the regression is shown on the right. The distribution of the t test statistic has a significantly positive mean and would lead to an extremely large number of rejections of the hypothesis that β1 = 0, when indeed it is equal to zero. Also the distribution of the R2 shows that in the vast majority of these 1000 simulations we would find a moderate to large fit measure, which would suggest a significant relationship between the two variables, although in truth there is no relationship.


require(dyn)

set.seed(1234)

T = 500 # length of the time series

B = 1000 # number of simulation

mu = 0.1 # value of the drift term

sigma = 1 # standard deviation of the error

tstat = matrix(NA, B,1) # object to store the t statistics

R2 = matrix(NA, B,1) # object to store the R-squared values

for (b in 1:B)

{

Y <- ts(cumsum(rnorm(T, mean=mu, sd=sigma))) # simulates the Y series

X <- ts(cumsum(rnorm(T, mean=mu, sd=sigma))) # simulates the X series

fit <- dyn$lm(Y ~ X) # OLS regression of Y on X

tstat[b] <- summary(fit)$coef[2,3] # stores the results

R2[b] <- summary(fit)$r.square

}

# plotting

par(mfrow=c(1,2))

hist(tstat, breaks=50, freq=FALSE, main="")

box()

hist(R2, breaks=30, freq=FALSE, main="", xlim=c(0,1))

box()

3.9 Testing for non-stationarity

From an empirical point of view, our task is to establish if a series is stationary or non-stationary. In case there is a trend/drift in the series (e.g., as for GDP, CPI and the S&P 500 in the graphs above), then we need to understand


if the deviation from the trend is stationary or non-stationary. Typically, by visually inspecting the series it is possible to determine if there is a drift or not and thus to narrow down the question to the stationarity of the deviation from the deterministic trend. If a drift in the series is not apparent, then it is likely that the series does not have a trend and the question is whether the series is a (stationary) AR process or a random walk without drift.

To test for stationarity we follow the usual approach of calculating a test statistic with a known distribution under the null hypothesis, which allows us to decide whether we are confident about the stationarity of the series or not. We do this by estimating the following processes by OLS:

Yt = ϕ ∗ Yt−1 + ϵt

Yt = µ+ ϕ ∗ Yt−1 + ϵt

In the case of the first equation we are assuming a mean-zero AR(1) model, which becomes a random walk (without drift) in case ϕ = 1, whilst in the second equation we are estimating an AR(1) process with an intercept, which becomes a random walk with drift in case ϕ = 1. The null hypothesis we are interested in testing is H0 : ϕ = 1 (non-stationarity) against the alternative H1 : ϕ < 1 (stationarity). Rejection of the null hypothesis leads to the conclusion that the series is stationary, while failure to reject is interpreted as evidence that the series is non-stationary. These models are typically reformulated by subtracting Yt−1 from both the left and right side of the previous equations, which results in

∆Yt = γ ∗ Yt−1 + ϵt

∆Yt = µ+ γ ∗ Yt−1 + ϵt

where ∆Yt = Yt − Yt−1 and γ = ϕ − 1. Testing the hypothesis that ϕ = 1 is thus equivalent to testing that γ = 0. The test is referred to as the Dickey-Fuller (DF) test and is given by DF = γ̂/σ̂γ, where γ̂ and σ̂γ are the OLS estimate of γ and its standard error assuming homoskedasticity. Under the null hypothesis that the series is non-stationary, the DF statistic is not distributed according to the Student t since it requires running a regression that involves non-stationary variables, which leads to the problems discussed earlier. Instead, it follows a different distribution whose critical values are tabulated and provided below.

The non-standard distribution of the DF test statistic can be investigated via a simulation study using R. The code below performs a simulation in which we generate a random walk time series, estimate the DF regression equation, and store the t-statistic of Yt−1, which represents the DF test statistic. We repeat these operations B times and then plot a histogram of the DF statistic together with the Student t distribution with T−1 degrees of freedom (T represents the length of the time series set in the code).


require(dyn)

set.seed(1234)

T = 100 # length of the time series

B = 500 # number of simulation

mu = 0.1 # value of the drift term

sigma = 1 # standard deviation of the error

DF = matrix(NA, B,1) # object to store the DF test stat

for (b in 1:B)

{

Y <- ts(cumsum(rnorm(T, mean=mu, sd=sigma))) # simulates the series

fit <- dyn$lm(diff(Y) ~ lag(Y,-1)) # OLS estimation of DF regression

DF[b] <- summary(fit)$coef[2,3] # stores the results

}

# plotting

hist(DF, breaks=50, xlim=c(-6,6), freq=FALSE, main="")

curve(dt(x, df=(T-1)), add=TRUE, col=2)

box()

The graph shows clearly that the distribution of the DF statistic does not follow the t distribution: it has a negative mean and median (instead of 0), it is skewed to the right (positive skewness) rather than being symmetric, and its empirical 5% quantile is -2.849 instead of the theoretical value of -1.66. Since we are performing a one-sided test against the alternative hypothesis H1 : γ < 0, using the one-sided 5% critical value from the t distribution would lead us to reject the null hypothesis of non-stationarity too often relative to the appropriate critical values derived by Dickey and Fuller. For the simulation exercise above, the (asymptotic) 5% critical value for the DF regression with a constant is -2.86. The percentages of simulations for which we reject the null based on this critical value and on that from the t distribution are


sum(DF < -2.86) / B

[1] 0.048

sum(DF < -1.67) / B

[1] 0.376

This shows that using the critical value from the t distribution would lead us to reject too often (37.6% of the time) relative to the nominal level of 5%. Instead, using the correct critical value the null is rejected in 4.8% of the simulations, which is quite close to the 5% significance level.

In practical implementations it is typically advisable to include lags of ∆Yt to control for serial correlation in the changes of the variable. This is called the Augmented Dickey-Fuller (ADF) test and requires estimating the following regression model (for the case with a constant):

$$\Delta Y_t = \mu + \gamma Y_{t-1} + \sum_{j=1}^{p} \beta_j \Delta Y_{t-j} + \epsilon_t$$

which consists of adding lags of the change of Yt. The ADF test statistic is calculated as before by taking the t-statistic of Yt−1, that is, $ADF = \hat{\gamma}/\hat{\sigma}_{\hat{\gamma}}$. Another variation of the test also includes a trend variable in the regression model, that is,

$$\Delta Y_t = \mu + \gamma Y_{t-1} + \beta_0 t + \sum_{j=1}^{p} \beta_j \Delta Y_{t-j} + \epsilon_t$$

The reason for including a deterministic trend in the ADF regression is that we want to be able to discriminate between a deterministic trend model and a random walk model with drift. If we do not reject the null hypothesis, we conclude that the time series follows a random walk with drift, whilst in case of rejection we conclude in favor of the deterministic trend model.

Once we calculate the DF or ADF test statistic we need to evaluate its statistical significance using the appropriate critical values. As discussed earlier, these statistics have a non-standard distribution, and critical values have been tabulated for the cases with/without a constant and with/without a trend, and for various sample sizes. Below are the critical values for a model with a constant, with and without a trend, for various sample sizes:

Sample Size    Without Trend        With Trend
               1%       5%          1%       5%
T = 25         -3.75    -3.00       -4.38    -3.60
T = 50         -3.58    -2.93       -4.15    -3.50
T = 100        -3.51    -2.89       -4.04    -3.45
T = 250        -3.46    -2.88       -3.99    -3.43
T = 500        -3.44    -2.87       -3.98    -3.42
T = ∞          -3.43    -2.86       -3.96    -3.41

The non-stationarity test is implemented in the urca package using the function ur.df(). Below is an application to the logarithm of the real GDP:

require(urca)
adf <- ur.df(log(gdp), type="trend", lags=4) # type: "none", "drift", "trend"
summary(adf)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.03076 -0.00466  0.00052  0.00494  0.03379

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.06e-02   8.45e-02    1.07     0.28
z.lag.1     -1.08e-02   1.09e-02   -0.99     0.32
tt           7.23e-05   8.81e-05    0.82     0.41
z.diff.lag1  3.31e-01   6.42e-02    5.16  5.1e-07 ***
z.diff.lag2  9.85e-02   6.75e-02    1.46     0.15
z.diff.lag3 -4.73e-02   6.69e-02   -0.71     0.48
z.diff.lag4 -4.37e-02   6.34e-02   -0.69     0.49
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.00852 on 245 degrees of freedom
Multiple R-squared: 0.156, Adjusted R-squared: 0.135
F-statistic: 7.53 on 6 and 245 DF, p-value: 2.04e-07

Value of test-statistic is: -0.988 12.1 2.27

Critical values for test statistics:
      1pct  5pct 10pct
tau3 -3.98 -3.42 -3.13
phi2  6.15  4.71  4.05
phi3  8.34  6.30  5.36

In this case the ADF test statistic is -0.99, which should be compared to the 5% critical value of -3.42: we do not reject the null hypothesis that the series is non-stationary and follows a random walk with drift.


3.10 What to do when a time series is non-stationary

If the ADF test leads to the conclusion that a series is non-stationary, then we need to understand whether the source of non-stationarity is a trend-stationary model or a random walk model (with or without drift). In the first case, the non-stationarity is solved by de-trending the series, which means that the series is regressed on a time trend and the residuals of the regression are then modeled using time series models (e.g., AR(p)). However, if we find that the series follows a random walk model, then the solution to the non-stationarity is to take the difference of the series and thus analyze ∆Yt = Yt − Yt−1. The reason is that differencing the series removes the trend: in the random walk with drift model Yt = µ + Yt−1 + ϵt, taking Yt−1 to the left-hand side results in ∆Yt = µ + ϵt. If we assume that ϵt is normally distributed with mean 0 and variance σ²ϵ, then ∆Yt will also be normally distributed with mean µ and variance σ²ϵ, which is obviously stationary.
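As a minimal illustration of the two remedies, the sketch below de-trends a simulated trend-stationary series and, alternatively, differences it; the simulated series and the AR(1) fit are hypothetical and only meant to show the workflow.

# hypothetical example: de-trending versus differencing
set.seed(42)
time <- 1:200
Y <- ts(0.05 * time + arima.sim(list(ar = 0.5), n = 200))   # deterministic trend + AR(1) noise

# remedy 1 (trend-stationary case): regress on a time trend and model the residuals
detrended <- residuals(lm(Y ~ time))
fit.ar <- arima(detrended, order = c(1, 0, 0))               # AR(1) on the de-trended series

# remedy 2 (random walk case): take first differences
dY <- diff(Y)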


4. Volatility Models

A stylized fact across many asset classes is that the standard deviation of returns, often referred to as volatility, varies significantly over time. The graph below shows the time series of the daily returns of the S&P 500 from the beginning of 1990 until the end of 2013. The mean daily return of the S&P 500 is 0.028% and its standard deviation is 1.142%. This estimate of the standard deviation represents a long-run average between periods of high and low volatility. In the Figure below we can see that returns are within the two standard deviation confidence bands (dashed lines in the graph) for long periods of time. Occasionally, there are sudden bursts of high volatility that last for several months before volatility decreases again. In other words, volatility is time-varying in the sense that it alternates between regimes of low and high volatility. This fact should be accounted for by any model of financial returns since volatility is a proxy for risk, which is an important input to many financial decisions (from option pricing to risk management). Modeling volatility is thus a particularly relevant task in financial econometrics.

The model for high-frequency (daily and intra-daily) returns that we will work with in this Chapter has the following structure:

$$R_{t+1} = \mu_{t+1} + \sigma_{t+1}\epsilon_{t+1}$$

where the asset return is decomposed into the following three components:

• Expected return: µt+1 represents the expected return of the asset. At the daily and intra-daily frequency, this component is typically assumed equal to zero. Alternatively, it could be assumed to follow an AR(p) process, µt+1 = ϕ0 + ϕ1 Rt + · · · + ϕp Rt−p+1, or to be a function of other contemporaneous variables, that is, µt+1 = ϕ0 + ϕ1 Xt.

• Volatility: σt+1 is the standard deviation of the shock conditional on the information available at time t.


• Unexpected shock: ϵt+1 represents the shock that occurs in period t + 1. A simple assumption is that it is normally distributed with mean zero and variance one, but more sophisticated distributional assumptions can be introduced.

The aim of this Chapter is to discuss different models to estimate and forecast σt+1, starting with simple techniques such as the Moving Average (MA) and Exponential Moving Average (EMA), followed by a discussion of more sophisticated time series models such as the Auto-Regressive Conditional Heteroskedasticity (ARCH) model. ARCH models have been extended in several directions, but for the purpose of this Chapter we will consider the two most important generalizations: GARCH (Generalized ARCH) and GJR-GARCH (Glosten, Jagannathan and Runkle ARCH), which includes an asymmetric effect of positive/negative shocks on volatility.

4.1 Moving Average (MA) and Exponential Moving Average (EMA)

A simple approach to estimate the conditional variance is to average the squared returns over a recent window of observations. Let's denote the window size by M (e.g., M = 25 days). The Moving Average (MA) estimate of the variance for day t + 1, denoted $\sigma^2_{t+1}$, is given by

$$\sigma^2_{t+1} = \frac{1}{M}\left(R^2_t + R^2_{t-1} + \cdots + R^2_{t-M+1}\right) = \frac{1}{M}\sum_{j=1}^{M} R^2_{t-j+1}$$

and the standard deviation is calculated as $\sqrt{\sigma^2_{t+1}}$. The two extreme values of the window are M = 1, which implies $\sigma^2_{t+1} = R^2_t$, and M = t, which leads to $\sigma^2_{t+1} = \sigma^2$, where $\sigma^2$ represents the unconditional variance estimated on the full sample. Small values of M imply that the volatility estimate is very responsive to the most recent squared returns, whilst for large values of M the estimate responds very little to the latest returns. Another way to look at the role of the window size on the smoothness of the volatility estimate is to interpret it as an average of the last M days, each carrying a weight of 100/M% (i.e., for M = 25 the weight is 4%). When M increases, the weight given to each observation in the window becomes smaller, so that each daily squared return (even when extreme) has a smaller impact on the volatility estimate.
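As a quick sanity check of the formula, the sketch below computes the MA variance estimate for the most recent day directly from the last M squared returns (assuming sp500daily is the zoo series of daily returns used throughout this chapter):

# minimal check of the MA formula on the last M = 25 daily returns
require(zoo)
M <- 25
R <- coredata(tail(sp500daily, M))   # last M daily returns
sigma2.hat <- mean(R^2)              # (1/M) * sum of squared returns
sqrt(sigma2.hat)                     # MA volatility estimate for the next day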

The MA approach can be implemented in R using the rollmean() function provided in the package zoo, which requires specifying the window size (M in the notation above). An example for the S&P 500 daily returns is provided below:

require(zoo)
sigma25 <- rollmean(sp500daily^2, 25, align="right")
plot(abs(sp500daily), col="gray", xlab="", ylab="")
lines(sigma25^0.5, col=2, lwd=2)


The effect of increasing the window size from M = 25 (continuous line) to M = 100 (dashed line) is shown in the graph below: the longer window smooths out the fluctuations of the MA(25) since each observation is given a smaller weight. This implies that large (negative or positive) returns increase volatility less relative to an MA calculated on a shorter window.

require(zoo)
sigma100 <- rollmean(sp500daily^2, 100, align="right")
plot(sigma25^0.5, col=2, xlab="", ylab="")
lines(sigma100^0.5, lwd=2, col=4, lty=2)

One drawback of the MA approach is that a large daily return increases the estimate significantly when it enters the window and then makes it drop when the observation falls out of the window. This is because the MA approach distributes the weight equally across the last M observations, while returns older than M receive a zero weight. An extension of the MA approach is to assign a smoothly decreasing weight to older returns instead of the discrete jump of the weight from 1/M to 0 at the (M + 1)th observation. This approach is called the Exponential Moving Average (EMA) and it is calculated as follows:

$$\sigma^2_{t+1} = \lambda \sum_{j=1}^{\infty} (1-\lambda)^{j-1} R^2_{t-j+1}$$

where λ is a smoothing parameter between zero and one. After some algebra, the expression above can be rewritten as

$$\sigma^2_{t+1} = (1-\lambda)\sigma^2_t + \lambda R^2_t$$

which shows that the conditional variance estimate for day t + 1 is a weighted average of the previous day's estimate and the squared return in day t, with weights 1 − λ and λ, respectively. A typical value of λ is 0.06, which means that day t has a 6% weight in determining the volatility estimate for day t + 1, day t − 1 has weight (0.94 * 6)% = 5.64%, day t − 2 has weight (0.94^2 * 6)% = 5.3016%, and in general day t − k has weight (0.94^k * 6)%. The method is called Exponential MA because the weights decay exponentially; it became popular in finance after it was proposed by J.P. Morgan to model and predict volatility for Value-at-Risk (VaR) calculations.
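The decay of the weights can be verified with a couple of lines of R (a small sketch, assuming λ = 0.06):

# weights (in %) given to the squared returns of days t, t-1, ..., t-4 when lambda = 0.06
lambda <- 0.06
j <- 1:5
round(100 * lambda * (1 - lambda)^(j - 1), 2)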

The typical value for M is 25 (one trading month) and for λ it is 0.06. The graph below compares the volatility estimates from the MA(25) and EMA(0.06) methods. In this example we use the package TTR, which provides functions to calculate moving averages, both simple and exponential:

require(TTR)
# SMA() function for Simple Moving Average; n = number of days
ma25 <- SMA(sp500daily^2, n=25)
# EMA() function for Exponential Moving Average; ratio = lambda
ema06 <- EMA(sp500daily^2, ratio=0.06)
plot(ma25^0.5, ylim=c(0, 6), ylab="", xlab="")
lines(ema06^0.5, col=2)


The picture shows the daily volatility estimates for over 23 years and it is difficult (at this scale) to see large differences between the two methods. However, by plotting a sub-period, some differences become evident. Below we plot the period between the beginning of 2008 and the end of 2009. The biggest difference between the two methods is that in periods of rapid change (from small to large returns or vice versa) the EMA line captures the change in volatility regime more smoothly than the simple MA. This is particularly clear in the second part of 2008 and the beginning of 2009, with some marked differences between the two lines.

The parameters of the MA and EMA are usually chosen a priori rather than estimated from the data. There are ways to estimate these parameters, although they are not popular in the financial literature and typically do not provide great benefits for practical purposes. In the following Section we discuss a more general volatility model which generalizes the EMA and is typically estimated from the data.


4.2 Auto-Regressive Conditional Heteroskedasticity (ARCH) models

The ARCH model was proposed in the early 1980s to model the time-varying variance of the inflation rate. Since then it has gained popularity in finance as a way to model the intermittent behavior of the volatility of asset returns. The simplest version of the model assumes that the return is given by $R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$, where the conditional variance $\sigma^2_{t+1}$ is a function of the previous period's squared return, that is,

$$\sigma^2_{t+1} = \omega + \alpha R^2_t$$

where ω and α are parameters to be estimated. The AR part of the ARCH name relates to the fact that the variance of returns is a function of the lagged (squared) returns. This specification is called the ARCH(1) model and can be generalized to include p lags, which results in the ARCH(p) model:

$$\sigma^2_{t+1} = \omega + \alpha_1 R^2_t + \cdots + \alpha_p R^2_{t-p+1}$$

The empirical evidence indicates that financial returns typically require p to be large. A parsimonious alternative to the ARCH model is the Generalized ARCH (GARCH) model, which is characterized by the following Equation for the conditional variance:

$$\sigma^2_{t+1} = \omega + \alpha R^2_t + \beta \sigma^2_t$$

where next period's variance is affected both by the last squared return and by the last conditional variance. This specification is parsimonious, relative to an ARCH model with p large, since all past squared returns are summarized in $\sigma^2_t$ using only 3 parameters instead of p + 1. The specification above represents a GARCH(1,1) model since it includes one lag of the squared return and one lag of the conditional variance. However, it can easily be generalized to the GARCH(p,q) case in which p lags of the squared return and q lags of the conditional variance are included. The empirical evidence suggests that the GARCH(1,1) is typically the best model for several asset classes and is only in rare instances outperformed by p and q different from 1. We can derive the mean of the conditional variance which, in the case of the GARCH(1,1), is given by

$$\sigma^2 = \frac{\omega}{1 - (\alpha + \beta)}$$

where by $\sigma^2$ we denote $E(\sigma^2_{t+1})$, the unconditional variance, as opposed to $\sigma^2_{t+1}$ which denotes the conditional one. The difference is that $\sigma^2_{t+1}$ is a forecast of volatility based on the information available today, while $\sigma^2$ represents an average over time of these conditional forecasts. This result shows that a condition for the variance of returns to be finite is that α + β < 1. If this condition is not satisfied, then the variance of the returns is infinite and volatility behaves like a random walk, instead of being mean-reverting. To understand this, we can recall the discussion of the AR(1) model, in which a coefficient less than 1 in absolute value guarantees that the time series is stationary, and thus mean-reverting. We can think similarly about the variance, where the condition that α + β is less than 1 provides the ground for volatility to be stationary and mean-reverting. What does it mean that the variance (or volatility) is mean-reverting? It means that volatility might be persistent, but oscillates between periods in which it is higher and lower than its unconditional level. This interpretation is consistent with the empirical observation that volatility switches between periods in which it is high and others in which it is low. On the other hand, non-stationary volatility implies that periods of high volatility (i.e., higher than average) are expected to persist in the long run rather than revert back to the unconditional variance. The same holds for periods of low volatility which, in the case of non-stationarity, are expected to last in the long run. This distinction is practically relevant, in particular when the interest is in forecasting volatility at long horizons.
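To make the mean-reversion argument concrete, the short simulation below generates returns from a GARCH(1,1) with hypothetical parameter values satisfying α + β < 1 and compares the sample variance of the simulated returns with the unconditional variance ω/(1 − α − β):

# a minimal GARCH(1,1) simulation (hypothetical parameter values)
set.seed(1)
omega <- 0.05; alpha <- 0.08; beta <- 0.90      # alpha + beta < 1 => stationary variance
n <- 1000
R <- sig2 <- numeric(n)
sig2[1] <- omega / (1 - alpha - beta)           # start at the unconditional variance
R[1] <- sqrt(sig2[1]) * rnorm(1)
for (t in 1:(n - 1)) {
  sig2[t + 1] <- omega + alpha * R[t]^2 + beta * sig2[t]
  R[t + 1] <- sqrt(sig2[t + 1]) * rnorm(1)
}
c(unconditional = omega / (1 - alpha - beta), sample = var(R))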

The distinction between mean-reverting (i.e., stationary) and non-stationary volatility is an important property that differentiates volatility models. In particular, the EMA model can be considered a special case of the GARCH conditional variance when its parameters are constrained to the following values: ω = 0, α = λ and β = 1 − λ. This shows that the EMA imposes the assumption that α + β = 1, and thus that volatility is non-stationary. Empirically, this hypothesis can be tested by taking the null hypothesis α + β = 1 against the one-sided alternative α + β < 1.

Another GARCH specification that has become popular was proposed by Glosten, Jagannathan and Runkle (hence, GJR-GARCH), which assumes that the squared return has a different effect on volatility depending on its sign. The conditional variance Equation of this model is

$$\sigma^2_{t+1} = \omega + \alpha_1 R^2_t + \gamma_1 R^2_t I(R_t \le 0) + \beta \sigma^2_t$$

In this specification, when the return is positive its effect on the conditional variance is α1, and when it is negative the effect is α1 + γ1. Testing the hypothesis that γ1 = 0 thus provides a test of the asymmetric effect of shocks on volatility. Empirically, for many assets α1 is estimated close to zero and insignificant, while γ1 is found to be positive and significant. The evidence thus indicates that negative shocks lead to more uncertainty and an increase in the volatility of asset returns, while positive shocks do not have a relevant effect.
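A minimal sketch of this asymmetry, using hypothetical parameter values (not estimates from the text), computes the variance response to a positive and a negative shock of the same size:

# news impact under hypothetical GJR-GARCH parameters
omega <- 0.01; alpha1 <- 0.02; gamma1 <- 0.14; beta <- 0.90
sig2.prev <- 1                                  # last period's conditional variance
impact <- function(R) omega + alpha1 * R^2 + gamma1 * R^2 * (R <= 0) + beta * sig2.prev
impact(2)    # variance after a +2% return
impact(-2)   # variance after a -2% return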

Estimation of GARCH models

ARCH/GARCH models cannot be estimated using OLS because the model is nonlinear in the parameters. Furthermore, volatility is not directly observable, which rules out estimating the conditional variance equation by OLS as we could do if the dependent variable were observable. However, recent advances in financial econometrics that will be discussed in a later Section have partly overcome this difficulty. The estimation of GARCH models is thus performed using an alternative (to OLS) estimation technique called Maximum Likelihood (ML). The ML estimation method represents a general estimation principle that can be applied to a large set of models, not only to volatility models.

It might be useful to first discuss the ML approach to estimation in comparison to the familiar OLS one. The OLS approach chooses the parameter values that minimize the sum of squared residuals, which measures the 'unfitness' of the model in explaining the data for a certain set of parameter values. Instead, the ML approach chooses the parameter values that maximize the likelihood, or probability, that the data were generated by the model with those parameter values. For the case of a simple AR model with homoskedastic errors it can be shown that the OLS and ML estimators are equivalent. A difference between OLS and ML is that the former provides an analytical formula for the estimation of linear models, whilst the ML estimator is typically the result of a numerical optimization.

We assume that volatility models are of the general form discussed earlier, i.e., $R_{t+1} = \mu_{t+1} + \sigma_{t+1}\epsilon_{t+1}$. Based on the distributional assumption introduced earlier, the standardized residuals $(R_{t+1} - \mu_{t+1})/\sigma_{t+1}$ should be normally distributed with mean 0 and variance 1. Bear in mind that both $\mu_{t+1}$ and $\sigma^2_{t+1}$ depend on parameters that we denote by $\theta_\mu$ and $\theta_\sigma$. For example, for an AR(1)-GARCH(1,1) model the parameters of the mean are $\theta_\mu = (\phi_0, \phi_1)$ and the parameters of the conditional variance are $\theta_\sigma = (\omega, \alpha, \beta)$. Given the assumption of normality, the density or likelihood function of $R_{t+1}$ is

$$f(R_{t+1}|\theta_\mu, \theta_\sigma) = \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta_\sigma)}} \exp\left[-\frac{1}{2}\left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

which represents the normal density evaluated at $R_{t+1}$. We wrote the conditional mean and variance as $\mu_{t+1}(\theta_\mu)$ and $\sigma^2_{t+1}(\theta_\sigma)$ to make explicit their dependence on the parameters over which the likelihood function will be maximized. Since we have T returns, we are interested in the joint likelihood of the observed returns and, denoting by p the largest lag of $R_{t+1}$ used in the conditional mean and variance, we can define the (conditional) likelihood function $L(\theta_\mu, \theta_\sigma) = f(R_{p+1}, \cdots, R_{T}|\theta_\mu, \theta_\sigma, R_1, \cdots, R_p)$ as


$$L(\theta_\mu, \theta_\sigma) = \prod_{t=p+1}^{T} f(R_{t+1}|\theta_\mu, \theta_\sigma) = \prod_{t=p+1}^{T} \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta_\sigma)}} \exp\left[-\frac{1}{2}\left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

The ML estimates of θµ and θσ are thus obtained by maximizing the likelihood function L(θµ, θσ). It is convenient to log-transform the likelihood function to simplify the maximization. We denote the log-likelihood by l(θµ, θσ) and it is given by

$$l(\theta_\mu, \theta_\sigma) = \ln L(\theta_\mu, \theta_\sigma) = -\frac{1}{2}\sum_{t=p+1}^{T}\left[\ln(2\pi) + \ln\sigma^2_{t+1}(\theta_\sigma) + \left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

Since the first term ln(2π) does not depend on any parameter, it can be dropped from the function. The estimates of θµ and θσ are then obtained by maximizing

$$l(\theta_\mu, \theta_\sigma) = -\frac{1}{2}\sum_{t=p+1}^{T}\left[\ln\sigma^2_{t+1}(\theta_\sigma) + \left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

The maximization of the likelihood or log-likelihood is performed numerically, which means that we use algorithms to find the maximum of the function. The problem with this approach is that, in some situations, the likelihood function is not well-behaved: it may have local maxima or be 'flat', meaning that it is relatively constant over a large set of parameter values. The numerical search has to be started at some initial values and, in the difficult cases just mentioned, the choice of these values is extremely important to reach the global maximum of the function. Valid starting values for the parameters can be found with a small-scale grid search over the space of possible parameter values.

In the case of volatility models the likelihood function is usually well-behaved and the maximum is reached quite rapidly. In the following Section we discuss some R packages which implement GARCH estimation and forecasting.
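To illustrate the principle, the sketch below writes the (negative) log-likelihood of a zero-mean GARCH(1,1) and maximizes it with optim(). This is only a bare-bones illustration, assuming sp500daily is the return series used throughout the chapter; the dedicated packages discussed below should be preferred in practice.

# bare-bones ML estimation of a zero-mean GARCH(1,1)
require(zoo)
R <- coredata(sp500daily)
negloglik <- function(par) {
  omega <- par[1]; alpha <- par[2]; beta <- par[3]
  if (omega <= 0 || alpha < 0 || beta < 0 || alpha + beta >= 1) return(1e10)
  sig2 <- numeric(length(R))
  sig2[1] <- var(R)                              # initialize at the sample variance
  for (t in 1:(length(R) - 1))
    sig2[t + 1] <- omega + alpha * R[t]^2 + beta * sig2[t]
  0.5 * sum(log(sig2) + R^2 / sig2)              # negative log-likelihood (constants dropped)
}
ml <- optim(c(0.01, 0.05, 0.90), negloglik)
ml$par                                           # estimates of omega, alpha, beta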

Inference for GARCH models

GARCH in R

There are several packages that provide functions to estimate models from the GARCH family. One of the earliest is the garch() function in the tseries package; being one of the earliest, it is quite limited in the type of models it can estimate. Below is an example of the estimation of a GARCH(1,1) model on the daily S&P 500 returns:


require(tseries)
fit <- garch(ts(sp500daily), order=c(1,1), trace=FALSE)
round(summary(fit)$coef, 3)

   Estimate Std. Error t value Pr(>|t|)
a0    0.011      0.001   8.188        0
a1    0.076      0.004  17.127        0
b1    0.915      0.005 180.214        0

Here a0, a1, and b1 represent the parameters ω, α, and β, respectively. The output provides standard errors for the parameter estimates as well as t-statistics and p-values for the null hypothesis that the coefficients are equal to zero. The estimate of α is 0.076 and of β is 0.915, with their sum equal to 0.991, which is very close to 1 and suggests that volatility is extremely persistent, bordering on non-stationary.
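Given these estimates, the implied persistence, long-run variance, and half-life of a volatility shock can be computed directly (a small sketch using the rounded estimates above):

# persistence and implied long-run variance from the GARCH(1,1) estimates
omega <- 0.011; alpha <- 0.076; beta <- 0.915
alpha + beta                   # persistence of the conditional variance
omega / (1 - alpha - beta)     # implied unconditional (daily) variance
log(0.5) / log(alpha + beta)   # half-life (in days) of a deviation from the long-run variance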

More flexible functions for GARCH estimation are provided by the package fGarch, which allows flexibility in modeling the conditional mean µt+1 and the conditional variance σ²t+1 with time series models. The function that performs the estimation is called garchFit(). The example below shows the application of garchFit() to the daily returns of the S&P 500 index. The first example estimates a GARCH(1,1) without the intercept in the conditional mean (i.e., µt+1 = 0), so the results are comparable to the earlier ones for the garch() function; we then add the intercept to the model (µt+1 = µ), and finally we consider an AR(1)-GARCH(1,1) model. Notice that to specify the AR(1) for the conditional mean we use the function arma(p,q), which is more general than the functions used to estimate AR(p) models.

require(fGarch)
fit <- garchFit(~garch(1,1), data=sp500daily, include.mean=FALSE, trace=FALSE)
round(fit@fit$matcoef, 3)

       Estimate Std. Error t value Pr(>|t|)
omega     0.011      0.002   5.624        0
alpha1    0.076      0.007  11.375        0
beta1     0.915      0.007 125.069        0

While the point estimates are equal to those obtained earlier with the garch() function, the standard errors differ because of the difference between analytical and numerical standard errors. In the example below we include an intercept in the conditional mean, which leads to small changes in the estimated volatility parameters:

fit <- garchFit(~garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)


       Estimate Std. Error t value Pr(>|t|)
mu        0.053      0.010   5.321        0
omega     0.012      0.002   5.762        0
alpha1    0.079      0.007  11.382        0
beta1     0.912      0.008 120.749        0

The results show that the mean µ is estimated equal to 0.053% and is statistically significant even at the 1% level. Hence, for the daily S&P 500 returns the assumption of a zero expected return is rejected. It might also be interesting to evaluate the need to introduce some dependence in the conditional mean, for example by assuming an AR(1) model. The command arma(1,0) + garch(1,1) in the garchFit() function estimates an AR(1) model with GARCH(1,1) conditional variance:

fit <- garchFit(~ arma(1,0) + garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)

       Estimate Std. Error t value Pr(>|t|)
mu        0.054      0.010   5.371    0.000
ar1      -0.013      0.013  -0.959    0.337
omega     0.011      0.002   5.760    0.000
alpha1    0.078      0.007  11.375    0.000
beta1     0.912      0.008 121.046    0.000

The estimate of the AR(1) coefficient is -0.013 and it is not statistically significant at the 10% level, which shows the irrelevance of including dependence in the conditional mean of daily financial returns.

Based on the estimated GARCH model, we can then obtain the conditional variance and the conditional standard deviation. The conditional standard deviation is extracted by appending @sigma.t to the garchFit object:

sigma <- fit@sigma.t
# define sigma as a zoo object since it is numeric
sigma <- zoo(sigma, order.by=index(sp500daily))
plot(sigma, type="l", main="Standard deviation", xlab="", ylab="")


The graph shows the significant variation over time of the standard deviation, which alternates between periods of low and high volatility, in addition to sudden increases in volatility due to the occurrence of large returns. It is also interesting to compare the fitted standard deviation from the GARCH model with the ones obtained from the MA and EMA methods. As the plot below shows, the three estimates track each other very closely. The correlation between the MA and GARCH conditional standard deviations is 0.98, and between EMA and GARCH it is 0.988, so, to a certain extent, they can be considered very good substitutes for each other (in particular at short horizons).
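These correlations can be computed along the following lines (a sketch, assuming the sigma, ma25, and ema06 series created earlier are zoo-compatible objects indexed by the same dates):

# align the three volatility estimates on common dates and compute their correlations
vols <- merge(sigma, ma25^0.5, ema06^0.5, all = FALSE)
colnames(vols) <- c("garch", "ma", "ema")
cor(coredata(vols), use = "complete.obs")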

Another quantity that we need to analyze to evaluate the goodness of the GARCH model is the residuals. Appending @residuals to the garchFit estimation object, we can extract the residuals of the GARCH model, which represent an estimate of σt+1ϵt+1. The plot below shows that the residuals maintain most of the characteristics of the raw returns, in particular the clusters of volatility. This is because these residuals have been obtained from the last fitted model, in which µt+1 = 0.054 + (-0.013)Rt, and the residuals are thus given by Rt+1 − µt+1. The contribution of the intercept is to demean the return series, while the small coefficient on the lagged return leads to residuals that are very close to the returns.

res <- fit@residuals
res <- zoo(res, order.by=index(sp500daily))
par(mfrow=c(1,2))
plot(res, type="l", main="Unstandardized Residuals", xlab="", ylab="", cex.main=0.8)
plot(res/sigma, type="l", main="Standardized Residuals", xlab="", ylab="", cex.main=0.8)

It is more informative to consider the standardized residuals, that is, (Rt+1 − µt+1)/σt+1, which should behave like an i.i.d. sequence: 1) normally distributed (given our assumptions), and 2) no serial correlation in the residuals and squared residuals. First, we plot the standardized residuals, which do not show any sign of heteroskedasticity (e.g., volatility clustering) since it has been accounted for by σt+1. However, we notice that there are three large negative standardized residuals, with a magnitude of about −6 standard deviations, which are quite unlikely to occur under a normal distribution.

To evaluate the properties of the standardized residuals, we start from their distributional properties. One way to do this is to assess their normality, for example using a QQ plot.
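A possible way to produce these diagnostics in base R, assuming the res and sigma objects defined above, is sketched below:

# QQ plot and autocorrelation checks for the standardized residuals
zres <- coredata(res / sigma)
qqnorm(zres); qqline(zres, col = 2)
par(mfrow = c(1, 2))
acf(zres,   main = "Standardized residuals")
acf(zres^2, main = "Squared standardized residuals")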


It is clear from the QQ plot that the left tail of the standardized residuals disagrees with normality: there are too many large negative returns (relative to what would be expected under the normal distribution) to be able to say that the residuals are normally distributed. We can further investigate this issue by calculating the skewness, equal to -0.255, and the excess kurtosis, equal to 8.747. We can also consider the auto-correlation of the residuals and squared residuals to assess whether there is neglected dependence in the conditional mean and variance:

Overall, there is weak evidence of auto-correlation in the standardized residuals and in their squares, so that the GARCH(1,1) model seems to be well specified for the daily returns of the S&P 500.

Another package that provides functions to estimate a wide range of GARCH models is the rugarch package. This package requires first specifying the functional form of the conditional mean and variance using the function ugarchspec() and then proceeding with the estimation using the function ugarchfit(). Below is an example for an AR(1)-GARCH(1,1) model estimated on the daily S&P 500 returns:


require(rugarch)
spec = ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(1,0)))
fitgarch = ugarchfit(spec = spec, data = sp500daily)

       Estimate t value
mu        0.053   5.380
ar1      -0.013  -0.959
omega     0.011   5.459
alpha1    0.078  10.763
beta1     0.912 113.059

The estimation results are the same as those obtained with the fGarch package, and more information about the estimation can be obtained with the command show(fitgarch), as shown below:

*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics
-----------------------------------
GARCH Model  : sGARCH(1,1)
Mean Model   : ARFIMA(1,0,0)
Distribution : norm

Optimal Parameters
------------------------------------
        Estimate  Std. Error   t value Pr(>|t|)
mu      0.053207    0.009889   5.38046  0.00000
ar1    -0.012898    0.013449  -0.95902  0.33755
omega   0.011496    0.002106   5.45930  0.00000
alpha1  0.078496    0.007293  10.76270  0.00000
beta1   0.912106    0.008068 113.05857  0.00000

Robust Standard Errors:
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.053207    0.009120   5.8344 0.000000
ar1    -0.012898    0.012249  -1.0530 0.292362
omega   0.011496    0.003234   3.5553 0.000378
alpha1  0.078496    0.012923   6.0740 0.000000
beta1   0.912106    0.013591  67.1116 0.000000

LogLikelihood : -8456.918

Information Criteria
------------------------------------
Akaike       2.6863
Bayes        2.6917
Shibata      2.6863
Hannan-Quinn 2.6882

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic p-value
Lag[1]                    0.03057 0.86119
Lag[2*(p+q)+(p+q)-1][2]   0.38666 0.98534
Lag[4*(p+q)+(p+q)-1][5]   5.08162 0.09963
d.o.f=1
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                        statistic  p-value
Lag[1]                       3.89 0.048562
Lag[2*(p+q)+(p+q)-1][5]     11.83 0.003082
Lag[4*(p+q)+(p+q)-1][9]     13.13 0.009965
d.o.f=2

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[3]    0.0788 0.500 2.000  0.7789
ARCH Lag[5]    0.5472 1.440 1.667  0.8696
ARCH Lag[7]    0.6365 2.315 1.543  0.9644

Nyblom stability test
------------------------------------
Joint Statistic: 2.782
Individual Statistics:
mu     0.07596
ar1    1.59443
omega  0.20362
alpha1 0.23997
beta1  0.17185

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:      1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value      prob sig
Sign Bias           2.5578 1.056e-02  **
Negative Sign Bias  0.4089 6.826e-01
Positive Sign Bias  3.1726 1.518e-03 ***
Joint Effect       38.0232 2.795e-08 ***

Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
  group statistic p-value(g-1)
1    20     169.6    3.462e-26
2    30     186.2    7.020e-25
3    40     214.2    4.781e-26
4    50     234.6    4.826e-26

Elapsed time : 0.6000862

The estimation of the GJR-GARCH model is straightforward in this package: it requires specifying the option model="gjrGARCH" in ugarchspec(), in addition to selecting the orders for the conditional mean and variance, as shown below:

require(rugarch)
spec = ugarchspec(variance.model=list(model="gjrGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(1,0)))
fitgjr = ugarchfit(spec = spec, data = sp500daily)

       Estimate t value
mu        0.025   2.546
ar1      -0.003  -0.211
omega     0.015   6.803
alpha1    0.000   0.001
beta1     0.916 111.416
gamma1    0.137  11.003

The results for the S&P 500 confirm the earlier discussion: positive returns have a negligible effect on volatility (α1 ≈ 0), while negative returns have a large and significant effect (γ1 = 0.137). The plot below on the left compares the time series of the volatility estimates for GARCH and GJR, and the plot on the right shows the difference between the two estimates. The difference also shows clusters, which are due to large negative returns that increase σt+1 significantly more for GJR than for GARCH, as opposed to positive returns, which have no effect on the GJR volatility.


The selection of the best performing volatility model can be done using the AIC, similarly to the selection of the optimal order p for AR(p) models. The package rugarch provides the function infocriteria(), which calculates the AIC and several other selection criteria. These criteria differ in the penalty they impose for additional parameters (the AR(1)-GJR has one parameter more than the AR(1)-GARCH). For all criteria, the best model is the one with the smallest value. In this case the GJR specification clearly outperforms the basic GARCH(1,1) model on all criteria.

ciao <- cbind(infocriteria(fitgarch), infocriteria(fitgjr))
colnames(ciao) <- c("GARCH", "GJR")
ciao

                GARCH      GJR
Akaike       2.686323 2.654422
Bayes        2.691679 2.660849
Shibata      2.686322 2.654421
Hannan-Quinn 2.688179 2.656649

The function ugarchforecast() computes out-of-sample forecasts for a fitted model n.ahead periods into the future. The plot below shows the forecasts made on 2014-12-31, when the volatility estimate σt+1 was 0.893 for GARCH and 0.782 for GJR. Both models forecast an increase in volatility, since volatility is mean-reverting in these models (and at the moment the forecast was made volatility was below its long-run level σ).

garchforecast <- ugarchforecast(fitgarch, n.ahead=250)
gjrforecast <- ugarchforecast(fitgjr, n.ahead=250)


    2014-12-30 19:00:00 2014-12-30 19:00:00
T+1               0.913               0.854
T+2               0.915               0.856
T+3               0.917               0.859

plot(sigma(garchforecast), type="l", xlab="", ylab="", ylim=c(ymin, ymax))
lines(sigma(gjrforecast), col=2)

Finally, we can compare the GARCH and GJR specifications based on the effect of a shock (ϵt) on the conditional variance (σ²t+1). The left plot refers to the GARCH(1,1) model and clearly shows that positive and negative shocks (of the same magnitude) increase the conditional variance by the same amount. The news impact curve for the GJR model, instead, clearly shows the asymmetric effect of shocks: there is essentially no effect when the shock is positive, but a large effect when it is negative.

newsgarch <- newsimpact(fitgarch)
newsgjr <- newsimpact(fitgjr)
par(mfrow=c(1,2))
plot(newsgarch$zx, newsgarch$zy, type="l", xlab=newsgarch$xexpr,
     ylab=newsgarch$yexpr, lwd=2, main="GARCH", cex.main=0.8)
abline(v=0, lty=2)
# GJR news impact curve (right panel)
plot(newsgjr$zx, newsgjr$zy, type="l", xlab=newsgjr$xexpr,
     ylab=newsgjr$yexpr, lwd=2, main="GJR", cex.main=0.8)
abline(v=0, lty=2)


5. Measuring Financial Risk

The events of the early 1990s motivated the financial industry to develop methods to measure the potential losses in portfolio values due to adverse market movements. Measuring risk is a necessary condition for managing risk: first we need to quantify the amount of risk the institution is facing, and then elaborate a strategy to control the effect of these potential losses. The aim of this chapter is to discuss the application of some methods that have been proposed to measure financial risk, in particular market risk, which represents the losses deriving from adverse movements of equity or currency markets, interest rates, etc.

5.1 Value-at-Risk (VaR)

VaR was introduced in the mid-1990s by the investment bank J.P. Morgan as an approach to measure the potential portfolio loss that an institution might face if an unlikely adverse event occurred at a certain time horizon. Let's define the profit/loss of a financial institution in day t + 1 by $R_{t+1} = 100 \ln(W_{t+1}/W_t)$, where $W_{t+1}$ is the portfolio value in day t + 1. Then the Value-at-Risk (VaR) at the 100(1 − α)% level is defined by

$$P(R_{t+1} \le VaR^{1-\alpha}_{t+1}) = \alpha$$

where the typical values of α are 0.01 and 0.05. In practice, VaR is calculated every day and for a horizon of 10 days (2 trading weeks). If VaR is expressed as a percentage return, it can easily be transformed into dollars by multiplying the portfolio value in day t (denoted by $W_t$) by the expected loss, that is, the dollar loss is $W_t(\exp(VaR^{1-\alpha}_{t+1}/100) - 1)$. From a statistical point of view, the 99% VaR is the 1% quantile, that is, the value such that there is only a 1% probability that the random variable takes a value smaller than or equal to it. The graphs below show the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF). Risk calculation is concerned with the left tail of the return distribution, since those are the rare events that have a large and negative effect on the portfolios of financial institutions. This is the reason why the profession has devoted a lot of energy to making sure that the left tail, rather than the complete distribution, is appropriately specified, since a poor model for the left tail implies poor risk estimates (poor in a sense that will become clear in the backtesting section).
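For instance, for a hypothetical portfolio worth $10 million and a one-day 99% VaR of -2.5% (both values are made up for illustration), the dollar loss implied by the definition above is:

# converting a percentage (log-return) VaR into a dollar loss -- hypothetical numbers
W <- 10e6                   # portfolio value in day t
VaR <- -2.5                 # 99% one-day VaR in percent
W * (exp(VaR/100) - 1)      # dollar Value-at-Risk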


VaR assuming normality

Calculating VaR requires making an assumption about the profit/loss distribution. The simplest and most familiar assumption we can introduce is that $R_{t+1} \sim N(\mu, \sigma^2)$, where µ represents the mean of the distribution and σ² its variance. An equivalent way to state this assumption is that the profit/loss follows the model $R_{t+1} = \mu + \sigma\epsilon_{t+1}$, with $\epsilon_{t+1} \sim N(0, 1)$. The quantiles of this model are given by µ + zασ, where α is the probability level and zα the α quantile of the standard normal distribution. The typical levels of α for VaR calculations are 1% and 5%, and the corresponding zα are -2.33 and -1.64, respectively. Hence, the 99% VaR under normality is obtained as follows:

$$VaR^{0.99}_{t+1} = \mu - 2.33\sigma$$

As an example, assume that an institution holds a portfolio that replicates the S&P 500 Index and wants to calculate the 99% VaR for this position. If we assume that returns are normally distributed, then to calculate $VaR^{0.99}_{t+1}$ we need to estimate the expected daily return of the portfolio (i.e., µ) and its expected volatility (i.e., σ). Let's assume that we believe the distribution is approximately constant over time, so that we can estimate the mean and standard deviation of the returns over a long period. In the illustration below we use the S&P 500 time series from the volatility chapter, which consists of daily returns from 1990 to 2014 (6300 observations).

mu = mean(sp500daily)
sigma = sd(sp500daily)
var = mu + qnorm(0.01) * sigma

[1] -2.629

The 99% VaR is -2.629% and represents the maximum loss from holding the S&P 500 that is expected for the following day with 99% probability. If we had used a shorter estimation window of one year (252 observations), the VaR estimate would have been -1.572%. The difference between the two VaR estimates is quite remarkable given that we only changed the size of the estimation window. The standard deviation declines from 1.142% in the full sample to 0.707% in the shorter sample, whilst the mean changes from 0.028% to 0.073%. As discussed in the volatility modeling Chapter, it is extremely important to account for time variation in the distribution of financial returns if the interest is in estimating VaR at short horizons (e.g., a few days ahead).
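The one-year estimate quoted above can be reproduced along these lines (a sketch, assuming the last 252 observations of sp500daily correspond to the one-year window used in the text):

# 99% VaR estimated on the most recent year of daily returns
R1y <- coredata(tail(sp500daily, 252))
mean(R1y) + qnorm(0.01) * sd(R1y)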

Time-varying VaR

So far we have assumed that the mean and standard deviation of the return distribution are constant and represent the long-run distribution of the variable. However, this might not be the best way to predict the distribution of the profit/loss at very short horizons (e.g., 1 to 10 days ahead) if the return volatility changes over time. In particular, in the volatility chapter we discussed the evidence that the volatility of financial returns changes over time and introduced models to account for this behavior. We can model the conditional distribution of the returns in day t + 1 by assuming that both µt+1 and σt+1 are time-varying conditional on the information available in day t. Another decision we need to make to specify the model is the distribution of the errors. We can assume, as above, that the errors are normally distributed, so that the 99% VaR is calculated as:

$$VaR^{0.99}_{t+1} = \mu_{t+1} - 2.33\sigma_{t+1}$$

where the expected return µt+1 can be either constant (i.e., µt+1 = µ) or an AR(1) process (µt+1 = ϕ0 + ϕ1 Rt), and the conditional variance can be modeled as MA, EMA, or with a GARCH-type model. In the example below, we assume that the conditional mean is constant (and equal to the sample mean) and model the conditional variance of the demeaned returns as an EMA with parameter λ = 0.06:

require(TTR)
mu <- mean(sp500daily)
sigmaEMA <- EMA((sp500daily-mu)^2, ratio=0.06)^0.5
var <- mu + qnorm(0.01) * sigmaEMA

The VaR time series inherits the time variation in volatility, alternating between calm periods of low volatility and risk, and periods of increased uncertainty and thus the possibility of large losses. For this example, we find that in 2.207% of the 6299 days the return was smaller than the VaR. Since we calculated VaR at 99%, we expected to experience violations on only 1% of days.
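A rough check of this violation frequency is sketched below; it simply compares each return with the VaR series computed above, and the exact alignment convention (using the VaR as a one-day-ahead forecast) may differ slightly from the calculation in the text.

# fraction of days in which the return fell below the EMA-based VaR
mean(coredata(sp500daily < var), na.rm = TRUE)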

Expected Shortfall (ES)

VaR represents the maximum (minimum) loss that is expected with, e.g., 99% (1%) probability. However, it can be criticized on the grounds that it does not convey any information on the potential loss that is expected if an extreme event (with probability 1% or less) does occur. For example, a VaR of -5.52% provides no information on how large the portfolio loss is expected to be if the portfolio return happens to be smaller than the VaR. That is, how large do we expect the loss to be in case the VaR is violated? A risk measure that quantifies this potential loss is the Expected Shortfall (ES), defined as

$$ES^{1-\alpha}_{t+1} = E(R_{t+1} \mid R_{t+1} \le VaR^{1-\alpha}_{t+1})$$

that is, the expected portfolio return conditional on being in a day in which the return is smaller than the VaR. This risk measure focuses attention on the left tail of the distribution and is highly dependent on the shape of the distribution in that area, while it neglects all other parts of the distribution.

An analytical formula for ES is available if we assume that returns are normally distributed. In particular, if $R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$ with $\epsilon_{t+1} \sim N(0, 1)$, then VaR is calculated as $VaR^{1-\alpha}_{t+1} = z_\alpha\sigma_{t+1}$. The conditioning event is that the return in the following day is smaller than the VaR, and the probability of this event is α, e.g. 0.01. We then need to calculate the expected value of $R_{t+1}$ over the interval from minus infinity to $VaR^{1-\alpha}_{t+1}$, which corresponds to a truncated normal distribution with density function given by

$$f(R_{t+1} \mid R_{t+1} \le VaR^{1-\alpha}_{t+1}) = \phi(R_{t+1}) / \Phi(VaR^{1-\alpha}_{t+1}/\sigma_{t+1})$$

where ϕ(·) and Φ(·) represent the PDF and the CDF of the standard normal distribution (i.e., $\Phi(VaR^{1-\alpha}_{t+1}/\sigma_{t+1}) = \Phi(z_\alpha) = \alpha$). We can thus express ES as

$$ES^{1-\alpha}_{t+1} = -\sigma_{t+1}\frac{\phi(z_\alpha)}{\alpha}$$

where zα is equal to -2.33 and -1.64 for α equal to 0.01 and 0.05, respectively. If we are calculating VaR at 99%, so that α is equal to 0.01, then ES is equal to

$$ES^{0.99}_{t+1} = -\sigma_{t+1}\frac{\phi(-2.33)}{0.01} = -2.64\sigma_{t+1}$$

where the value −2.64 can be obtained in R by typing (2*pi)^(-0.5) * exp(-(2.33^2)/2) / 0.01 or, more simply, dnorm(-2.33) / 0.01. If α = 0.05, then the constant used to calculate ES is -2.08 instead of -1.64 for VaR. Hence, ES leads to more conservative risk estimates since the expected loss on a day in which VaR is exceeded is always larger (in absolute value) than VaR. We can plot the difference between VaR and ES as a function of 1 − α in the following graph:


sigma = 1
alpha = seq(0.001, 0.05, by=0.001)
ES = - dnorm(qnorm(alpha)) / alpha * sigma
VaR = qnorm(alpha) * sigma

√K rule

The Basel Accords require VaR to be calculated at a horizon of 10 days and for a risk level of 99%. In addition, the Accords allow financial institutions to scale up the 1-day VaR to the 10-day horizon by multiplying it by $\sqrt{10}$. Why $\sqrt{10}$? Under what conditions is the VaR of the cumulative return over 10 days, denoted by $VaR_{t+1:t+10}$, equal to $\sqrt{10}\,VaR_{t+1}$?

In day t we are interested in the risk of holding the portfolio over a horizon of K days, that is, assuming that we can liquidate the portfolio only on the Kth day. Regulators require banks to use K = 10, which corresponds to two trading weeks. To calculate the risk of holding the portfolio over the next K days we need the distribution of the sum of K daily returns, or cumulative return, denoted by $R_{t+1:t+K} = \sum_{k=1}^{K} R_{t+k} = R_{t+1} + \cdots + R_{t+K}$, where $R_{t+k}$ is the return in day t + k. If we assume that these daily returns are independent and identically distributed (i.i.d.) with mean µ and variance σ², then the expected value of the cumulative return is

$$E\left(\sum_{k=1}^{K} R_{t+k}\right) = \sum_{k=1}^{K}\mu = K\mu$$

and its variance is

$$Var\left(\sum_{k=1}^{K} R_{t+k}\right) = \sum_{k=1}^{K}\sigma^2 = K\sigma^2$$


so that the standard deviation of the cumulative return is equal to $\sqrt{K}\sigma$. If we maintain the normality assumption introduced earlier, then the 99% VaR of $R_{t+1:t+K}$ is given by

$$VaR^{0.99}_{t+1:t+K} = K\mu - 2.33\sqrt{K}\sigma$$

In this formula, the mean and standard deviation are estimated on daily returns and then scaled up to the horizon K.

This result relies on the assumption that returns are serially independent, which allows us to set all covariances between returns in different days equal to zero. The empirical evidence from the ACF of daily returns indicates that this assumption is likely to be accurate most of the time, although in times of market booms or busts returns can be, temporarily, positively correlated. What would be the effect of positive correlation in returns on VaR? The first-order covariance can be expressed in terms of correlation as ρσ², with ρ the first-order serial correlation. To keep things simple, assume that we are interested in calculating VaR for the two-day return, that is, K = 2. The variance of the cumulative return is $Var(R_{t+1} + R_{t+2}) = Var(R_{t+1}) + Var(R_{t+2}) + 2Cov(R_{t+1}, R_{t+2}) = \sigma^2 + \sigma^2 + 2\rho\sigma^2$, which can be rewritten as $2\sigma^2(1 + \rho)$. This shows that in the presence of positive correlation ρ the cumulative return becomes riskier, relative to the independent case, since $2\sigma^2(1 + \rho) > 2\sigma^2$. The Value-at-Risk for the two-day return is then $VaR^{0.99}_{t+1:t+2} = 2\mu - 2.33\sigma\sqrt{2}\sqrt{1 + \rho}$, which is smaller (i.e., a larger potential loss) than the VaR assuming independence, given by $2\mu - 2.33\sigma\sqrt{2}$. Hence, neglecting positive correlation in returns leads to underestimating risk and the potential portfolio loss deriving from an extreme (negative) market movement.
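The scaling rule, and the effect of a positive first-order correlation, can be illustrated with a few lines of R (the value of ρ below is hypothetical):

# 10-day 99% VaR under the i.i.d. normal assumption (square-root-of-time rule)
mu <- mean(coredata(sp500daily)); sigma <- sd(coredata(sp500daily))
K <- 10
K * mu + qnorm(0.01) * sqrt(K) * sigma

# two-day 99% VaR with a hypothetical first-order correlation rho = 0.1
rho <- 0.1
2 * mu + qnorm(0.01) * sigma * sqrt(2 * (1 + rho))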

VaR assuming non-normality

One of the stylized facts of financial returns is that the empirical frequency of large positive/negative returns is higher than the frequency we would expect if returns were normally distributed. This finding of fat tails in the return distribution can be partly explained by time-varying volatility: the extreme returns are the outcome of a regime of high volatility that occurs occasionally, although most of the time returns are in a calm period with low volatility. The effect of mixing periods of high and low volatility is that a volatility estimate based on the full sample overestimates uncertainty when volatility is low, and underestimates it when it is high. This can be illustrated with a simple example: assume that on 15% of days volatility is high and equal to 5%, while on the remaining 85% of days it is low and equal to 0.5%. Average volatility based on all observations is then 0.15 * 5 + 0.85 * 0.5 = 1.175%. This implies that in days of high volatility the returns appear large relative to a standard deviation of 1.175%, although they are normal considering that they belong to the high volatility regime with a standard deviation of 5%. On the other hand, the bulk of returns belong to the low volatility regime, which looks like a concentration of small returns relative to the average volatility of 1.175%.

To illustrate the evidence of deviation from normality and the role of volatility modeling in financial returns, the graphs below show the QQ plot for the S&P 500 returns (left plot; scaled to have mean 0 and variance 1) and for the returns standardized by the EMA volatility estimate calculated above (right plot; in formula, Rt/σt). The comparison shows that accounting for time variation in volatility significantly reduces the evidence of non-normality. However, there is still a slight deviation from the diagonal in the left tail and, to some extent, also in the right tail. This result is very important for risk calculation since we are mostly interested in the far left of the distribution: time variation in volatility is the biggest factor driving non-normality in returns, so when we calculate risk measures we can be confident that a combination of the normal distribution and a dynamic method to forecast volatility should provide reasonably accurate VaR estimates. However, we might still want to account for the non-normality of the standardized returns ϵt, and we consider two possible approaches in this Section.

One approach to relaxing the assumption of normally distributed errors is the Cornish-Fisher approximation, which consists of performing a Taylor expansion of the normal distribution around its mean. This has the effect of producing a distribution which is a function of skewness and kurtosis. We skip the mathematical details of the derivation and focus on the VaR calculation when this approximation is adopted. If we assume the mean µ is equal to zero, the 99% VaR for normally distributed returns is calculated as −2.33σt+1 or, more generally, as zασt+1 for the 100(1 − α)% VaR, where zα represents the α-quantile of the standard normal distribution. With the Cornish-Fisher (CF) approximation VaR is calculated in a similar manner, that is, $VaR^{1-\alpha}_{t+1} = z^{CF}_\alpha \sigma_{t+1}$, where the quantile $z^{CF}_\alpha$ is calculated as follows:

$$z^{CF}_\alpha = z_\alpha + \frac{SK}{6}\left[z^2_\alpha - 1\right] + \frac{EK}{24}\left[z^3_\alpha - 3z_\alpha\right] + \frac{SK^2}{36}\left[2z^5_\alpha - 5z_\alpha\right]$$

where SK and EK represent the skewness and excess kurtosis, respectively, and for α = 0.01 we have zα = −2.33. If the data are normally distributed then SK = EK = 0, so that $z^{CF}_\alpha = z_\alpha$. However, if the distribution is asymmetric and/or has fat tails, the effect is that $z^{CF}_\alpha \le z_\alpha$. In practice, we estimate the skewness and excess kurtosis from the sample and use those values to calculate the quantile for the VaR calculation. In the plot below we show the quantile $z^{CF}_{0.01}$ (black line) and its relationship to $z_{0.01}$ (red dashed line) as a function of the skewness and excess kurtosis parameters. The left plot shows the effect of the skewness parameter on the quantile, holding the excess kurtosis equal to zero; the plot on the right shows the effect of increasing values of excess kurtosis, while the skewness parameter is kept constant and equal to zero. As expected, $z^{CF}_{0.01}$ is smaller than $z_{0.01} = −2.33$, and it is interesting to notice that negative skewness increases the (absolute) value of $z^{CF}_{0.01}$ more than positive skewness of the same magnitude. This is due to the fact that negative skewness implies a higher probability of large negative returns compared to large positive returns. The effect on VaR of accounting for asymmetry and fat tails in the data is thus to provide more conservative risk measures.


alpha = 0.01
EK = 0; SK = seq(-1, 1, by=0.05)
z = qnorm(alpha)
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
  (SK^2/36) * (2*z^5 - 5*z)

EK = seq(0, 10, by=0.1); SK = 0
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
  (SK^2/36) * (2*z^5 - 5*z)

An alternative approach to allowing for non-normality is to make a different distributional assumption for ϵt+1 that captures the fat-tailedness in the data. A distribution that is often considered is the t distribution with a small number of degrees of freedom. Since the t distribution assigns more probability to events in the tails, it provides more conservative risk estimates relative to the normal distribution. The graphs below show the t distribution with 4, 10, and ∞ degrees of freedom, while the plot on the right zooms in on the shape of the left tail of these distributions. It is clear that the smaller the degrees of freedom, the more likely are extreme events relative to the standard normal distribution (d.o.f. = ∞). So this approach also delivers more conservative risk measures relative to the normal distribution since it assigns higher probability to extreme events.

plot(dnorm, xlim=c(-4,4), col="black", ylab="", xlab="", yaxt="n")
curve(dt(x, df=4), add=TRUE, col="orange")
curve(dt(x, df=10), add=TRUE, col="purple")


To be able to use the t distribution for risk calculation we need to set the value of the degrees-of-freedom parameter, denoted by d. In the context of a GARCH volatility model this can be done by treating d as an additional parameter to be estimated by maximizing the likelihood function under the assumption that the ϵt+1 shocks follow a td distribution. A simple alternative approach to estimating d exploits the fact that the excess kurtosis of a td distribution is equal to EK = 6/(d − 4) (for d > 4), which is only a function of the parameter d. Thus, based on the sample excess kurtosis we can back out an estimate of d. The steps are as follows:

1. estimate the GARCH model by ML assuming that the errors are normally distributed
2. calculate the standardized residuals as ϵt = Rt/σt
3. estimate the excess kurtosis of the standardized residuals and obtain d as d = 6/EK + 4 (for d > 4)

For the standardized returns of the S&P 500 the sample excess kurtosis is equal to 2.22, so that the estimate of d is approximately 7, which indicates the need for a fat-tailed distribution. In practice, it would be advisable to estimate the parameter d jointly with the remaining parameters of the volatility model, rather than separately. Still, this simple approach provides a starting point to evaluate the usefulness of fat-tailed distributions in risk measurement.
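The back-of-the-envelope calculation of d in step 3 is shown below, using the excess kurtosis reported in the text:

# implied degrees of freedom of the t distribution from the sample excess kurtosis
EK <- 2.22          # excess kurtosis of the standardized S&P 500 returns (from the text)
d <- 6/EK + 4       # inverts EK = 6/(d - 4)
d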

5.2 Historical Simulation (HS)

So far we discussed methods to calculate VaR that rely on choosing a probability distribution and a volatility model. The distributional assumption we made is that returns are normal, which we later relaxed by introducing the Cornish-Fisher approximation and the t distribution. In terms of the volatility model, we considered several possible specifications such as EMA, GARCH, and GJR-GARCH, which we can compare using selection criteria.

An alternative approach that avoids making any of these assumptions and lets the data speak for themselves is Historical Simulation (HS). HS consists of calculating the empirical quantile of the returns at the 100·α% level based on the most recent M days. This approach is called non-parametric because it estimates the 99% VaR as the return value such that 1% of the sample is smaller, whatever the return distribution and volatility structure might be. In addition to this flexibility, HS is very easy to compute since it does not require the estimation of any parameters for the volatility model or the distribution.

However, there are also a few difficulties in using HS to calculate risk measures. One issue is the choice of the estimation window size M. Practitioners often use values between M=250 and M=1000 but, similarly to the choice of the smoothing parameter in MA and EMA, this is an ad hoc value that has been validated by experience rather than optimally selected based on a criterion. Another complication is that HS applied to daily returns provides a VaR measure at the one-day horizon which, for regulatory purposes, should then be converted to a 10-day horizon. What is typically done is to apply the √10 rule discussed before, although it does not have much theoretical justification in the context of HS, since we are not actually making any assumption about the return distribution. An alternative would be to calculate VaR as the 1% quantile of the (non-overlapping) cumulative returns instead of the daily returns. However, this would imply a much smaller sample size, in particular for small M.
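As a rough sketch of this alternative (under the assumption that sp500daily contains daily log returns, so that multi-period returns are sums of daily returns), the non-overlapping 10-day returns and their 1% quantile could be computed as follows:

# 99% HS VaR at the 10-day horizon from non-overlapping 10-day returns
r10   <- zoo::rollapply(sp500daily, 10, sum, by=10, align="right")
VaR10 <- quantile(zoo::coredata(r10), probs=0.01)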

The implementation in R is quite straightforward and is shown below. The function quantile() calculates the quantile at probability α of a return series, which we can combine with the function rollapply() from the package zoo to apply it recursively to a rolling window of size M. For example, the command rollapply(sp500daily, 250, quantile, probs=alpha, align="right") calculates the function quantile (with probs=alpha and alpha=0.01) for the sp500daily time series, with the first VaR forecast applying to day 251 and continuing until the end of the sample. The graph below shows a comparison of VaR calculated with the HS method for estimation windows of 250 and 1000 days. The shorter estimation window makes the VaR more sensitive to market events, as opposed to M=1000, which changes very slowly. A characteristic of HS that is particularly evident when M=1000 is that it might remain constant for long periods of time, even though volatility might have significantly decreased.

require(zoo)
M1 = 250
M2 = 1000
alpha = 0.01
# rolling 1% empirical quantile of the most recent M returns (the forecast for the following day)
hs1 <- rollapply(sp500daily, M1, quantile, probs=alpha, align="right")
hs2 <- rollapply(sp500daily, M2, quantile, probs=alpha, align="right")


How good is the risk model? One metric that is used as an indicator of the quality of a risk model is the frequency of returns that are smaller than VaR, that is, Σ_{t=1}^{T} I(R_{t+1} ≤ VaR_{t+1}) / T, where T represents the total number of days and I(·) denotes the indicator function. A good risk model should have a frequency of violations, or hits, close to the level α for which VaR is calculated: if we are calculating VaR at 99% (α = 0.01), we expect the model to show (approximately) 1% of violations. For the HS approach above, violations occur on 1.405% of the days when VaR is calculated on the 250-day window, and on 1.528% of the sample for the longer window of M=1000. For HS we thus find that violations are more frequent than expected, and in the backtesting Section we will evaluate whether this difference is statistically significant.
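A quick sketch of this calculation, using the hs1 and hs2 series computed above (the quantile computed on the window ending in day t is treated as the forecast for day t+1, so it is compared with the day t+1 return):

# fraction of days in which the next-day return falls below the HS VaR forecast
hit1 <- na.omit(lag(sp500daily, 1) <= hs1)   # lag(x, 1) aligns R_{t+1} with the day-t forecast
hit2 <- na.omit(lag(sp500daily, 1) <= hs2)
100 * c(M250 = mean(coredata(hit1)), M1000 = mean(coredata(hit2)))   # in percent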

In the graph below, the VaR calculated from HS is compared to the EMA forecast. Although the VaR level and dynamics are quite similar, HS changes less rapidly and remains constant for long periods of time, in particular in periods of rapid changes in market volatility.

require(TTR)
# 99% VaR implied by the EMA volatility forecast (note the minus sign: VaR is a negative return)
ema <- -2.33 * EMA(sp500daily^2, ratio=0.06)^0.5


5.3 Simulation Methods

We discussed several approaches that can be used to calculate VaR based on parametric assumptions or that are purely non-parametric in nature. In most cases we demonstrated the technique by forecasting Value-at-Risk at the 1-day horizon. However, for regulatory purposes the risk measure should be calculated at a horizon of 10 days. One approach that is widely used by practitioners, and which is also allowed by regulators, is to scale up the 1-day VaR to 10 days by multiplying it by √10. However, there are several limitations in doing this, and it becomes particularly inaccurate when used for VaR models that assume a GARCH volatility structure.

An alternative is to use simulation methods that generate artificial future returns that are consistent with the risk model. The model makes assumptions about the volatility dynamics and the distribution of the error term. Using simulations we are able to produce a large number of possible future paths for the returns that are conditional on the current day. In addition, it becomes very easy to obtain the distribution of cumulative returns by summing the daily simulated returns along a path. We will consider two popular approaches that differ in the way the simulated shocks are generated: Monte Carlo Simulation (MC), which iterates the volatility model based on shocks simulated from a given distribution (normal, t, or something else), and Filtered Historical Simulation (FHS), which assumes the shocks are equal to the standardized returns and takes random samples of those values. The difference is that FHS does not make a parametric assumption for the ϵ_{t+1} (similarly to HS), while MC does rely on such an assumption.

Monte Carlo Simulation

At the closing of January 22, 2007 and September 29, 2008 the risk manager needs to forecast the distribution of returns from 1 up to K days ahead, including the cumulative or multi-period returns over these horizons. The model for returns is R_{t+1} = σ_{t+1} ϵ_{t+1}, where σ_{t+1} is the volatility forecast based on the information available up to that day. The ϵ_{t+1} are the random shocks that the risk manager assumes are normally distributed with mean 0 and variance 1. A Monte Carlo simulation of the future distribution of the portfolio returns requires the following steps:

1. Estimation of the volatility model: we assume that σ_{t+1} follows a GARCH(1,1) model and estimate the model by ML using all observations available up to and including that day. Below is the code to estimate the model using the rugarch package. The volatility forecast for the following day is 0.483%, which is significantly lower than the sample standard deviation of 0.993% computed from January 1990 to January 2007. To use some terminology introduced earlier, the conditional forecast (0.483%) is lower than the unconditional forecast (0.993%).

library(rugarch)
spec = ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(0,0), include.mean=FALSE))
# lowvol represents the January 22, 2007 date
fitgarch = ugarchfit(spec = spec, data = window(sp500daily, end=lowvol))
# estimated GARCH(1,1) coefficients:
#   omega alpha1  beta1
#   0.005  0.054  0.942
# one-step-ahead volatility forecast (base date 2007-01-22):
#   T+1  0.483
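The coefficient estimates and the one-step-ahead volatility forecast reported above can be extracted from the fitted object; the sketch below shows one way to obtain the quantities gcoef and sigma used in the simulation code further down (not necessarily how the output above was produced):

gcoef <- coef(fitgarch)                   # named vector: omega, alpha1, beta1
fc    <- ugarchforecast(fitgarch, n.ahead = 1)
sigma <- as.numeric(sigma(fc))            # one-day-ahead volatility forecast, about 0.483 here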

2. The next step consists of simulating a large number of return paths, say S, that are consistent with the model assumption that R_{t+1} = σ_{t+1} ϵ_{t+1}. Since we have already produced the forecast σ_{t+1} = 0.483, to obtain simulated values of R_{t+1} we only need to generate values for the error term ϵ_{t+1}. This can easily be done in R using the command rnorm(), which returns random values from the standard normal distribution (and rt() does the same for the t distribution). Denote by ϵ_{s,t+1} the s-th simulated value of the shock (for s = 1, ..., S); the s-th simulated value of the return is then produced by multiplying σ_{t+1} and ϵ_{s,t+1}, that is, R_{s,t+1} = σ_{t+1} ϵ_{s,t+1}.

3. The next step is to use the simulated returns R_{s,t+1} to predict the volatility of the following period, denoted by σ_{s,t+2}. Since we have assumed a GARCH specification, the volatility forecast is obtained as σ_{s,t+2}^2 = ω + α R_{s,t+1}^2 + β σ_{t+1}^2, and the (simulated) returns at time t+2 are obtained as R_{s,t+2} = σ_{s,t+2} ϵ_{s,t+2}, where the ϵ_{s,t+2} represent a new set of simulated values for the shocks in day t+2.

4. Continue the iteration to calculate σ_{s,t+k} and R_{s,t+k} for k = 1, ..., K. The cumulative or multi-period return between t+1 and t+K is then obtained as R_{s,t+1:t+K} = Σ_{j=1}^{K} R_{s,t+j}.

The code below shows how a MC simulation can be performed; the left plot shows the quantiles of R_{s,t+k} as a function of k, and the right plot the quantiles of σ_{s,t+k} (from 0.10 to 0.90 at intervals of 0.10, in addition to 0.05 and 0.95). The volatility branches out from σ_{t+1} = 0.483 and its uncertainty increases with the forecast horizon. The green dashed line represents the average (over the simulations) volatility at horizon k which, as expected for GARCH models, tends toward the unconditional mean of volatility. Since January 22, 2007 was a period of low volatility, the graph shows that we should expect volatility to increase at longer horizons. This is also clear from the left plot, where the return distribution becomes wider and wider as the forecast horizon progresses.


set.seed(9874)
S = 10000   # number of MC simulations
K = 250     # forecast horizon (days)
# sigma is the one-step-ahead GARCH volatility forecast, gcoef the vector of estimated
# coefficients, and futdates a vector of the K future dates used to index the simulations
# create the matrices to store the simulated returns and volatilities (K rows, S columns)
R = zoo(matrix(sigma*rnorm(S), K, S, byrow=TRUE), order.by=futdates)
Sigma = zoo(matrix(sigma, K, S), order.by=futdates)
# iteration to calculate R and Sigma based on the previous day:
# volatility follows the GARCH recursion and returns are volatility times a new normal shock
for (i in 2:K)
{
  Sigma[i,] = (gcoef['omega'] + gcoef['alpha1'] * R[i-1,]^2 +
               gcoef['beta1'] * Sigma[i-1,]^2)^0.5
  R[i,] = rnorm(S) * Sigma[i,]
}

How would the forecasts of the return and volatility distributions look if they were made in a period of high volatility? To illustrate this scenario we consider September 29, 2008 as the forecast base. The GARCH(1,1) forecast of volatility for the following day is 3.165%, and the distributions of simulated returns and volatilities are shown in the graph below. Since this day fell in a period of high volatility, the assumption of stationarity of volatility made by the GARCH model implies that volatility will decline in the future. This is clear from the return quantiles converging in the left plot, as well as from the declining average volatility in the right plot.


The cumulative or multi-period return can easily be obtained by applying the R command cumsum() to each column of the K by S matrix of simulated one-day returns (e.g., apply(R, 2, cumsum)). The outcome is also a K by S matrix, with each column representing a possible path of the cumulative return from 1 to K steps ahead. The plots below show the quantiles at each horizon k calculated across the S simulated cumulative returns. The quantile at probability 0.01 represents the 99% VaR that financial institutions are required to report for regulatory purposes (at the 10-day horizon). The left plot represents the distribution of expected cumulative returns conditional on being on January 22, 2007, while the right plot is conditional on September 29, 2008. The same scale of the y-axis in both plots highlights the striking difference in the dispersion of the distributions of future cumulative returns. Although we saw above that the volatility of daily returns is expected to increase after January 22, 2007 and to decrease following September 29, 2008, the levels of volatility on these two days are so different that, when accumulated over a long period of time, they lead to very different distributions for the cumulative returns.
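A minimal sketch of this step, assuming R and futdates are the objects created in the simulation code above; the VaR and ES calculations that follow build on Rcum:

# cumulative k-step returns: cumulate each simulated path (a column of R) over the horizon
Rcum <- zoo(apply(R, 2, cumsum), order.by=futdates)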


MC simulations also make it easy to calculate ES. For each horizon k, ES can be calculated as the average of those simulated returns that are smaller than VaR. The code below shows these steps for the two dates considered, and the plots compare the two risk measures conditional on being on January 22, 2007 and on September 29, 2008. Similarly to the earlier discussion, ES indicates a larger (in absolute value) potential loss relative to VaR, a difference that increases with the horizon k.

# 99% VaR at each horizon: the 1% quantile across the S simulated cumulative returns
VaR = zoo(apply(Rcum, 1, quantile, probs=0.01), order.by=futdates)
# ES at each horizon: the average of the simulated cumulative returns that fall below VaR
VaRmat = matrix(VaR, K, S, byrow=FALSE)
Rviol = Rcum * (Rcum < VaRmat)
ES = zoo(rowSums(Rviol) / rowSums(Rviol!=0), order.by=futdates)

Filtered Historical Simulation (FHS)

Filtered Historical Simulation (FHS) combines aspects of the parametric and the HS approaches to risk calculation. This is done by assuming that the volatility of the portfolio return can be modeled with, e.g., EMA or a GARCH specification, while the non-parametric HS approach is used to model the standardized returns ϵ_{t+1} = R_{t+1}/σ_{t+1}. It can be considered a simulation method since the only difference with the MC method discussed above is that random draws from the standardized returns, rather than draws from a parametric distribution (e.g., normal or t), are used to generate the simulated returns. This method might be preferred when the risk manager is uncomfortable making assumptions about the shape of the distribution, either in terms of the thickness of the tails or the symmetry of the distribution. In R this method is implemented by replacing the command rnorm(S), which generates a random normal sequence of length S, with sample(std.resid, S, replace=TRUE), where std.resid represents the standardized residuals. This command produces a sample of length S of values randomly taken from the std.resid series. Hence, we are not generating new data, but simply taking different samples of the same standardized residuals so as to preserve their distributional properties.

The graphs below compare the MC and FHS approaches in terms of the expected volatility of the daily returns (i.e., the average volatility at each horizon k across the S simulations) and the 99% VaR (i.e., the 0.01 quantile of the simulated cumulative returns) for the two forecasting dates that we are considering. The expected future volatility from FHS converges at a slightly lower speed relative to MC during periods of low volatility, while the opposite is true when the forecasting point occurs during a period of high volatility. This is because FHS does not restrict the shape of the left tail of the distribution (as MC does, given the assumption of normality), so that large negative standardized returns contribute to determining future returns and volatility. Of course, this result is specific to the S&P 500 daily returns that we are considering as an illustrative portfolio, and it might be different when analyzing other portfolio returns.

In terms of VaR calculated on cumulative returns, we find that FHS predicts lower risk relative to MC at long horizons, with the difference becoming larger and larger as the horizon K progresses. Hence, while at short horizons the VaR forecasts are quite similar, they become increasingly different at longer horizons.

# standardized residuals from the GARCH fits for the two base dates
# (fitgarch refers to January 22, 2007 and fitgarch1 to September 29, 2008;
#  R.fhs, Sigma.fhs, R.fhs1 and Sigma.fhs1 are initialized as in the MC code above,
#  with gcoef1 the coefficient vector of fitgarch1)
std.resid = as.numeric(residuals(fitgarch, standardize=TRUE))
std.resid1 = as.numeric(residuals(fitgarch1, standardize=TRUE))
# same GARCH recursion as before, but the shocks are resampled standardized residuals
for (i in 2:K)
{
  Sigma.fhs[i,] = (gcoef['omega'] + gcoef['alpha1'] * R.fhs[i-1,]^2 +
                   gcoef['beta1'] * Sigma.fhs[i-1,]^2)^0.5
  R.fhs[i,] = sample(std.resid, S, replace=TRUE) * Sigma.fhs[i,]
  Sigma.fhs1[i,] = (gcoef1['omega'] + gcoef1['alpha1'] * R.fhs1[i-1,]^2 +
                    gcoef1['beta1'] * Sigma.fhs1[i-1,]^2)^0.5
  R.fhs1[i,] = sample(std.resid1, S, replace=TRUE) * Sigma.fhs1[i,]
}


5.4 VaR for portfolios

So far we discussed the simple case of a portfolio composed of only one asset and calculated the Value-at-Risk for that position. However, financial institutions hold complex portfolios that include many assets and expose them to several types of risk. How do we calculate VaR for such diversified portfolios?

Let's first characterize the portfolio return as a weighted average of the individual asset returns, that is, R_t^p = Σ_{j=1}^{J} w_{j,t} R_{j,t}, where R_t^p represents the portfolio return in day t, and w_{j,t} and R_{j,t} are the weight and return of asset j in day t (and there is a total of J assets). To calculate VaR for this portfolio we need to model the distribution of R_t^p. If we assume that the underlying assets are normally distributed, then the portfolio return is also normally distributed and we only need to estimate its mean and standard deviation. In the case of 2 assets (i.e., J = 2) the expected return of the portfolio is given by

E(R_t^p) = µ_{t,p} = w_{1,t} µ_1 + w_{2,t} µ_2

which is the weighted average of the expected returns of the individual assets. The portfolio variance is equal to


Var(R_t^p) = σ_{t,p}^2 = w_{1,t}^2 σ_1^2 + w_{2,t}^2 σ_2^2 + 2 w_{1,t} w_{2,t} ρ_{1,2} σ_1 σ_2

which is a function of the individual (weighted) variances and the correlation between the two assets, ρ_{1,2}. The portfolio Value-at-Risk is then given by

VaR_{p,t}^{0.99} = µ_{t,p} − 2.33 σ_{t,p}
               = w_{1,t} µ_1 + w_{2,t} µ_2 − 2.33 √( w_{1,t}^2 σ_1^2 + w_{2,t}^2 σ_2^2 + 2 w_{1,t} w_{2,t} ρ_{1,2} σ_1 σ_2 )

In case it is reasonable to assume that µ_1 = µ_2 = 0, the VaR_{p,t}^{0.99} formula can be expressed as follows:

VaR_{p,t}^{0.99} = −2.33 √( w_{1,t}^2 σ_1^2 + w_{2,t}^2 σ_2^2 + 2 w_{1,t} w_{2,t} ρ_{1,2} σ_1 σ_2 )
               = −√( 2.33^2 w_{1,t}^2 σ_1^2 + 2.33^2 w_{2,t}^2 σ_2^2 + 2 · 2.33^2 w_{1,t} w_{2,t} ρ_{1,2} σ_1 σ_2 )
               = −√( (VaR_{1,t}^{0.99})^2 + (VaR_{2,t}^{0.99})^2 + 2 ρ_{1,2} VaR_{1,t}^{0.99} VaR_{2,t}^{0.99} )

which shows that the portfolio VaR in day t can be expressed in terms of the individual VaRs of the assets and the correlation between the two asset returns. Since the correlation coefficient ranges between ±1, the two extreme cases of correlation imply the following VaR:

• ρ_{1,2} = 1: VaR_{p,t}^{0.99} = VaR_{1,t}^{0.99} + VaR_{2,t}^{0.99}; the two assets are perfectly correlated and the total portfolio VaR is the sum of the individual VaRs
• ρ_{1,2} = −1: VaR_{p,t}^{0.99} = −|VaR_{1,t}^{0.99} − VaR_{2,t}^{0.99}|; the two assets have perfect negative correlation and the total risk of the portfolio is given by the difference between the two VaRs, since the risk in one asset is offset by the other asset, and vice versa.
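As a quick numerical check of this decomposition, the sketch below plugs in purely hypothetical weights, volatilities, and correlation (none of these values come from the text) and verifies that the direct formula and the VaR-based decomposition give the same number:

# hypothetical inputs for illustration only
w1 <- 0.6; w2 <- 0.4          # portfolio weights
s1 <- 0.012; s2 <- 0.009      # daily volatilities of the two assets
rho <- 0.3                    # correlation
VaR1 <- -2.33 * w1 * s1       # individual 99% VaRs of the two positions
VaR2 <- -2.33 * w2 * s2
direct <- -2.33 * sqrt((w1*s1)^2 + (w2*s2)^2 + 2*w1*w2*rho*s1*s2)
decomp <- -sqrt(VaR1^2 + VaR2^2 + 2*rho*VaR1*VaR2)
c(direct = direct, decomposition = decomp)   # identical values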

More generally, both the mean and the variance could be varying over time conditional on past information. In this case VaR_{p,t}^{0.99} can be rewritten as

VaR_{p,t}^{0.99} = w_{1,t} µ_{1,t} + w_{2,t} µ_{2,t} − 2.33 √( w_{1,t}^2 σ_{1,t}^2 + w_{2,t}^2 σ_{2,t}^2 + 2 w_{1,t} w_{2,t} ρ_{12,t} σ_{1,t} σ_{2,t} )

In the equation above we added a t subscript also to the correlation coefficient, that is, ρ_{12,t} represents the correlation between the two assets conditional on the information available up to that day. There is evidence supporting the fact that correlations between assets change over time in response to market events or macroeconomic shocks (e.g., a recession). In the following Section we discuss some methods that can be used to model and predict correlations.


Modeling correlations

A simple approach to modeling correlations consists of using MA and EMA smoothing, similarly to the case of forecasting volatility. However, in this case the object to be smoothed is not the squared return, but the product of the returns of assets 1 and 2 (N.B.: we are implicitly assuming that the mean of both assets can be set equal to zero). Denote the return of asset 1 by R_{1,t}, the return of asset 2 by R_{2,t}, and by σ_{12,t} the covariance between the two assets in day t. We can estimate σ_{12,t} by a MA of M days:

σ_{12,t+1} = (1/M) Σ_{m=1}^{M} R_{1,t−m+1} R_{2,t−m+1}

and the correlation is then obtained by dividing the covariance estimate by the standard deviations of the two assets, that is:

ρ_{12,t+1} = σ_{12,t+1} / (σ_{1,t+1} σ_{2,t+1})

In case the portfolio is composed of J assets, there are J(J − 1)/2 asset pairs for which we need to calculate correlations. An alternative approach is to use EMA smoothing, which can be implemented using the recursive formula discussed earlier:

σ_{12,t+1} = (1 − λ) σ_{12,t} + λ R_{1,t} R_{2,t}

and the correlation is obtained by dividing the covariance by the forecasts of the standard deviations of the two assets.

To illustrate the implementation in R, we assume that the firm holds a portfolio that invests a fraction w_1 in a gold ETF (ticker: GLD) and the remaining fraction 1 − w_1 in the S&P 500 ETF (ticker: SPY). The closing prices are downloaded from Yahoo Finance starting on Jan 02, 2005 and ending on Apr 14, 2015, and the goal is to forecast portfolio VaR for the following day. We will assume that the expected daily returns of both assets are equal to zero and forecast the volatilities and the correlation between the assets using EMA with λ = 0.06. In the code below R represents a matrix with 2587 rows and two columns containing the GLD and SPY daily returns.

library(TTR)
prod <- R[,1] * R[,2]
# EMA of the product of returns: the covariance forecast
cov <- EMA(prod, ratio=0.06)
# apply the EMA function to each column of R^2 and make it a zoo object: the volatility forecasts
sigma <- zoo(apply(R^2, 2, EMA, ratio=0.06), order.by=time(R))^0.5
corr <- cov / (sigma[,1] * sigma[,2])


The time series plot of the EMA correlation shows that the dependence between the gold and S&P 500 returns oscillates significantly around the long-run correlation of 0.062. In certain periods gold and the S&P 500 have a positive correlation as high as 0.81 and in other periods one as low as -0.65. During 2008 the correlation between the two assets became large and negative, since investors fled the equity market toward gold, which was perceived as a safe haven during turbulent times. Based on these forecasts of volatilities and correlation, portfolio VaR can be calculated in R as follows:

w1 = 0.5      # weight of asset 1 (GLD)
w2 = 1 - w1   # weight of asset 2 (SPY)
# 99% portfolio VaR from the EMA forecasts of the volatilities and the correlation
VaR = -2.33 * ( (w1*sigma[,1])^2 + (w2*sigma[,2])^2 +
                2*w1*w2*corr*sigma[,1]*sigma[,2] )^0.5

The one-day portfolio Value-at-Risk fluctuates substantially between -0.79% and -8.05%, the latter occurring during the 2008-09 financial crisis. It is also interesting to compare the portfolio VaR with the risk measure obtained when the portfolio is fully invested in either asset. The time series graph below shows the VaR for three scenarios: a portfolio invested 50% in gold and 50% in equity, 100% gold, and 100% S&P 500. Portfolio VaR lies between the VaRs of the individual assets when the correlation is positive. However, in those periods in which the two assets have negative correlation (e.g., 2008) the portfolio VaR is higher (smaller in absolute value) than both individual VaRs, since the two assets move in opposite directions and the risk exposures partly offset each other.

# VaR of the portfolios fully invested in one asset
VaRGLD = -2.33 * sigma[,1]
VaRSPY = -2.33 * sigma[,2]

Exposure mapping

5.5 Backtesting VaR

In the risk management literature, backtesting refers to the evaluation of the properties of a risk model based on past data. The approach consists of using the risk model to calculate VaR based on the information (e.g., past returns) that was available to the risk manager at that point in time. The VaR forecasts are then compared with the actual realizations of the portfolio return to evaluate whether they satisfy the properties that we expect should hold for a good risk model. One such property is that the coverage of the risk model, defined as the percentage of returns smaller than VaR, should be close to 100·α% (equivalently, VaR should cover 100·(1−α)% of the returns). In other words, if we are testing VaR with α equal to 0.01, we should expect, approximately, 1% of days with violations. If we find that returns were smaller than VaR significantly more/less often than 1%, then we conclude that the model has inappropriate coverage. For the S&P 500 returns and VaR calculated using the EMA method the coverage is equal to


# var is the series of one-day-ahead 99% VaR forecasts from the EMA model;
# lag(sp500daily, 1) aligns the day t+1 return with the forecast made on day t
V = na.omit(lag(sp500daily, 1) <= var)   # violation indicator
T1 = sum(V)        # number of violations
TT = length(V)     # number of days
alphahat = T1/TT   # empirical violation frequency
alphahat
# [1] 0.022

In this case, we find that the 99% VaR is violated more often (0.022) than expected (0.01). We can test the hypothesis that α = 0.01 (or 1%) by comparing the likelihood that the sample was generated by α with the likelihood based on its estimate α̂, which in this example is equal to 0.022. One way to test this hypothesis is to define the event of a violation of VaR, that is R_{t+1} ≤ VaR_{t+1}, as a binomial random variable with probability α that the event occurs. Since we have a total of T days and introducing the assumption that violations are independent of each other, the joint probability of having T_1 violations (out of T days) is

L(α; T_1, T) = α^{T_1} (1 − α)^{T_0}

where T_0 = T − T_1. The hypothesis α = 0.01 can be tested by comparing this likelihood evaluated at the estimate α̂ and at the theoretical value of 0.01 (more generally, if VaR is calculated at 95% then α is 0.05). This type of test is called a likelihood ratio test and can be interpreted as the distance between the likelihood of obtaining T_1 violations in T days at the theoretical value (i.e., using α = 0.01) and the likelihood based on the sample estimate α̂. The statistic and distribution of the test for Unconditional Coverage (UC) are

UC = −2 ln( L(0.01; T_1, T) / L(α̂; T_1, T) ) ∼ χ_1^2

where α̂ = T_1/T and χ_1^2 denotes the chi-square distribution with 1 degree of freedom. The critical values at the 1, 5, and 10% levels are 6.63, 3.84, and 2.71, respectively, and the null hypothesis α = 0.01 is rejected if UC is larger than the critical value. In practice, the test statistic can be calculated as follows:

UC = −2 [ T_1 ln(0.01/α̂) + T_0 ln(0.99/(1 − α̂)) ]

In the example discussed above we have α̂ = 0.022, T_1 = 138, and T = 6268. The test statistic is thus

UC = -2 * ( T1 * log(0.01/alphahat) +
            (TT - T1) * log(0.99/(1-alphahat)) )
UC
# [1] 68.1

Since 68.1 is larger than 3.84, we reject the null hypothesis that α = 0.01 at the 5% significance level and conclude that the risk model provides inappropriate coverage.

While testing for coverage, we introduced the assumption that violations are independent of each other. If this assumption fails to hold, then a violation in day t has the effect of increasing/decreasing the probability of experiencing a violation in day t+1, relative to its unconditional level. A situation in which this might happen is during financial crises, when markets enter a downward spiral which is likely to lead to several consecutive days of violations and thus to the possibility of underestimating risk. The empirical evaluation of this assumption requires the calculation of two probabilities: 1) the probability of having a violation in day t given that a violation occurred in day t − 1, and 2) the probability of having a violation in day t given that no violation occurred the previous day. We denote the estimates of these conditional probabilities by α̂_{1,1} and α̂_{0,1}, respectively. They can be estimated from the data by calculating T_{1,1} and T_{0,1}, which represent the number of days in which a violation was preceded by a violation and by no violation, respectively. In R we can determine these quantities as follows:

# lag(V, -1) is the previous day's violation indicator
T11 = sum((lag(V,-1)==1) & (V==1))   # violation preceded by a violation
T01 = sum((lag(V,-1)==0) & (V==1))   # violation preceded by no violation

where we obtain T_{0,1} = 131 and T_{1,1} = 7, with their sum equal to T_1. Similarly, we can calculate T_{1,0} = 131 and T_{0,0} = 5998. Since we look at violations in two consecutive days we lose one observation, so the four counts sum to T − 1 = 6267. We can then calculate the conditional probability of a violation in a day given that the previous day there was no violation as α̂_{0,1} = T_{0,1}/(T_{0,0} + T_{0,1}), and the probability of a violation in two consecutive days as α̂_{1,1} = T_{1,1}/(T_{1,0} + T_{1,1}); similarly, α̂_{1,0} = 1 − α̂_{1,1} and α̂_{0,0} = 1 − α̂_{0,1}. For the daily S&P 500 returns introduced earlier, the estimates are α̂_{1,1} = 0.051 and α̂_{1,0} = 0.949, and α̂_{0,1} = 0.021 and α̂_{0,0} = 0.979. While we have an overall estimated probability of a violation of 2.2%, this probability is equal to 5.072% after a day in which the risk model was violated and 2.137% following a day without a violation. Since these probabilities are quite different from each other, we suspect that the violations are not independent over time. To make this conclusion in statistical terms, we can test the hypothesis of independence of the violations by stating the null hypothesis as α_{0,1} = α_{1,1} = α, which can be tested using the same likelihood ratio approach discussed earlier. In this case, the statistic is calculated as the ratio of the likelihood under independence, L(α̂; T_1, T), relative to the likelihood under dependence, which we denote by L(α_{0,1}, α_{1,1}; T_{0,1}, T_{1,1}, T) and which is given by

L(α_{0,1}, α_{1,1}; T_{0,1}, T_{1,1}, T) = α_{1,0}^{T_{1,0}} (1 − α_{1,0})^{T_{1,1}} α_{0,1}^{T_{0,1}} (1 − α_{0,1})^{T_{0,0}}

The test statistic and distribution for the hypothesis of Independence (IND) in this case are

IND = −2 ln( L(α̂; T_1, T) / L(α̂_{0,1}, α̂_{1,1}; T_{0,1}, T_{1,1}, T) ) ∼ χ_1^2

and the critical values are the same as for the UC test. The numerator of the IND test is the likelihood in the denominator of the UC test: it is based on the empirical estimate α̂ rather than on the theoretical value of α. The value of the IND test statistic is 4.04, which is greater than the critical value at the 5% level (3.84), so we reject the null hypothesis that the violations occur independently.

Finally, we might be interested in testing the hypothesis α_{0,1} = α_{1,1} = 0.01, which jointly tests the independence of the violations as well as the coverage, which should be equal to 1%. This test is referred to as Conditional Coverage and the test statistic is given by the sum of the previous two test statistics, that is, CC = IND + UC, and it is distributed as a χ_2^2. The critical values at the 1, 5, and 10% levels in this case are 9.21, 5.99, and 4.61, respectively.
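A sketch of how the IND and CC statistics could be computed in R, building on V, T11, T01, T1, TT, alphahat, and UC from the code above; the resulting values may differ slightly from those reported in the text because of rounding:

# remaining transition counts
T10 = sum((lag(V,-1)==1) & (V==0))   # no violation preceded by a violation
T00 = sum((lag(V,-1)==0) & (V==0))   # no violation preceded by no violation
# conditional violation probabilities
a11 = T11 / (T10 + T11)              # violation today given a violation yesterday
a01 = T01 / (T00 + T01)              # violation today given no violation yesterday
# likelihood ratio statistics for independence and conditional coverage
logL_ind = T1 * log(alphahat) + (TT - T1) * log(1 - alphahat)
logL_dep = T01 * log(a01) + T00 * log(1 - a01) + T11 * log(a11) + T10 * log(1 - a11)
IND = -2 * (logL_ind - logL_dep)     # approximately 4, compare with the chi-square(1) critical values
CC  = UC + IND                       # conditional coverage statistic, compare with chi-square(2)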