Value-at-Risk Model Combination Using Artificial Neural
Networks ∗
Yan Liu
Emory University
August 2005
Abstract
Value at Risk (VaR) has become the industry standard for measuring market risk. However,
the selection of a VaR model is controversial. Simulation results indicate that Historical Simulation
(HS) has a significant positive bias, while GARCH(1,1) has a significant negative bias. Moreover,
HS adapts to structural change slowly but stably, while GARCH adapts to structural breaks rapidly
but less stably. Model selection is therefore often unstable and causes high variability in the final
estimates. This paper proposes VaR forecast combination using Artificial Neural Networks (ANNs)
instead of model selection. Based on mean loss comparison, the violation ratio and Christoffersen's
conditional coverage test, both the simulation and real-data results show that the ANN combinations
have superior forecast performance to the individual VaR models.
JEL classification: C22 C32 C45 C50
Key Words: Value-at-Risk, Artificial Neural Networks, Genetic Algorithm
1 Introduction
There are three major types of risk in financial markets: credit risk, liquidity risk and market
risk. For trading purposes, market risk is the most important to consider. Value at Risk
(VaR) was introduced by J.P. Morgan (1996) and has become the standard measure for quantifying
market risk.
VaR is generally defined as the maximum possible loss for a given position or portfolio, at
a known confidence level, over a specific time horizon. The measure can be used by financial
institutions to assess their risks or by a regulatory committee to set margin requirements. Both
purposes lead to the same VaR measure, even though the underlying concepts differ. In other
words, VaR is used to help financial institutions stay in business after a catastrophic event.
There is a variety of approaches to estimating VaR, ranging from parametric (RiskMetrics,
GARCH, etc.) to semi-parametric (Extreme Value Theory, CAViaR, etc.) and non-parametric
(Historical Simulation and its variants, etc.). In practice, one faces the important issue of
how to choose the "best" model among so many candidates. Since different methodologies
can yield different VaR measures for the same portfolio, some models must lead to significant
errors in risk measurement. The risk of choosing an inappropriate model is called "model
risk" and is an important question left to the risk manager. As a result, VaR model comparison
becomes a very important issue. An abundant literature addresses this problem, including
Christoffersen et al. (1998, 2001, 2004), Sarma et al. (2003) and Lopez (1998). With so many
model criteria, the selection of VaR models becomes even more complicated.
There are several difficulties in VaR model selection.
First, Aiolfi and Timmermann (2004) and Hendry and Clements (2002) indicate that individual
models may be differently affected by non-stationarities such as structural breaks and volatility
clustering. For example, if a structural change occurs today and volatility increases sharply,
Historical Simulation (HS) would not produce a VaR prediction for tomorrow much different from
today's, because HS is based on empirical quantiles. A GARCH model, by contrast, would produce
a more volatile VaR prediction, since it captures the volatility clustering property of financial
time series. However, GARCH is only affected by today's increased volatility temporarily, and
its VaR prediction returns to the previous level when volatility decreases. HS, on the other hand,
adapts to changes slowly, but it is more stable and its parameter estimation is more precise. In
general, one would expect GARCH to forecast better in the short run, while HS has a better
chance of winning the long-run game.
In the real world, structural breaks are very difficult to detect in "real time", so the choice
between HS and GARCH is difficult.
Second, as stated by Clemen (1989) and Stock and Watson (2001, 2004), all individual models
can be viewed as misspecified. Forecasting models are thus local approximations, and the same
model is unlikely to dominate all others at all points in time. The best model changes over time,
and we can hardly track it based on past performance. Chatfield (1995) and Hoeting et al. (1999)
raise the "model uncertainty" problem in selecting among multiple forecasting models. Here the
uncertainty concerns which model minimizes the appropriate loss function. Our task is often to
find the best VaR forecasting model, the one that minimizes the "tick" loss function, not to find
the unique correct conditional quantile model. In fact, the true model often has worse forecast
ability than the best misspecified model. Thus "model uncertainty" is an important issue in VaR
model selection. Bao, Lee and Saltoglu (2004) showed that the forecasting performance of the
VaR models they considered varies across the periods before, during and after a crisis.
Given the difficulty of selecting the best VaR model, in this work I look for the best VaR model
by combination rather than by selection. The theory of combining forecasts was originally
developed by Bates and Granger (1969). From a theoretical viewpoint, forecast combination can
be seen as a method to pool the information sets contained in the individual forecast models.
Combining forecasts can diversify selection risk, much like portfolio diversification. On average,
combination can absorb the different adaptability of VaR models and diversify the forecast error
uncertainty. Since our purpose is to find the best model, not the correct model, combining VaR
models is acceptable to practitioners. A large empirical literature supports forecast combination
in areas as diverse as forecasting GDP, inflation, stock prices and city populations. Recent
empirical work such as Stock and Watson (1999, 2001) has further confirmed the accuracy gains
from forecast combination, and Timmermann (2004) provides a survey of the field. However,
little empirical work has been done on conditional quantile forecasting. Giacomini and
Komunjer (2003) construct a conditional quantile forecast encompassing test for the evaluation
and combination of conditional quantile forecasts, but their work concentrates on the encompassing
test rather than on combining multiple forecasts. As they state, combination is beneficial for
VaR models because VaR is a small-coverage quantile model that is sensitive to the few
observations in the density tail. Combining forecasts from different information sets can make
the forecast more robust.
In this work, I propose to combine the VaR models using nonlinear Artificial Neural
Networks (ANNs). Linear combination and average combination are two special cases of the ANN
combination. Because of the asymmetric, non-differentiable 'tick' loss function, I apply a Genetic
Algorithm (GA) to train the ANNs. Applications to simulated and real data support the
VaR combining methodology, and a comparison between individual models and ANN combinations is
provided.
The remainder of the paper is organized as follows: Section 2 describes the existing VaR
methodologies and their properties via Monte Carlo simulation. Section 3 introduces the ANN
combination methods and the use of the GA to train the ANNs. Section 4 introduces three backtesting
criteria to compare the VaR models. Section 5 compares the performance of individual models
and the ANN combination model using simulated data. Section 6 applies the VaR combination to
real data and compares its performance with the individual models. Section 7 concludes the paper.
2 VaR Models
Under a probabilistic framework, at time index t, we are interested in the risk of a financial
position over the next l periods. Let ∆V(l) be the change in asset value from time t to t + l. This
quantity is measured in dollars and is a random variable at time t. Denote the CDF of ∆V(l)
by F_l(x). We define the VaR of a long position over the time horizon l with probability p as

p = Pr[∆V(l) ≤ VaR] = F_l(VaR)    (1)

For a long position, a loss occurs when ∆V(l) < 0, so the VaR defined in (1) takes a negative
value because p is usually very small. Eq. (1) can be interpreted as follows: the probability that
the holder encounters a loss greater than or equal to VaR over the time horizon l is p.
For a short position, the holder suffers a loss when ∆V(l) > 0. The VaR is defined as

p = Pr[∆V(l) ≥ VaR] = 1 − F_l(VaR)    (2)

The VaR of a short position is typically a positive value.
Since p is usually very small, VaR concerns the tail behavior of the CDF F_l(x). For a long
position the left tail of F_l(x) is important; for a short position the right tail is important.
If the CDF is known, the VaR is simply its pth quantile. In practice the CDF is unknown, so
studies of VaR are essentially concerned with estimating the CDF, especially its tail behavior.
In general, the calculation of VaR involves the following factors:
1. the probability p;
2. the time horizon l;
3. the CDF F_l(x) or its quantile.
It is easy to see that the CDF F_l(x) is the central part of VaR modeling: different VaR
methods are in fact different approaches to CDF estimation. In this paper, two popular methods
are considered, the GARCH approach and Historical Simulation.
2.1 GARCH Approach
Consider the log return r_t of an asset. The mean and volatility equations for r_t can be described
by the following ARMA-GARCH process:

r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} - \sum_{j=1}^{q} \theta_j a_{t-j} + a_t,    a_t = \sigma_t \varepsilon_t    (3)

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{u} \alpha_i a_{t-i}^2 + \sum_{j=1}^{v} \beta_j \sigma_{t-j}^2    (4)
The above GARCH process can be used to obtain one-period forecasts of the conditional mean
and conditional variance, denoted \hat r_t(1) and \hat\sigma_t^2(1). Using these two fitted values and
the pth quantile of the return's standardized conditional distribution, we can obtain the VaR. For
example, if \varepsilon_t is Gaussian, then

VaR_{0.05} = \hat r_t(1) - 1.65 \hat\sigma_t(1)    (5)

If \varepsilon_t follows a Student-t distribution with v degrees of freedom, then

VaR_p = \hat r_t(1) - \frac{t_v(p) \hat\sigma_t(1)}{\sqrt{v/(v-2)}}    (6)

where t_v(p) is the positive pth critical value of the t-distribution; the denominator rescales the
t quantile to unit variance.
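As a concrete illustration, formulas (5) and (6) map the two GARCH forecasts directly into a VaR number. The sketch below assumes the mean forecast, volatility forecast and critical value are supplied by the user; it is not tied to any particular GARCH estimation routine:

```python
import math

def garch_var(r_hat, sigma_hat, crit, df=None):
    """One-step-ahead VaR from GARCH forecasts (Eqs. 5-6).

    crit is the positive critical value: 1.65 for the Gaussian 5% case,
    or t_v(p) for a Student-t with df = v degrees of freedom, in which
    case the quantile is rescaled to unit variance by sqrt(v/(v-2)).
    """
    if df is None:
        return r_hat - crit * sigma_hat                          # Eq. (5)
    return r_hat - crit * sigma_hat / math.sqrt(df / (df - 2.0))  # Eq. (6)
```

For example, `garch_var(0.0, 1.2, 1.65)` gives roughly -1.98, the 5% Gaussian VaR for a zero mean forecast and volatility forecast 1.2.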
2.2 Historical Simulation
Historical Simulation (HS) is very simple. First define the portfolio return on day t + 1 as
R_{p,t+1}. The HS technique assumes that the distribution of tomorrow's portfolio return, R_{p,t+1}, is
identical to the empirical distribution of the past m periods' portfolio returns, {R_{p,t+1-\tau}}_{\tau=1}^{m}. The
VaR with coverage rate p is simply the 100pth percentile of this sequence. In practice, we sort
the returns {R_{p,t+1-\tau}}_{\tau=1}^{m} in ascending order; the 100pth percentile of the sorted sequence
is VaR^p_{p,t+1}. If the VaR falls between two observations, linear interpolation can be used.
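The sorting-and-interpolation step above can be sketched as follows. The percentile position rule used here is one common convention; implementations differ slightly in how they index the order statistics:

```python
def hs_var(returns, p=0.05):
    """Historical Simulation VaR: the 100p-th percentile of the past
    returns, with linear interpolation between order statistics."""
    s = sorted(returns)            # ascending order
    k = p * (len(s) - 1)           # fractional position of the percentile
    lo = int(k)
    frac = k - lo
    if lo + 1 < len(s):
        return s[lo] + frac * (s[lo + 1] - s[lo])  # linear interpolation
    return s[-1]
```

With a rolling window of m past returns, tomorrow's HS VaR is simply `hs_var` applied to the most recent m observations.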
2.3 VaR Properties of GARCH and HS
2.3.1 VaR Biases
To show the necessity of VaR model combination, I conduct a simple Monte Carlo simulation
demonstrating that GARCH and HS produce significantly biased VaR forecasts. In the following
numerical experiment, I use the t-distribution with degrees of freedom DF. DF determines the
tail behavior of the distribution: the smaller the DF, the heavier the tail. The normal
distribution corresponds to the t-distribution with DF tending to infinity.
I create 5,000 samples by simulating the t-distribution with DF of 1, 3, 5, 10, 50, 200 and
1,000. Each sample includes 1,000 observations. The VaR is then estimated for each sample by
Historical Simulation and by GARCH(1,1), so the VaR calculation is repeated 5,000 times. The
coverage rate is set to 5%. The estimation is conducted with 100, 200, 500 and 1,000
observations. After the calculations, the bias is computed as

Bias = \overline{VaR} - VaR^*    (7)

where VaR^* is the theoretical VaR, obtained from the inverse CDF of the t-distribution at
coverage rate 5%, and \overline{VaR} is the empirical mean of the simulated VaR estimates. Table 1
shows the bias for different sample sizes and different DF.
The results show that, in general, GARCH is subject to a significant negative bias, while
HS is subject to a significant positive bias, which accords with the findings of Inui, Kijima and
Kitano (2003). For both GARCH and HS, the bias increases as the distribution tail becomes heavier.
The bias of HS increases as the sample size becomes smaller; GARCH does not share this
characteristic. The opposite signs of the GARCH and HS biases imply that a combination of
GARCH and HS might decrease the bias of the VaR estimate.
2.3.2 VaR Prediction under Structural Breaks
As stated in the introduction, GARCH and HS are expected to be differently affected by structural
breaks. In this section I conduct a Monte Carlo simulation to show how they adapt to a structural
break.
In the following numerical experiment, the data come from two data generating processes (DGPs).
The first 1,000 observations are generated by the GARCH(1,1) process

r_t = a_t,    \sigma_t^2 = 0.0188 + 0.259 a_{t-1}^2 + 0.7217 \sigma_{t-1}^2,    a_t = \sigma_t \varepsilon_t,    \varepsilon_t \sim t(3)

The next 500 observations are generated by the GARCH(1,1) process

r_t = a_t,    \sigma_t^2 = 0.0188 + 0.059 a_{t-1}^2 + 0.9217 \sigma_{t-1}^2,    a_t = \sigma_t \varepsilon_t,    \varepsilon_t \sim t(10000)

The purpose of using two different DGPs is to create a structural break starting at the
1,001st observation. From the parameters of the two DGPs, the first 1,000 observations are
clearly less volatile, so their VaR forecasts should be higher than those of the next 500
observations.
The first 500 observations are used to estimate the parameters of the GARCH model; the
parameter estimates are then fixed and used to forecast the VaR of the next 1,000 observations,
giving 1,000 GARCH VaR forecasts. The first 500 of these forecasts fall under the first DGP and
the last 500 under the second DGP.
The window size for HS is also 500; since the whole sample size is 1,500, we likewise
obtain 1,000 HS VaR forecasts.
This process is repeated 5,000 times and the resulting VaRs are averaged. The average
sample standard deviation of the first 1,000 observations is 0.4787; that of the last 500
observations is 0.9257.
Figure 1 shows the behavior of GARCH and HS under the structural break. The break occurs at
time t = 501. GARCH reacts to the break rapidly, and its VaR forecast drops sharply at the
beginning of the break. However, since the GARCH VaR forecast is driven by the most recent
returns, it quickly returns to the previous level. HS, on the contrary, reacts smoothly and
steadily: it does not drop very fast, but it decreases steadily and reflects the structural
break in the long run.
The simulation results confirm that GARCH adapts to structural breaks rapidly but is less
stable, while HS adapts to structural change slowly but is more stable. Each has its advantages
and disadvantages, and intuitively they can compensate for each other. A combination of GARCH
and HS might therefore absorb their advantages, diversify their disadvantages, and ultimately
produce a better VaR forecast.
3 Artificial Neural Networks Combination
From the above discussion, we have seen that combining two or more VaR models is appealing.
Hornik, Stinchcombe and White (1989, 1990) demonstrated that ANNs can approximate a large class
of functions arbitrarily well. ANNs are therefore ideally suited to forecast combination when
the optimal combination of individual forecasts is potentially nonlinear. Since GARCH and HS are
different types of models, we have reason to expect that the underlying combination relationship
is nonlinear. In this section I present how to apply Artificial Neural Networks (ANNs) to combine
the VaR models nonlinearly.
Donaldson and Kamstra (1996) applied ANNs to combine time series forecasts of stock
market volatility. In this paper I use ANNs to combine VaR models, which is essentially
a nonlinear quantile regression model.
3.1 ANNs Combination Architecture
An artificial neural network is a mathematical model for information processing based on a
connectionist approach to computation. ANNs are a wide class of flexible nonlinear regression
models.
In this paper the Multilayer Perceptron (MLP) network is applied. An MLP consists of an input
layer, one or more hidden layers, and an output layer. White (1992) shows that MLPs are
general-purpose, flexible, nonlinear models that can approximate virtually any function to any
desired degree of accuracy, given enough hidden neurons and enough data. In other words, MLPs
are universal approximators.
To explain how the neural network works, I use a simple ANN structure with only
one hidden layer and one output. I will use the following notation,
xi = input values
αj = bias for hidden layer
hj = hidden neuron values
c = bias for output layer
y = predicted output value
t = target output value
wi = weight from input to hidden layer
g = activation function from input layer to hidden layer
dj = weight from hidden layer to output
f = activation function from hidden layer to output layer
9
The hidden neuron values and the output value are calculated by the following nonlinear equations,

y = f\left(c + \sum_j d_j h_j\right)    (8)

h_j = g\left(\alpha_j + \sum_i w_i x_i\right)    (9)
Given the independent variables and the dependent variable, an ANN designer must determine the
number of hidden layers, the number of hidden neurons and the activation functions. White (1990)
proved that, provided a sufficient number of nodes are placed in the first layer of the ANN,
higher layers are not needed to establish a satisfactory connection between the raw inputs and
the final output. Thus in this paper I adopt a single-hidden-layer ANN to combine the VaR models.
Figure 2 shows the structure of the ANN combination model. {f_{1t}, f_{2t}} denote the individual
forecasts; {z_{1t}, z_{2t}} denote the normalized individual VaR forecasts; {h_{1t}, h_{2t}} denote the hidden
neurons (the number of hidden neurons need not be two and is determined by validation);
1 denotes the bias unit; F_t denotes the nonlinear combined VaR forecast. The relationships among
these variables are,

F_t = \beta_0 + \sum_{j=1}^{2} \beta_j h_{jt} + \sum_{i=1}^{2} \delta_i f_{it}    (10)

h_{jt} = \tanh\left(\alpha_{0j} + \sum_{l=1}^{2} \alpha_{lj} z_{lt}\right)    (11)

z_{lt} = (f_{lt} - \bar f_{lt}) / S_{f_{lt}}    (12)

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (13)

where \bar f_{lt} is the in-sample mean of f_{lt} and S_{f_{lt}} is its in-sample standard deviation.
Equation (12) normalizes the individual VaR forecasts to serve as inputs to the hidden neurons.
Equation (11) passes a linear combination of the normalized forecasts through the hyperbolic
tangent activation function tanh(x), defined in Equation (13), to obtain the hidden neurons.
Finally, Equation (10) maps the hidden neurons to the final output via a linear transfer function.
When all \beta_j = 0, this nonlinear function reduces to a linear quantile regression combination.
The ANN combination is thus a generalized form of these combination methods.
Theoretical results indicate that, given enough hidden units, a network like the one in Figure 2
can approximate any reasonable function to any required degree of accuracy. In other words, any
such function can be expressed as a linear combination of tanh functions: tanh is a universal
basis function.
What we need to estimate are the weights \alpha_{lj}, \beta_j, \delta_i and the number of hidden neurons n. They
are estimated by minimizing the 'tick' loss function defined in the next section.
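Equations (10)-(13) amount to a single forward pass through the network. The sketch below implements exactly those equations; the weight names mirror the paper's notation, and all values passed in are hypothetical:

```python
import math

def ann_combine(f, f_mean, f_std, alpha, beta, delta, beta0):
    """Forward pass of the combining network, Eqs. (10)-(13).

    f      : individual VaR forecasts [f_1t, f_2t, ...]
    alpha  : per-hidden-neuron pairs (alpha_0j, [alpha_1j, alpha_2j, ...])
    beta   : hidden-to-output weights; delta: direct linear weights
    """
    # Eq. (12): normalize forecasts by in-sample mean and std
    z = [(fi - m) / s for fi, m, s in zip(f, f_mean, f_std)]
    # Eq. (11): hidden neurons via the tanh activation of Eq. (13)
    h = [math.tanh(a0 + sum(a * zi for a, zi in zip(aw, z)))
         for a0, aw in alpha]
    # Eq. (10): linear output layer plus direct linear terms
    return (beta0 + sum(b * hj for b, hj in zip(beta, h))
                  + sum(d * fi for d, fi in zip(delta, f)))
```

Setting every beta_j to zero recovers the linear combination \beta_0 + \sum_i \delta_i f_{it}, as noted above.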
In this paper the candidate VaR forecasts are the inputs and the combined VaR forecast is the
output. The difference between the VaR-ANN model and a standard ANN is that the standard ANN
deals with mean forecasts and its cost function is symmetric and differentiable, while for
VaR models we are interested in the quantile forecast and the cost function is the asymmetric,
non-differentiable 'tick' loss. Gradient-based Standard Backpropagation (SBP) is therefore
not suitable for training the quantile ANN. Here I use a Genetic Algorithm to train the VaR-ANN
model.
3.2 Genetic Algorithm (GA)
A Genetic Algorithm is a method for solving optimization problems based on natural selection,
the process that drives biological evolution. The GA was developed by Holland (1962, 1975), and
Beasley et al. (1993) provide an excellent introduction.
A GA maintains an initial population of candidate solutions and evaluates the quality of each
candidate according to a specific cost function. The GA then repeatedly modifies the population
of individual solutions. At each step, it selects individuals from the current population to be
parents and uses them to produce the children of the next generation. Over successive
generations, the population "evolves" toward an optimal solution.
The following outline summarizes how the genetic algorithm works:
1. Create an initial population, usually at random, though it can be specified by the designer.
2. Create a sequence of new populations, or generations. The individuals in the current
generation are used to create the next generation through the following steps:
1) Score each member of the current population by computing its fitness value according to the
cost function.
2) Scale the raw fitness scores to convert them into a more usable range of values.
3) Select parents based on their fitness.
4) Produce children from the parents.
Reproduction: members of the population are selected for the new population with probabilities
proportional to their fitness.
Crossover: pairs of chromosomes in the new population are chosen at random to exchange genetic
material (their bits) in a mating operation called crossover, producing two new chromosomes
that replace the parents.
Mutation: randomly chosen bits in the offspring are flipped; this is called mutation.
5) Replace the current population with the children to form the next generation.
3. Repeat the above procedure until one of the stopping criteria is met.
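To make the outline concrete, here is a minimal real-coded GA sketch. It uses truncation selection, uniform crossover and Gaussian mutation, which is one of many possible variants, not the paper's exact implementation; all parameter values are illustrative:

```python
import random

def genetic_minimize(cost, dim, pop_size=30, generations=60,
                     cx_rate=0.5, mut_rate=0.8, sigma=0.3, seed=0):
    """Evolve a population of real-valued weight vectors to minimize `cost`."""
    rng = random.Random(seed)
    # step 1: random initial population
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        scored = sorted(pop, key=cost)            # step 2.1: score members
        best = min(best, scored[0], key=cost)     # keep the best-so-far
        parents = scored[:pop_size // 2]          # step 2.3: select fittest
        children = []
        while len(children) < pop_size:           # step 2.4: produce children
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < cx_rate else bi
                     for ai, bi in zip(a, b)]     # uniform crossover
            if rng.random() < mut_rate:           # Gaussian mutation
                i = rng.randrange(dim)
                child[i] += rng.gauss(0.0, sigma)
            children.append(child)
        pop = children                            # step 2.5: replace population
    return best
```

For the VaR-ANN, `cost` would be the 'tick' loss of the combined forecast as a function of the network weights; here any objective works, e.g. minimizing `sum(x*x for x in w)` drives the population toward the origin.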
A GA can solve a variety of optimization problems that cannot be handled by standard
gradient-based algorithms, including problems in which the objective function is
discontinuous, non-differentiable, stochastic, or highly nonlinear. Given the properties of the
quantile ANN, a GA is used to train it.
4 Backtesting Criteria
To verify the reliability of the VaR combination, I choose the following backtesting criteria
to compare the forecast performance of the combining models and the individual models.
4.1 Loss Functions
The Basel Committee on Banking Supervision (1996a) indicates that the magnitudes as well as the
number of exceptions are a matter of regulatory concern. Lopez (1998) incorporated this concern
into a set of loss functions, whose general form is:

L = \frac{1}{n} \sum_{i=1}^{n} C_{t+i}    (14)

C_{t+1} = f(L_{t+1}, VaR_t)  if L_{t+1} < VaR_t
C_{t+1} = g(L_{t+1}, VaR_t)  if L_{t+1} ≥ VaR_t    (15)

where f(L_{t+1}, VaR_t) ≥ g(L_{t+1}, VaR_t). The following are the two common VaR loss functions
considered in this paper:
4.1.1 Binomial Loss
C_{t+1} = 1  if L_{t+1} < VaR_t
C_{t+1} = 0  if L_{t+1} ≥ VaR_t    (16)

This loss function considers only the number of exceptions, not their magnitude. The mean
loss value (14) calculated from this loss function is called the violation ratio.
4.1.2 ’tick’ Loss
C_{t+1} = (\alpha - 1)(L_{t+1} - VaR_t)  if L_{t+1} < VaR_t
C_{t+1} = \alpha(L_{t+1} - VaR_t)  if L_{t+1} ≥ VaR_t    (17)

In compact form, C_{t+1} = (\alpha - I(L_{t+1} - VaR_t < 0))(L_{t+1} - VaR_t),
where I(L_{t+1} - VaR_t < 0) is an indicator function. This loss function is often called the 'tick'
or 'check' loss function. In the quantile regression literature, the 'tick' loss is usually the
implicit objective function.
In this paper I apply loss functions (16) and (17) to compare the individual VaR models
and the combining models. The mean loss is calculated by equation (14).
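The mean 'tick' loss of Eqs. (14) and (17) can be sketched as:

```python
def mean_tick_loss(realized, var_forecasts, alpha=0.05):
    """Mean 'tick' loss, Eqs. (14) and (17): the asymmetric loss
    whose minimizer is the alpha-quantile forecast."""
    total = 0.0
    for L, v in zip(realized, var_forecasts):
        e = L - v
        # exceptions (e < 0) are penalized with weight (1 - alpha)
        total += (alpha - 1.0) * e if e < 0 else alpha * e
    return total / len(realized)
```

Note the asymmetry: for alpha = 0.05, a return falling below the VaR forecast is penalized 19 times more heavily per unit than one above it.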
4.2 Conditional Coverage Test
The conditional coverage test was proposed by Christoffersen (1998). The analysis is based on
the indicator sequence C_{t+1} defined in equation (16). An accurate VaR model should exhibit
correct conditional coverage, which implies that the indicator sequence C_{t+1} exhibits
both correct unconditional coverage and serial independence. Christoffersen (1998) develops a
three-step testing procedure.
4.2.1 Correct Unconditional Coverage Test
H0: correct violation ratio
Ha: incorrect violation ratio

LR_{uc} = -2 \log \left[ \frac{p^{n_1}(1-p)^{n_0}}{\pi^{n_1}(1-\pi)^{n_0}} \right] \sim \chi^2(1)    (18)

where
p = coverage rate of the VaR model
n_1 = number of exceptions
n_0 = number of non-exceptions
\pi = n_1 / (n_0 + n_1), the MLE of p
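A sketch of the LR_{uc} computation; a guard for the degenerate cases \pi \in \{0, 1\} is assumed, and the statistic is compared with the \chi^2(1) critical value (3.84 at the 5% level):

```python
import math

def lr_uc(hits, p=0.05):
    """Christoffersen's unconditional coverage statistic, Eq. (18).

    hits: sequence of 0/1 exception indicators; chi-square(1) under H0.
    """
    n1 = sum(hits)
    n0 = len(hits) - n1
    pi = n1 / (n0 + n1)                      # MLE of p
    if pi in (0.0, 1.0):                     # degenerate sample: reject
        return float("inf")
    null_ll = n1 * math.log(p) + n0 * math.log(1.0 - p)
    alt_ll = n1 * math.log(pi) + n0 * math.log(1.0 - pi)
    return -2.0 * (null_ll - alt_ll)
```

When the empirical violation ratio equals p exactly, the statistic is zero; it grows as the ratio drifts away from p.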
4.2.2 Independence Test
H0: the exception process is independent
Ha: the exception process is a first-order Markov process

LR_{ind} = -2 \log \left[ \frac{(1-\pi_2)^{n_{00}+n_{10}} \, \pi_2^{n_{01}+n_{11}}}{(1-\pi_{01})^{n_{00}} \pi_{01}^{n_{01}} (1-\pi_{11})^{n_{10}} \pi_{11}^{n_{11}}} \right] \sim \chi^2(1)    (19)

where
n_{ij} = number of times value i is followed by value j in the indicator sequence
\pi_{01} = n_{01} / (n_{00} + n_{01})
\pi_{11} = n_{11} / (n_{10} + n_{11})
\pi_2 = (n_{01} + n_{11}) / (n_{00} + n_{01} + n_{10} + n_{11})    (20)
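The independence statistic can be sketched the same way; zero counts are handled by the convention 0 · log 0 = 0, and the guard against an empty (n_{10} + n_{11}) cell is an assumption of this sketch:

```python
import math

def lr_ind(hits):
    """Christoffersen's independence statistic, Eq. (19)."""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):   # transition counts n_ij
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi2 = (n01 + n11) / (n00 + n01 + n10 + n11)

    def term(count, prob):                       # convention: 0 * log(0) = 0
        return count * math.log(prob) if count else 0.0

    null_ll = term(n00 + n10, 1.0 - pi2) + term(n01 + n11, pi2)
    alt_ll = (term(n00, 1.0 - pi01) + term(n01, pi01)
              + term(n10, 1.0 - pi11) + term(n11, pi11))
    return -2.0 * (null_ll - alt_ll)
```

Because the Markov alternative nests the independent null, the statistic is always non-negative.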
4.2.3 Correct Conditional Coverage Test
H0: independent exception process with correct violation ratio p
Ha: first-order Markov exception process with a different transition probability matrix (TPM)

LR_{cc} = -2 \log \left[ \frac{p^{n_1}(1-p)^{n_0}}{(1-\pi_{01})^{n_{00}} \pi_{01}^{n_{01}} (1-\pi_{11})^{n_{10}} \pi_{11}^{n_{11}}} \right] \sim \chi^2(2)    (21)

Note that LR_{cc} = LR_{uc} + LR_{ind}.
5 Simulation Study
The purpose of this section is to design and perform a Monte Carlo simulation experiment to
compare the one-step-ahead VaR forecast performance of the ANN combination model and the
individual VaR models.
In the following numerical experiment, the data are generated by the GARCH(1,1) process

r_t = a_t,    \sigma_t^2 = 0.0188 + 0.059 a_{t-1}^2 + 0.9217 \sigma_{t-1}^2,    a_t = \sigma_t \varepsilon_t,    \varepsilon_t \sim t(d.f.)

The parameters used here are estimated from the S&P 500 index. The innovation process follows a
t-distribution with degrees of freedom (d.f.) varying from 3 to 1,000. The sample size varies from
100 to 1,000. The Monte Carlo simulation uses 1,000 repetitions.
The comparison procedure is as follows:
1. Let d.f. = 3 and sample size = 100; generate a GARCH(1,1) series from the above DGP with
101 observations; the first 100 observations are used to estimate the VaR, and the last
observation is reserved for out-of-sample comparison;
2. Estimate the VaR by GARCH and HS using the first 100 observations;
3. Repeat the above two steps 1,000 times, yielding 1,000 VaR forecasts from each of
GARCH and HS, along with 1,000 out-of-sample observations;
4. Train the ANN and estimate its weights with the 1,000 pairs of data from step 3;
this is the ANN combination model;
5. Repeat step 1, estimate the GARCH and HS VaR with the newly generated data, and compute the
ANN VaR using the weights from step 4;
6. Repeat step 5 1,000 times, yielding 1,000 VaR forecasts from GARCH, HS and the
ANN, along with 1,000 one-step-ahead out-of-sample observations;
7. Compare the VaR forecast performance by violation ratio and 'tick' loss;
8. Repeat all of the above steps for different d.f. and sample sizes.
Table 2 reports the violation ratio comparison between GARCH, HS and the ANN combination
model. The coverage rate is 5%, so the model whose violation ratio is closest to 5% is preferred.
From the table, we observe that GARCH always generates a violation ratio below 5%, which
means the GARCH VaR forecasts are too conservative. This conservativeness becomes more
severe as the degrees of freedom of the innovation's t-distribution increase. There is no
apparent pattern in the GARCH VaR forecasts with respect to sample size. The HS VaR shows
no obvious trend with either the tail behavior of the innovations or the sample size; however,
we observe that its performance is very unstable. In contrast, the ANN combination model
behaves much better than GARCH and HS: its performance is stable, and there is no trend
between the violation ratio and the innovation tail behavior or the sample size. Most of its
violation ratios lie between 4% and 6%, very close to 5%.
Figure 2 compares the 'tick' loss of the three models. Since the ANN model necessarily
has a smaller in-sample loss than the individual models, this figure reports the out-of-sample
loss comparison. The ANN has a smaller loss under nearly all of the DGPs, except when the d.f.
equals 50. This indicates that the in-sample training is acceptable.
This Monte Carlo experiment shows that, judged by violation ratio and 'tick' loss, the ANN
combination generates better VaR forecasts than GARCH and HS.
6 Empirical Application
Four daily stock return series are employed to test the VaR models: the S&P 500 index, the
Dow Jones Industrial Average (DJI), Ford Motor Co. (Ford) and International Business
Machines Corp. (IBM), for the period from 2-Jan-1990 to 4-Apr-2005, a total of 3,846 observations.
Returns are computed as 100 times the difference of the log prices. The price data come
from Yahoo Finance.
Table 3 reports the summary statistics of these return series. The S&P 500 has the highest mean
return and Ford the worst performance over these fifteen years. Because the S&P 500 and DJI
are composite indices, they have noticeably smaller variances than Ford and IBM. All four
return series have negative skewness, with Ford's being especially large. All four
series have heavy tails, which motivates the use of GARCH in this study.
6.1 Methodology
The individual VaR models are Historical Simulation (HS) and GARCH(1,1) with N(0,1)
innovations. The VaR estimation window is fixed at 1,000 days, roughly four years of
trading days.
I choose these individual VaR models because they employ partially non-overlapping
information sets: HS uses only the empirical distribution of historical returns, while
GARCH applies a conditional volatility forecasting model. There may therefore be an advantage
to combining the VaR models, since the combination employs more information.
The HS and GARCH models in this study produce one-step-ahead forecasts beginning from the
1,001st observation. Daily data from the 1st to the 1,000th observation are used to
estimate the parameters of the GARCH model; these parameters remain fixed
in the subsequent forecasting. We then produce the VaR forecast for the 1,001st day. We
update the data set with a rolling window that keeps the sample size constant at 1,000, by
adding the 1,001st observation and dropping the 1st. The new data set is then used
to produce the one-day-ahead out-of-sample forecast for the 1,002nd day. This rolling
updating procedure is repeated until the last day of the whole sample, giving 3,846 - 1,000
= 2,846 individual VaR forecasts.
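The rolling update above can be sketched as follows (only the HS leg is shown, using a simple percentile rule; the GARCH leg follows the same loop with its own forecaster):

```python
def rolling_hs_forecasts(returns, window=1000, p=0.05):
    """One-day-ahead HS VaR forecasts with a fixed-size rolling window,
    mirroring the updating scheme described above."""
    forecasts = []
    for t in range(window, len(returns)):
        history = sorted(returns[t - window:t])     # most recent `window` days
        forecasts.append(history[int(p * window)])  # 100p-th percentile
        # returns[t] is the realized out-of-sample return for this forecast
    return forecasts
```

Applied to the 3,846-day sample with a 1,000-day window, the loop yields exactly the 2,846 one-day-ahead forecasts described above.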
The above individual VaR forecasts form the basis for the combination. I divide the
individual out-of-sample forecasts into two subsamples: the 1st to the 1,000th and the 1,001st
to the 2,846th. The first subsample is used to select the optimal specification for the ANN
model; the optimal ANN architecture is then used to forecast the VaR in the second subsample.
I use the VaR forecasts in the second subsample to compare the performance of the ANN
combination and the individual VaR models, based on the backtesting criteria described in
Section 4. The ANN specification is fixed in the backtesting period.
A Genetic Algorithm is used to train the ANN because of the asymmetric, non-differentiable
objective function. Several important parameters must be chosen for the algorithm. The
population size (PS) is set to 10 times the number of neurons. The number of generations is set
to 200, which achieves good convergence in my training experience. The crossover parameter (CP)
is set to 0.5, a common setting for GA training. The mutation parameter (MP) is set
to 0.8. These parameters are chosen to balance the power of the optimization against the
computing time. GA training is widely regarded as something of an art.
6.2 Comparison Results
The optimal ANN combination is certain to perform best in the first (validation) subsample
described above, since the ANN combination nests all possible linear and nonlinear combi-
nations of the individual VaR models. Table 4 reports the in-sample 'tick' loss of the GARCH, HS and
ANNs combination models. The results accord with this expectation: the ANNs combination has the smallest
in-sample loss, so its in-sample training is valid. The important
comparison is the performance in the second (testing) subsample, which determines whether the ANN com-
bination is practically useful.
Table 5 reports the 5% out-of-sample VaR comparison results. Four return series are
tested at the 5% coverage rate. Three kinds of models are displayed in the table: the two individual
VaR models are listed first, followed by the ANNs combination model. I do not report the linear
combination and average combination separately. As stated in section 3.1, the linear combination
and average combination are special cases of the ANNs combination, so they compete with
other nonlinear specifications in the validation period. In fact, it is hard to believe that HS and GARCH
are related linearly in a way that could efficiently combine their information sets.
The three backtesting criteria described in section 4 are adopted. The first column reports the mean 'tick' loss in the
testing sample, and the second column reports the most common measure, the violation ratio. The
third through fifth columns report Christoffersen's unconditional coverage test, independence
test and conditional coverage test. In the table, the best model under the mean loss and violation
ratio criteria, and the models passing Christoffersen's tests, are marked
in bold font.
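For concreteness, the first two criteria can be computed as follows. This is a sketch using the standard definitions of the 'tick' (check) loss and the violation ratio, which I assume match those of section 4.

```python
import numpy as np

def tick_loss(returns, var, alpha=0.05):
    """Standard 'tick' (check) loss for a quantile forecast:
    rho_alpha(u) = u * (alpha - 1{u < 0}), with u = r_t - VaR_t,
    averaged over the evaluation sample."""
    u = returns - var
    return np.mean(u * (alpha - (u < 0)))

def violation_ratio(returns, var):
    """Share of days on which the realized return falls below the VaR
    forecast; should be close to alpha for a well-calibrated 5% VaR."""
    return np.mean(returns < var)

r = np.array([-2.0, 1.0, 0.5, -0.1])   # toy returns
q = np.full(4, -1.0)                   # constant VaR forecast
print(round(tick_loss(r, q), 4), violation_ratio(r, q))   # 0.2925 0.25
```

A smaller mean tick loss means the forecast tracks the true 5% quantile more closely, which is why it is the natural in-sample objective for the combination as well.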
The first column reports the mean 'tick' loss comparison. The performance of GARCH and
HS is mixed; however, the ANNs combination model has the smallest loss for all return series. This
indicates that the ANNs combination reduces the loss both in-sample and out-of-sample.
The second column shows the violation ratio comparison. Generally speaking, GARCH
generates conservative VaR forecasts for every series except Ford. On the contrary, the HS VaR forecasts are too
generous, consistently resulting in violation ratios larger than 5%. The ANNs combination performs
better than GARCH and HS: its deviation from the correct coverage is smaller than that of either
individual model. However, we note that for DJI and IBM the deviation is still not small.
Columns 3 to 5 report the results of the tests proposed by Christof-
fersen (1998). Column 3 gives the results of the unconditional coverage test. Both GARCH and HS
fail this test for all four return series. The ANNs combination passes for S&P 500 and Ford
but fails for DJI and IBM. These results accord with the violation ratio comparison, since
the two tests are essentially the same. For the independence test, column 4 shows that all of the models
pass; thus the violation series from all three models exhibit no serial correlation. The last
column gives the conditional coverage test, the joint test of unconditional coverage
and independence, whose statistic is the sum of the previous two statistics.
Because of the unsatisfactory unconditional coverage results, both GARCH and HS fail
this test. The ANNs combination model passes for S&P 500 and Ford and fails
for DJI and IBM. Note, however, that the 95% critical value of the chi-square distribution with 2 degrees of freedom is 5.99,
and the statistics of the ANNs combination model for DJI and IBM are very close to this critical
value and much smaller than those of GARCH and HS. Thus the ANNs combination
model improves on the individual models under the conditional coverage test.
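A sketch of these likelihood-ratio statistics, following the standard formulas in Christoffersen (1998); the helper names are my own, and the code assumes the 0/1 violation series contains at least one hit and one non-hit.

```python
import numpy as np

def christoffersen_tests(violations, alpha=0.05):
    """Christoffersen (1998) tests on a 0/1 violation series.
    LR_uc tests whether the hit rate equals alpha, LR_ind tests
    first-order independence of hits, and LR_cc = LR_uc + LR_ind is the
    joint test (chi-square, 2 d.f.; 95% critical value 5.99)."""
    v = np.asarray(violations, dtype=int)
    n, n1 = len(v), int(v.sum())
    n0, pi = n - n1, v.mean()
    # Unconditional coverage: LR of the null rate alpha vs. observed rate pi
    lr_uc = -2 * (n0 * np.log((1 - alpha) / (1 - pi)) + n1 * np.log(alpha / pi))
    # First-order transition counts for the independence test
    pairs = list(zip(v[:-1], v[1:]))
    n00, n01 = pairs.count((0, 0)), pairs.count((0, 1))
    n10, n11 = pairs.count((1, 0)), pairs.count((1, 1))
    pi01, pi11 = n01 / (n00 + n01), n11 / (n10 + n11)
    pi1 = (n01 + n11) / (n - 1)

    def ll(p, a, b):  # binomial log-likelihood with a "stays", b "hits"
        out = 0.0
        if a > 0:
            out += a * np.log(1 - p)
        if b > 0:
            out += b * np.log(p)
        return out

    lr_ind = -2 * (ll(pi1, n00 + n10, n01 + n11)
                   - ll(pi01, n00, n01) - ll(pi11, n10, n11))
    return lr_uc, lr_ind, lr_uc + lr_ind

v = [0] * 100
for i in (0, 20, 40, 60, 80):
    v[i] = 1              # 5 hits in 100 days: exactly 5% coverage
print(christoffersen_tests(v))
```

For this toy series the hit rate equals alpha exactly, so LR_uc is zero and the joint statistic stays well below the 5.99 critical value quoted in the text.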
7 Concluding Remarks
Two popular VaR models, GARCH and HS, have two drawbacks. First, HS has a
significant positive bias, while GARCH has a significant negative bias. Second, when a structural
break occurs, HS responds too slowly, while GARCH responds quickly but unstably.
To address these two problems, this paper proposes an ANNs combination method and
studies its performance. Based on the 'tick' loss function, the violation ratio
and Christoffersen's conditional coverage tests, both the simulation results and the empirical
results show that combining HS and GARCH with ANNs significantly improves VaR model
performance.
References
Aiolfi, M. and A. Timmermann (2004), Persistence in Forecasting Performance and Conditional
Combination Strategies. Forthcoming in Journal of Econometrics
Bao, Y., Lee, T.-H. and Saltoglu, B. (2004), A test for density forecast comparison with
applications to risk management. Department of Economics, UC Riverside.
Basel Committee on Banking Supervision (1996a), Amendment to the capital accord to incorpo-
rate market risks
Basel Committee on Banking Supervision (1996b), Supervisory framework for the use of "backtesting"
in conjunction with the internal models approach to market risk capital requirements
Basel Committee on Banking Supervision (1998), Amendment to the Basel Capital Accord of
July 1988
Basel Committee on Banking Supervision (1999), Performance of Models-Based Capital Charges
for Market Risk: 1 July-31 December 1998
Bates, J.M. and C.W.J. Granger (1969). The combination of forecasts. Operational Research
Quarterly, 20, 451-468
Beasley, D., Bull, D.R., and Martin, R.R. (1993), An Overview of Genetic Algorithms, University
Computing, 15(2) 58-69, 170-181
Blanco C. and G. Ihle (1999). How good is your VaR? Using backtesting to assess system perfor-
mance. Financial Engineering News
Caporin, M. (2003) Evaluating value-at-risk measures in presence of long memory conditional
volatility. GRETA, working paper n. 05.03.
Chatfield, C. (1995). Model uncertainty, data mining, and statistical inference, J. Roy. Statist.
Soc. Ser. A 158 419-466.
Christoffersen, P. (1998). Evaluating Interval Forecasts, International Economic Review, 1998,
Volume 39, 841-862.
Christoffersen, P., J. Hahn and A. Inoue (2001). Testing and Comparing Value-at-Risk Measures,
Journal of Empirical Finance, 2001, Volume 8, 325-342.
Christoffersen, P. and D. Pelletier (2004). Backtesting Value at Risk: A Duration-Based Approach,
Journal of Financial Econometrics, 2004, Volume 2, 84-108.
Clemen, R.T. (1989). Combining Forecasts: A Review and Annotated Bibliography, Interna-
tional Journal of Forecasting, 5, 559-583
Donaldson, R.G. and M. Kamstra (1996). Using Artificial Neural Networks to Combine Forecasts,
Journal of Forecasting, 15, 49-61.
Engle, R. and S. Manganelli (1999). CAViaR: Conditional Value at Risk by Quantile Regression,
Manuscript, NYU Stern
Giacomini, R. and I. Komunjer (2004). Evaluation and Combination of Conditional Quantile Fore-
casts, forthcoming in Journal of Business and Economic Statistics
Hendry, D.F. and M.P. Clements (2002). Pooling of Forecasts. Econometrics Journal 5, 1-26
Hoeting, J., D. Madigan, A. Raftery and C. Volinsky (1999). Bayesian Model Averaging, Statis-
tical Science 14, 382-417.
Holland, J. (1965), Universal spaces: A basis for studies of adaptation, In Automata Theory.
Caianiello, E. R. (ed.) Academic Press. 218-30.
Holland, J. (1975) Adaptation in Natural and Artificial Systems, Ann Arbor, The University of
Michigan Press
Hornik, K., M. Stinchcombe and H. White (1989), Multilayer Feedforward Networks are Universal
Approximators, Neural Networks, 2, 359-366
Hornik, K., M. Stinchcombe and H. White (1990), Universal Approximation of an Unknown
Mapping and Its Derivatives Using Multilayer Feedforward Networks, Neural Networks, 3,
551-560
Inui, K., M. Kijima, and A. Kitano (2003), VaR is subject to a significant positive bias. Mimeo,
Kyoto University and Financial Services Agency
Koenker, R. and B.J. Park (1996). An Interior Point Algorithm for Nonlinear Quantile Regression,
Journal of Econometrics, 71, 265-285
Lopez J. A. (1998), Methods for evaluating value-at-risk estimates. Federal Reserve Bank of New
York research paper n. 9802.
Riskmetrics (1996). Technical Document. Technical report. J.P. Morgan
Sarma, M., S. Thomas, and A. Shah (2003), Selection of Value at Risk models, Journal of Fore-
casting, 22(4), 337-358
Stock, J.H. and M. Watson (1999). Forecasting Inflation, Journal of Monetary Economics, 1999,
Vol. 44, no. 2.
Stock, J.H. and M. Watson (2001). A Comparison of Linear and Nonlinear Univariate Models for
Forecasting Macroeconomic Time Series, in Cointegration, Causality, and Forecasting: A
Festschrift in Honour of Clive W.J. Granger, R.F. Engle and H. White (eds),
Oxford University Press
Stock, J.H. and M. Watson (2004). Combination Forecasts Of Output Growth In A Seven-Country
Data Set, forthcoming Journal of Forecasting, 2004
Timmermann, A. (2004). Forecast Combinations. Forthcoming in Handbook of Economic Fore-
casting (Edited by Elliott, Granger and Timmermann (North Holland)
White, H. (1990), Connectionist Nonparametric Regression: Multilayer Feedforward Networks
Can Learn Arbitrary Mappings, Neural Networks, 3, 535-549 .
White, H. (1992), Nonparametric Estimation of Conditional Quantiles Using Neural Networks, in
Proceedings of the Symposium on the Interface. New York: Springer-Verlag.
Figure 2. ANNs Combination Structure
[Network diagram: inputs 1 (bias), z1t, z2t and the individual VaR forecasts f1t, f2t feed the hidden units h1t, h2t, which together with a bias input of 1 produce the combined forecast Ft.]
Figure 3. Loss Comparison
Table 1. GARCH and HS VaR Bias (5%)

GARCH Bias
df \ n      100       250       500       1000
3        -1.859    -1.829    -1.841    -1.800
10       -0.206    -0.204    -0.203    -0.213
50       -0.048    -0.054    -0.047    -0.041
1000     -0.034    -0.026    -0.023    -0.022

HS Bias
df \ n      100       250       500       1000
3        -0.449    -0.014    -0.033     0.020
10        0.220     0.057     0.019     0.017
50        0.046     0.028     0.037     0.012
1000      0.061     0.010     0.011     0.009

Notes: DGP is the t-distribution with 3, 10, 50 and 1000 degrees of freedom; 5,000 Monte Carlo repetitions. Bias = forecast VaR - theoretical VaR, where the forecast VaR is the mean of the VaR forecasts in the simulation and the theoretical VaR is the 5% inverse CDF of the t-distribution.
Table 2. Violation Ratio Comparison (5%)

Method \ n     100      250      500      1000
df = 3
GARCH        0.016    0        0        0.035
HS           0.056    0.049    0.023    0.11
ANN          0.03     0.066    0.058    0.046
df = 10
GARCH        0.019    0.032    0.067    0.04
HS           0.017    0.053    0.084    0.072
ANN          0.03     0.067    0.046    0.098
df = 50
GARCH        0.038    0.035    0.042    0.032
HS           0.066    0.038    0.023    0.098
ANN          0.038    0.054    0.023    0.065
df = 1000
GARCH        0.019    0.032    0.067    0.04
HS           0.017    0.053    0.084    0.072
ANN          0.058    0.05     0.034    0.065

Notes: DGP is a GARCH(1,1) process with constant K = 0.018, ARCH coefficient 0.059 and GARCH coefficient 0.9217; 1,000 Monte Carlo repetitions.
Table 3. Summary Statistics

         Mean     Median   Variance   Skewness   Kurtosis
S&P      0.052    0.040     1.047      -3.705     83.161
DJI      0.015    0.021     0.193      -0.222      7.524
Ford    -0.025    0.000     5.408     -10.148    270.260
IBM      0.025    0.000     2.705      -0.881     25.048
Table 4. In-Sample Loss (5%)

         S&P      DJI      IBM      Ford
GARCH   0.7674   0.8576   3.0632   2.5668
HS      0.7673   0.7949   2.6012   2.5481
ANN     0.7308   0.7575   2.5699   2.4960
Table 5. Out-of-Sample Performance (5%)

                Loss     Vio Ratio   LR_uc    LR_ind   LR_cc
c.v.                                 3.842    3.842    5.992
S&P   GARCH    1.4267    0.029      10.867    0.030   10.955
      HS       1.4542    0.096      35.511    0.007   35.719
      ANN      1.3887    0.057       0.989    0.182    1.288
DJI   GARCH    1.4036    0.031       8.739    0.002    8.804
      HS       1.4359    0.085      21.512    3.241   24.931
      ANN      1.3722    0.067       5.524    1.404    7.066
IBM   GARCH    3.7766    0.030       9.769    1.858   11.687
      HS       3.5180    0.078      14.204    0.154   14.521
      ANN      3.4460    0.068       6.161    0.433    6.735
Ford  GARCH    3.4888    0.074      10.634    0.051   10.838
      HS       3.4794    0.070       7.530    0.931    8.606
      ANN      3.4463    0.040       2.253    1.074    3.409

Notes: c.v. is the 95% critical value of each LR test.