Robust Term Structure Estimation
Robert Bliss ∗ Emrah Sener † Gunes Erdogan ‡ Emrah Ahi §
November 30, 2011
Abstract
Despite important advances in interest rate curve modeling in the last 30 years,
little attention has been paid to the key practical problem of robust estimation of the
associated parameters. In order to address this issue, we propose a hybrid Particle
Swarm Optimization (PSO) method to estimate the Nelson-Siegel-Svensson (NSS) yield
curve. We compare the results to those of several commonly used optimization algorithms, each applied to bond portfolios of emerging markets (Brazil, Mexico, and Turkey).
Our results show that the proposed hybrid PSO algorithm not only allows for a robust
estimation of the NSS parameters, but also provides a better yield curve fit in a short
period of time, and in turn, generates an economically interpretable parameter set.
Keywords: Term structure, yield curve, Nelson-Siegel-Svensson, Particle Swarm Opti-
mization
∗[email protected], Robert Bliss is Professor and F.M. Kirby Chair in Business Excellence, Wake Forest University
†[email protected], Emrah Sener is Director of the Center for Computational Finance, Ozyegin University
‡[email protected], Gunes Erdogan is a Lecturer at the School of Management, Southampton University
§[email protected], Emrah Ahi is a PhD Student, Ozyegin University
1 Introduction
The literature is rich in models for yield curves, but little attention has been paid to the robust estimation of the associated model parameters. To address this issue, we propose a hybrid Particle Swarm Optimization (PSO) method and estimate the Nelson-Siegel-Svensson (NSS) yield curve for three emerging markets: Brazil, Mexico, and Turkey.
The yield curve (the relationship between interest rates and term to maturity) is a fundamental concept not only in economic and finance theory but also in the pricing and risk management of interest rate contingent claims. It has proved to be critical to policy makers and market practitioners alike. Unfortunately, rarely in financial economics is the contrast between theory and reality more severely visible than in yield curve estimation. Slight differences in the estimated yield curves may result in significant differences when pricing bonds or fixed income portfolios. Therefore, the estimation problem should be thoroughly examined despite its apparent simplicity. However, little attention has been paid to resolving conflicting parameter estimates from the various optimization algorithms. This paper seeks to address this gap by proposing the use of a hybrid PSO method to estimate the parameters of the Nelson-Siegel-Svensson exponential forward rate models. Our computational experiments
show that PSO outperforms the traditional methods in terms of goodness-of-fit, and provides
a stable set of parameters. PSO is also quite robust to the choice of initial starting values.
There are two distinct approaches to modelling the term structure of interest rates.1 The
first approach is based on curve-fitting techniques — a direct specification of the bond prices
as a function of some parameters and the time to maturity (the cross-sectional dimension).
The second approach is based on models which make explicit assumptions about the dynamics
of state variables (the time-series dimension) and asset pricing methods using either equilib-
rium or arbitrage arguments, which in turn result in cross-sectional models for bond prices.2
Furthermore, the first type of models can be classified into two groups: (i) spline-based models and (ii) function-based models. The choice between them is dictated by the
trade-off between the closeness of fit to the set of observed government coupon prices and the
smoothness of the corresponding zero-coupon rate function.
Within the class of function-based models, Nelson and Siegel (1987) were the first to use an exponential polynomial functional form for the forward rate curve. Svensson (1994) extended
1For an excellent comparison of yield curve estimation techniques, see Ioannides (2003). Further discussion on the yield curve can be found in Anderson and Murphy (1996).
2Within the second approach, interest rate models based on short rates either imply a structure for the shape of the term structure, such as Cox et al. (1985) or Vasicek (1977), or take the entire term structure as given and provide structure for its evolution, such as Ho and Lee (1986) and Heath et al. (1992).
Nelson and Siegel's function by adding a fourth term with two additional parameters. This fourth term aims to increase the flexibility of the model and improve the fit. From a technical point of view, the extension creates a second hump-shape (or U-shape). For
a number of reasons this approach has become one of the most popular yield curve models.
Firstly, a number of studies such as Bliss and Ritchken (1996); Bliss (1997) and Seppala and
Viertio (1996) conclude that practitioners requiring a reliable and parsimonious representation
of the yield curve should use an exponential polynomial approach, in preference to the spline-
based approaches. The latter results in the over-fitting problem described in Bliss (1997): "...If the term structure being used is 'too flexible' the model will almost certainly be incorporating unwanted measurement error or idiosyncratic, bond-specific factors of no relevance to pricing securities in other markets..." It has been shown that fitting the zero-coupon yield curve
is better than fitting the discount function as this will eliminate coupon effects that exist
in the latter case. Secondly, since asymptotically flat exponential approximating functions
are used, the resulting forward rate function is infinitely differentiable.3 Thirdly, financial
economists have argued that the functional form chosen for the forward rate curve of Nelson
and Siegel (1987) is effectively a dynamic three-factor model of level, slope, and curvature
(Diebold and Li, 2006). In this respect, since it requires fewer parameters, it allows for a
clearer interpretation of the parameters.
The NSS models are popular among central bankers and practitioners. They not only give a good estimate of the yield curve but also provide parameter estimates that can be interpreted as level, slope, and curvature. In constructing bond portfolios such as butterflies and ladders, these parameters are invaluable for practitioners hedging the risk of bond portfolios.
Also, parametric bond relative-value models are becoming increasingly popular in the investment community because they are more suitable for cheap/rich analysis, since this class
of models focuses on the actual cash flows of the underlying securities rather than using the
yield-to-maturity measure, which is subject to a number of shortcomings (Martellini et al.,
2003).
However, estimating the parameters of the NSS models is a difficult task because of the non-linear nature of the fitting problem, which manifests itself in complex objective surfaces with multiple local minima (or maxima) in addition to the global minimum (or maximum).
A number of studies (Bolder and Streliski, 1999; Gurkaynak et al., 2007; De Pooter, 2007),
3The Nelson and Siegel (1987) and Svensson (1994) models imply that forward rates smoothly gravitate towards a flattened long end, whereas the McCulloch (1971) yield curve allows forward rates to fluctuate with maturity and rise steeply as the term to maturity lengthens. But what if the forward rates reflect expected future rates? In other words, the NSS models, by constraining the implied forward rate curve, are implicitly promoting a more stable yield curve with reference to how the 'true' term structure is expected to behave. The main question is at what cost in terms of accuracy of fit this stability is achieved. For this discussion, see Shea (1985).
have reported numerical difficulties when working with the NSS model. The traditional
methods of direct search, gradient based, and quasi-Newton algorithms, which may be used
for solving the associated optimization problem, carry the risk of numerical problems of false
convergence and severe suboptimality (Bolder and Streliski, 1999). As explained in Bolder
and Streliski (1999), using the standard estimation methods to obtain the global optimum,
it is necessary to estimate the model for many different sets of starting values for the model
parameters. For example, estimating the six parameters of the Svensson model with five different starting values for each parameter requires a grid of size 5^6 = 15,625 — all possible combinations of five starting values for each of the six parameters. The
time required to increase the robustness of an estimated curve, or the confidence of having a
global optimum, increases exponentially with the number of different starting values chosen
for each parameter. Time-wise instability of the NSS parameters due to the existence of
multiple local minima (known as false convergence) in the associated optimization problem
is one such issue that arises frequently. These numerical problems may also arise from the
high sensitivity of the optimization algorithm to the initial starting values which can cause
great variability in the estimated parameters. Since NSS parameters are estimated on a daily basis and carry economic meaning, wild variations in them are difficult to justify economically. However, little attention has been given to the algorithms used to estimate the parameters of the NSS model.
In the original work of Nelson and Siegel (1987), it is recommended to solve the optimization problem over an empirically predetermined set of values for the parameters causing nonlinearity. Ronn (1987) uses linear programming, Csajbok (1998) uses the Gauss-Newton algorithm, whereas Bolder and Streliski (1999) adopt sequential quadratic programming as well as the Nelder-Mead simplex method to estimate the NSS parameters. Ioannides (2003) has
compared various yield curve models using the BFGS quasi-Newton algorithm. Diebold and
Li (2006) have tackled the estimation problem by fixing the parameters causing nonlinearity,
and solving the resulting linear subproblem by least squares. Recently, Manousopoulos and
Michalopoulosa (2007) have compared a diversity of non-linear optimization algorithms for
estimating the yield curve for the Greek bond market and suggested a two-step optimization
process to estimate the NSS model parameters: first estimate the NSS parameters with direct search and global optimization algorithms, and then refine the estimated parameters with
gradient based methods. In this study, we propose a hybrid Particle Swarm Optimization
(PSO) method for estimating the parameters of the NSS yield curve model.
Four performance criteria are used for evaluating and comparing the optimization algo-
rithms tested. The first is goodness of fit, which shows how well the suggested algorithm fits the empirically observed yield curve. Since NSS parameters need to be calculated on a daily basis
and this computation can be quite expensive for long time series, we use computational time as a second performance measure. Thirdly, we investigate the parameter space scanned by each algorithm around the starting point. An algorithm that scans a large space is preferred, because algorithms that explore only a small space around the starting point are very sensitive to the starting values of the parameters. Finally, and most importantly, we check the parameter
stability over time. Past research has mainly focused on the performance of the yield curve fit and has paid scant attention to the parameter stability of the NSS model. Given that NSS parameters have a specific financial interpretation, a robust optimization algorithm is expected to generate smooth parameters over time. Our computational comparisons
with the existing methods show that the PSO algorithm can provide a robust framework
that can handle the numerical problems reported in the estimation of NSS parameters. PSO
yields smaller errors when fitting the yield curve, with low computational time and smoother, more stable parameter estimates. Also, in contrast to the traditional optimization algorithms employed in the finance literature, the PSO algorithm has the advantage of not depending on the initial values, which minimizes the risk of non-convergence.
The rest of the paper is organized as follows: In Section 2, we discuss the framework of
the NSS model. Section 3 describes the estimation procedure and the related optimization
algorithm. Section 4 provides the description of the optimization algorithms we use in our
computational experiments. In Section 5, we give the details of our computational experiments
and results. Our concluding remarks are provided in Section 6.
2 Yield Curves
The yield curve is not directly observable in the market and it needs to be estimated from the
bond prices. In this section we first discuss the basic yield curve concepts, then we show an
important relationship between spot and forward rates, which is the underlying rationale of the NSS model.
2.1 Yield curve concepts
Consider a bond that is sold now, at time t, and is due to mature at time T. Let t ≤ T < T̄, where T̄ is the trading horizon, much longer than the maturity date of any bond. If the bond is a zero-coupon bond with maturity M = T − t, its price at time t, p(t, T), is calculated using the yield to maturity y(t, T) as

p(t, T) = e^{−y(t,T)(T−t)},

with the properties p(T, T) = 1 and p(t, T) < 1 for t < T.
It is trivial to show that the bond price can be transformed to give the yield curve, y(t, T) = −ln(p(t, T))/(T − t).
Since many governments do not issue longer-term zero-coupon bonds, constructing yield curves from market data requires coupon bond prices (or yields). A coupon bond may be priced in a number of ways. The traditional procedure is to discount all cash flows of the bond by y(t, T):
p(t, T) = Σ_{m=1}^{T−t} c_{t+m} e^{−y(t,T) m},

where p(t, T) is the price of a T-period bond when the yield to maturity is y(t, T), c_{t+m} is the coupon payment at time t + m, with c_T = F being the bond's face value, and d(t, T) is the discount factor that gives the price of a default-free zero-coupon bond that pays one unit at time T.
A coupon bond is, in effect, a bundle of zero-coupon bonds, with each coupon payment constituting a single zero-coupon bond. Consequently, if the prices of discount bonds maturing at each coupon date are known, then it is easy to find the price of a coupon-paying bond using the constituent zero-coupon rates:
p(t, T) = Σ_{m=t}^{T} c_m d(t, m) = Σ_{m=t}^{T} c_m e^{−r(t,m)(m−t)},  (1)

where r(t, m) is the spot rate applicable to a term of m periods. The yield curve at a given
date t is represented by a graph of the spot rate r(t, m) for different times to maturity m − t.
The discount function4 can easily be transformed into a forward rate. Moreover, the transformation between the spot interest rate and the forward rate is unique for a given yield curve. Let f(t, m, T) be the continuously compounded forward rate on a forward contract concluded at time t (the trade date), for an investment that starts at time m > t (the settlement date) and ends at time T > m (the maturity date). Then the forward rate is
related to the spot rates by:

f(t, m, T) = [(T − t) r(t, T) − (m − t) r(t, m)] / (T − m).  (2)
This forward rate Eq.(2) can easily be amended to produce an instantaneous forward rate by
4The Fundamental Theorem of Asset Pricing (see Dybvig and Ross (1989)) states that the absence of arbitrage implies the existence of a linear pricing rule, which corresponds to a unique 'discount function' in the term structure literature. However, as Dermody and Prisman (1988) have argued, because bond prices are quoted with a bid-ask spread, multiple discount functions are possible.
taking the settlement and maturity dates infinitesimally close together:

f(t, m) = lim_{h→0} f(t, m, m + h) = r(t, m) + (m − t) ∂r(t, m)/∂m.
Integrating this function results in the spot rate:

r(t, T) = (1/(T − t)) ∫_t^T f(t, m) dm,  (3)
which implies that the spot rate is an average of forward rates. This relationship between
spot rate and forward rate lies at the heart of the NSS model.
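The averaging relation in Eq. (3) can be verified numerically; the sketch below integrates a hypothetical linear forward curve on a grid (the curve and horizon are illustrative only) and recovers the implied spot rate.

```python
import numpy as np

def spot_from_forward(f, t, T, n=10_001):
    """Spot rate r(t, T) as the average of instantaneous forward
    rates f(t, m) over m in [t, T], per Eq. (3)."""
    m = np.linspace(t, T, n)
    v = f(m)
    # Trapezoidal rule for the integral in Eq. (3).
    integral = np.sum((v[:-1] + v[1:]) / 2.0 * np.diff(m))
    return integral / (T - t)

# Hypothetical forward curve f(t, m) = 5% + 1% * m over [0, 2] years:
r = spot_from_forward(lambda m: 0.05 + 0.01 * m, 0.0, 2.0)
print(round(r, 6))  # the average of a linear curve is its midpoint value, 0.06
```

For a linear forward curve the trapezoidal rule is exact, so the computed spot rate equals the midpoint forward rate.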
2.2 Nelson-Siegel and Svensson models
Nelson and Siegel (1987) assume that the instantaneous forward rate is the solution of a
second-order differential equation with two equal roots. Nelson and Siegel’s forward rate
function, for some fixed time t (i.e. cross-sectional) is given by
f(m, Θ) = β0 + β1 e^{−m/τ1} + β2 (m/τ1) e^{−m/τ1},  (4)

where Θ = [β0, β1, β2, τ1] are parameters to be estimated. This function can be divided into two functional components: (i) a simple exponential function, g(m) = β0 + β1 e^{−m/τ1} for β0, β1, τ1 ∈ R with β0 > 0, and (ii) a hump-shaped function, h(m) = β2 (m/τ1) e^{−m/τ1} for β2, τ1 ∈ R with τ1 > 0.
The β0 parameter anchors g at a given level, while the sign of β1 determines the slope of the instantaneous forward curve. The parameter β1 generally takes a negative value, producing an upward-sloping forward rate curve. A large (small) value of τ1 means that this exponential effect decays more slowly (quickly). The function h(m) adds flexibility, permitting the instantaneous forward-rate curve to take a number of different shapes. This component creates a hump shape when β2 is positive and a U-shape when it is negative, as can be seen in Figure 1. The parameter τ1 controls the speed of convergence of both the second and third terms in Eq. (4). Finally, an important feature
of Eq. (4) is that the limits of forward and spot rates when maturity approaches zero and
infinity, respectively, are equal to β0 + β1 and β0. The parameters of this model have clear
financial meanings: the long term forward rate is represented by β0 (level), the short term
rate is modulated by β1 (slope), and medium term rate is governed by β2 (curvature).
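The level/slope/curvature interpretation is easy to see in code; the sketch below evaluates Eq. (4) on a maturity grid, with purely illustrative parameter values (they are not estimates from this paper).

```python
import numpy as np

def ns_forward(m, beta0, beta1, beta2, tau1):
    """Nelson-Siegel instantaneous forward rate, Eq. (4):
    f(m) = beta0 + beta1*exp(-m/tau1) + beta2*(m/tau1)*exp(-m/tau1)."""
    x = np.exp(-m / tau1)
    return beta0 + beta1 * x + beta2 * (m / tau1) * x

# Illustrative parameters: level 6%, slope -2%, curvature 1%, decay 2 years.
m = np.linspace(0.01, 30.0, 300)
f = ns_forward(m, beta0=0.06, beta1=-0.02, beta2=0.01, tau1=2.0)
print(round(float(f[0]), 4), round(float(f[-1]), 4))  # near beta0+beta1 and beta0
```

The printed endpoints illustrate the limits noted above: the short end approaches β0 + β1 and the long end approaches β0.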
Svensson (1994) extends the work of Nelson and Siegel (1987) by repeating the function
h(m) twice, with different parameters, and linearly combining the functions g(m) and h(m)
into a single function for the instantaneous forward-rate curve:
f(m, Θ) = β0 + β1 e^{−m/τ1} + β2 (m/τ1) e^{−m/τ1} + β3 (m/τ2) e^{−m/τ2},  (5)

where Θ = [β0, β1, β2, β3, τ1, τ2] are parameters to be estimated. The term β3 (m/τ2) e^{−m/τ2}, for β3, τ2 ∈ R with τ2 > 0, now gives rise to a second hump-shape (or U-shape), increasing the
flexibility and improving the fit. Figure 1 shows the decomposition of the NSS curve into its
exponential components.
Figure 1: NSS Components
3 Estimation and Optimization Problem
We now focus on the problem of determining the parameters of the NSS model. First we need to transform the forward-rate curve of Eq. (5) into a discount function to price the set of coupon
bonds. Once we have a theoretical price vector, we can optimize over the parameter set to
minimize the distance between the observed bond prices (obtained from the market) and the
theoretical bond prices (estimated using the yield curve).
3.1 Estimation
To transform f(m,Θ) into a discount function, we use the result from Eq. (3). In other
words, the spot rate required in the discount function, Eq. (1) can be derived by integrating
the forward rate according to Eq. (3):
r(m, Θ) = β0 + β1 [(1 − e^{−m/τ1})/(m/τ1)]
        + β2 [(1 − e^{−m/τ1})/(m/τ1) − e^{−m/τ1}]
        + β3 [(1 − e^{−m/τ2})/(m/τ2) − e^{−m/τ2}],  (6)
where Θ is the set of the model parameters [β0, β1, β2, β3, τ1, τ2] and m is the time to maturity.
Consequently, the present value relation is specified using the continuous form of the discount
function given by:
d(m, Θ) = e^{−r(m,Θ) m}.
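Eq. (6) and the discount function transcribe directly; the parameter vector in the example below is hypothetical, not a fitted value from our data.

```python
import numpy as np

def nss_spot(m, theta):
    """NSS spot rate r(m, Theta) of Eq. (6)."""
    b0, b1, b2, b3, t1, t2 = theta
    x1, x2 = m / t1, m / t2
    g1 = (1.0 - np.exp(-x1)) / x1
    g2 = (1.0 - np.exp(-x2)) / x2
    return b0 + b1 * g1 + b2 * (g1 - np.exp(-x1)) + b3 * (g2 - np.exp(-x2))

def nss_discount(m, theta):
    """Discount factor d(m, Theta) = exp(-r(m, Theta) * m)."""
    return np.exp(-nss_spot(m, theta) * m)

# Hypothetical parameter vector [b0, b1, b2, b3, tau1, tau2]:
theta = [0.08, -0.03, 0.02, 0.01, 1.5, 6.0]
print(nss_spot(np.array([0.25, 2.0, 10.0]), theta))
print(nss_discount(np.array([0.25, 2.0, 10.0]), theta))
```

As m approaches zero the spot rate tends to β0 + β1, consistent with the limits of the forward curve discussed in Section 2.2.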
Consider a set of N bonds. Let pi(t,m) be the estimated price of bond i at time t, d(mij,Θ)
be the discount factor that gives the price of zero-coupon bond that pays one unit at time
m, cij be its jth payment, paid at time mij (i.e., cij = ci(mij)), and mi be the number of
payments. Then, Eq. (1) is rewritten for bond i as
p_i(m, Θ) = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ).  (7)
To handle bond-specific errors due to market frictions (such as tax effects and liquidity problems), an error term is added to Eq. (7), and the resulting expression is estimated as a cross-sectional regression over all the bonds outstanding at a particular date:

p̂_i = p_i + u_i = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ) + u_i,  i ∈ {1, ..., N},  (8)

where p̂_i is the observed price of bond i. The regressors are the coupon payments at different dates, and the coefficients are the discount zero-coupon bond prices d(m_{ij}, Θ), j ∈ {1, ..., M}, where M = max_i m_i is the longest coupon bond maturity.
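As a concrete check of Eq. (7), the sketch below prices a toy coupon bond off a flat spot curve; the bond and the 4% curve are hypothetical.

```python
import numpy as np

def bond_price(cash_times, cash_flows, discount):
    """Price a coupon bond as the sum of its discounted cash flows,
    per Eq. (7): p_i = sum_j c_ij * d(m_ij)."""
    return sum(c * discount(m) for m, c in zip(cash_times, cash_flows))

# Hypothetical 2-year annual 5% coupon bond, face value 100,
# priced off a flat 4% continuously compounded spot curve.
times = [1.0, 2.0]
flows = [5.0, 105.0]
d = lambda m: np.exp(-0.04 * m)
print(round(bond_price(times, flows, d), 4))
```

In the estimation, `discount` would be d(m, Θ) of the NSS model rather than a flat curve.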
3.2 Optimization Problem
The parameter vector Θ must be chosen to minimize the weighted sum of squared price errors u_i = p̂_i − p_i:

min_Θ J(Θ) = min_Θ [Σ_{i=1}^{N} (w_i u_i)^2],  (9)

where J(Θ) is the objective function and w_i is the weighting scheme

w_i = (1/D_i) / [Σ_{m=1}^{M} (1/D_m)],

where D_i is the duration of the bond.5
Note that in this case the squared errors are weighted with the inverse of the bond's duration D_i. Parameters estimated under a purely quadratic loss function are sensitive to outliers. To reduce the impact of outliers on the parameter estimates, we specify a more robust loss function by weighting each price error by the inverse of its duration. This approach, suggested by Bolder and Streliski (1999), downweights large errors in the objective function.
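The weighting scheme and the loss of Eq. (9) can be sketched compactly; the prices and durations below are hypothetical.

```python
import numpy as np

def duration_weights(durations):
    """Inverse-duration weights: w_i = (1/D_i) / sum_m (1/D_m)."""
    inv = 1.0 / np.asarray(durations, dtype=float)
    return inv / inv.sum()

def loss(observed, model, durations):
    """Duration-weighted sum of squared price errors, Eq. (9)."""
    u = np.asarray(observed, dtype=float) - np.asarray(model, dtype=float)
    w = duration_weights(durations)
    return float(np.sum((w * u) ** 2))

# Hypothetical observed vs. model prices and bond durations:
print(loss([99.5, 101.2, 97.8], [99.4, 101.5, 98.1], [1.2, 4.5, 8.0]))
```

Note that shorter-duration bonds, whose prices are less sensitive to yield errors, receive the larger weights.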
With this definition of the associated optimization problem, the estimation procedure
followed can be described as follows.
5Nelson and Schaefer (1983) tested various duration-based models, found that duration captures most of the interest rate risk, and suggested using duration weighting rather than maturity weighting.
Estimation Procedure
The steps followed in the estimation procedure are as follows:
1. Starting parameters Θ^(0) = (β0^(0), β1^(0), β2^(0), β3^(0), τ1^(0), τ2^(0)) are computed via least squares, as suggested by Diebold and Li (2006).
2. The discount factor function is determined using the starting parameters Θ^(0), i.e., d(m, Θ^(0)) = e^{−r(m,Θ^(0)) m}.
3. The discount function is utilized to determine the present value of the bond cash flows and thereby a vector of starting 'model' bond prices, i.e., p_i^(0) = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ^(0)), i ∈ {1, ..., N}.
4. Numerical optimization procedures are used for estimating a set of parameters that minimizes the specified loss function, i.e., min_Θ J(Θ), subject to the constraints

c1(β0) = −β0 ≤ 0  (10)
c2(β0, β1) = −(β0 + β1) ≤ 0  (11)
c3(τ1) = −τ1 ≤ 0  (12)
c4(τ2) = −τ2 ≤ 0  (13)

5. The estimated set of parameters is used to determine the spot rate function and therefore the 'model' prices.
The objective function J(Θ) is non-linear and has been observed to have multiple local
minima (Manousopoulos and Michalopoulosa, 2007). Thus, we need a robust algorithm that
will converge to a high quality solution, regardless of the initial solution values. In the next
section we describe well-known optimization algorithms from the literature as well as the PSO,
which we will utilize to solve the minimization problem stated above.
4 Optimization Algorithms
To comparatively evaluate the performance of the PSO algorithm, we implemented well-known
numerical optimization algorithms from the literature. Our focus in this paper is not on which optimization algorithm wins the horse race, but on how one could resolve the numerical problems reported in the literature. This paper demonstrates that the methodological differences among the algorithms may well lead to material errors in the fit of the yield curve and material instability in the estimated parameters. The selected algorithms are therefore classified into three categories: global optimization algorithms, direct search algorithms, and gradient-based algorithms. We discuss all these algorithms in detail to explain why the PSO algorithm is better suited to overcoming the numerical difficulties reported for the NSS model.
4.1 Global Optimization Algorithms
The objective of global optimization is to find the globally best solution of (possibly non-linear)
models, in the presence of multiple local optima.
4.1.1 Particle Swarm Optimization
PSO is a population-based metaheuristic technique in which the gradient of the objective function is not required. It converges quickly, is easy to implement, and has recently been successfully applied to optimizing various continuous nonlinear functions in practice (Clerc and Kennedy, 2002; Trelea, 2003; Pedersen and A., 2010; Yang et al., 2007; Abd-El-Wahed et al., 2011). PSO is inspired by the social behaviour of individuals. In a simple social setting, the decision process of each individual (particle) is affected by its own experiences and the experiences of other individuals. From its own experience, each individual knows how good each choice it has tried so far is. From the experience of others, it knows not only which choices are most positive but also how positive the best pattern of choices was. In our particle swarm optimization setting, a set of particles searches for good solutions to the continuous NSS optimization problem described in Section 3.2. Each particle is a solution of the NSS optimization problem and uses its own experience and the experience of neighbouring particles to choose how to move in the search space to find a better yield curve fit.
The PSO algorithm is initialized with a population of random candidate solutions, called
particles. Each particle is assigned a random location and a random velocity, and is iteratively
moved through the problem space. Every particle is attracted towards the location of the best
solution achieved by the particle itself and towards the location of the best solution achieved
across the whole population. At each iteration, the velocity vector (v_i) and position vector (Θ_i) of each particle are updated as follows:
vi ← vi + U(0, φ1)⊗ (pbi −Θi) + U(0, φ2)⊗ (gb−Θi) (14)
Θi ← Θi + vi (15)
where U(0, φ_i) is a vector of uniformly distributed random numbers in the interval [0, φ_i], ⊗ is the entry-wise product, pb_i is the best known position of particle i, and gb is the best known position of the entire population. The parameters φ1 and φ2 denote the magnitude of the random forces in the direction of the personal best pb_i and the swarm best gb. The components U(0, φ1) ⊗ (pb_i − Θ_i) and U(0, φ2) ⊗ (gb − Θ_i) can be interpreted as attractive forces produced by springs of random stiffness. There are several variants of the original PSO algorithm that have been reported to perform better on a number of problems. For one such variant, which we call PSO-W, Shi and Eberhart (1998) proposed the following update:
vi ← ωvi + U(0, φ1)⊗ (pbi −Θi) + U(0, φ2)⊗ (gb−Θi) (16)
Θi ← Θi + vi (17)
ω ← ωp × ω (18)
where ω is termed the inertia weight and 0 < ω_p < 1 is the decay factor. Effectively, the inertia weight supplies a random search direction whose magnitude decreases over iterations. Another variant of PSO was proposed by Pedersen and A. (2010). This variant, named PSO-G, disregards the personal best values and focuses on the neighborhood of the population best. PSO-G has been reported to perform better on a number of problems (Pedersen and A., 2010). Our computational experiments show that the best results were obtained when PSO-W and PSO-G were applied in conjunction. This method is referred to as PSO-W/G for the rest of the study.
PSO was designed as an unconstrained optimization algorithm, which requires us to handle the constraints in the objective function. To this end, we add to the objective a penalty function equal to a scalar times the squared violation of constraint c_i, i.e.,
P_i(Θ) = 0 if c_i(Θ) ≤ 0, and P_i(Θ) = C (c_i(Θ))^2 if c_i(Θ) > 0,  (19)
where C is a large scalar value. Thus, the optimization problem becomes:

min_Θ f(Θ) = J(Θ) + Σ_{k=1}^{4} P_k(Θ).  (20)
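A minimal sketch of the penalized objective of Eqs. (19)-(20) together with a PSO-W-style update in the spirit of Eqs. (16)-(18); the swarm size, coefficient values, and the toy sphere objective are illustrative choices, not the settings used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def penalized(f, constraints, C=1e6):
    """Wrap objective f with the quadratic penalty of Eqs. (19)-(20)."""
    def g(theta):
        pen = sum(C * max(c(theta), 0.0) ** 2 for c in constraints)
        return f(theta) + pen
    return g

def pso_w(f, dim, n=30, iters=200, phi1=1.49445, phi2=1.49445,
          omega=0.729, omega_p=0.99):
    """Minimal PSO with a decaying inertia weight (PSO-W style).
    A sketch, not the paper's exact hybrid PSO-W/G implementation."""
    pos = rng.uniform(-1.0, 1.0, (n, dim))
    vel = rng.uniform(-1.0, 1.0, (n, dim))
    pb = pos.copy()
    pb_val = np.array([f(p) for p in pos])
    gb = pb[pb_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.uniform(0.0, phi1, (n, dim))
        r2 = rng.uniform(0.0, phi2, (n, dim))
        vel = omega * vel + r1 * (pb - pos) + r2 * (gb - pos)  # Eq. (16)
        pos = pos + vel                                        # Eq. (17)
        omega *= omega_p                                       # Eq. (18)
        val = np.array([f(p) for p in pos])
        better = val < pb_val
        pb[better], pb_val[better] = pos[better], val[better]
        gb = pb[pb_val.argmin()].copy()
    return gb, f(gb)

# Toy sphere objective with a positivity constraint on x[0] (illustrative):
obj = penalized(lambda x: np.sum((x - 0.5) ** 2), [lambda x: -x[0]])
best, val = pso_w(obj, dim=3)
print(best, val)
```

In our setting the objective would be the penalized J(Θ) of Eq. (20) over the six NSS parameters.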
Hybrid Particle Swarm Optimization Algorithm
Initialization. Initialize every particle i s.t. Θ_0^(i) = U(0, φ_i) and v_0^(i) = U(0, φ_i), as well as ω_p, k_g, k_max and the stopping criterion ε.
Step 1. Update Θ_k^(i) and v_k^(i):
Θ_k^(i) ← Θ_{k−1}^(i) + v_{k−1}^(i)
ω ← ω_p × ω
If k < k_g, the algorithm is PSO-W:
v_k^(i) ← ω v_{k−1}^(i) + U(0, φ1) ⊗ (pb_i − Θ_i) + U(0, φ2) ⊗ (gb − Θ_i)
If k ≥ k_g, the algorithm is PSO-G:
v_k^(i) ← ω v_{k−1}^(i) + U(0, φ2) ⊗ (gb − Θ_i)
Step 2. If f(Θ_k^(i)) < f(pb_i), set pb_i ← Θ_k^(i). If f(pb_i) < f(gb), set gb ← pb_i.
Step 3. If k > k_max or ||Θ_k^(i) − Θ_{k+1}^(i)|| < ε(1 + ||Θ_k^(i)||), STOP and report success; else increment k and go to Step 1.
4.1.2 Simulated Annealing
The Simulated Annealing (SA) algorithm, proposed by Kirkpatrick et al. (1983), is a probabilistic
local search algorithm that looks for the minimum of an objective function using the neighbor-
hood information of a point in the search space. The name and inspiration of the algorithm
come from annealing in metallurgy, a technique involving heating and controlled cooling of a
material to increase the size of its crystals and reduce their defects. The heat causes the atoms
to become unstuck from their initial positions (a local minimum of the internal energy) and
wander randomly through states of higher energy; the slow cooling gives them more chances
of finding configurations with lower internal energy than the initial one. By analogy with this
physical process, each step of the SA algorithm replaces the current solution by a random
”nearby” solution, chosen with a probability that depends both on the difference between the
corresponding function values and also on a global parameter T (called the temperature), that
is gradually decreased during the process.
Local search algorithms usually start with a random initial solution. A neighbour of this
solution is then generated by some suitable mechanism and the change in cost is calculated.
If a reduction in cost is found, the current solution is replaced by the generated neighbour,
otherwise the current solution is retained. The process is then repeated until no further
improvement can be found in the neighbourhood of the current solution and the algorithm
terminates at a local minimum. In SA, sometimes a neighbour that increases the cost is
accepted, to avoid becoming trapped in a local optimum. A nonimproving move is accepted
with a probability that decreases with iterations. Usually, the probability is selected as e−δ/T
where δ is the increase in the objective function value at each iteration and T is a control
parameter.
Simulated Annealing Algorithm
Initialization. Choose an initial guess Θ_0, set f_best = f(Θ_0), and choose T.
Step 1. Generate a candidate Θ′ = neighbor(Θ_{k−1}) and set δ = f(Θ′) − f(Θ_{k−1}).
Step 2. If δ < 0 or random() < e^{−δ/T}, set Θ_k = Θ′; otherwise set Θ_k = Θ_{k−1}.
Step 3. If f(Θ_k) < f_best, set f_best = f(Θ_k).
Step 4. If k > k_max STOP; else increment k and go to Step 1.
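The loop above can be sketched in a few lines; the toy one-dimensional objective, the neighborhood, and the cooling schedule below are illustrative choices only, not the paper's implementation.

```python
import math, random

random.seed(0)

def simulated_annealing(f, x0, neighbor, T=1.0, cooling=0.995, iters=5000):
    """Minimal SA loop: accept a worse neighbor with probability
    exp(-delta/T) and cool T geometrically."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(iters):
        y = neighbor(x)
        delta = f(y) - fx
        if delta < 0 or random.random() < math.exp(-delta / T):
            x, fx = y, fx + delta          # accept the move
            if fx < fbest:
                best, fbest = x, fx        # track the best solution seen
        T *= cooling
    return best, fbest

# Toy 1-D objective with several local minima (illustrative only):
f = lambda x: x * x + 2 * math.sin(5 * x)
best, fbest = simulated_annealing(f, x0=3.0,
                                  neighbor=lambda x: x + random.uniform(-0.5, 0.5))
print(round(best, 3), round(fbest, 3))
```

Early on, when T is large, uphill moves are frequently accepted, letting the search escape the local minima that trap a greedy descent.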
4.2 Direct Search Algorithms
Direct search is a method for solving optimization problems that does not require any infor-
mation about the gradient of the objective function. Direct search algorithms search a set of
points around the current point, looking for one where the value of the objective function is
lower than the value at the current point.
4.2.1 Nelder-Mead Method
The Nelder-Mead method is a simplex-based direct search method that begins with a set of points considered as the vertices of a simplex. At each step, Nelder-Mead generates a new test position by extrapolating the behavior of the objective function measured at the test points arranged as a simplex. The algorithm then chooses to replace one of these test points with the new test point, and so the technique progresses.
The Nelder-Mead method (Nelder and Mead, 1965; Lagarias et al., 1998) uses four scalar parameters: ρ (reflection), χ (expansion), γ (contraction), and σ (shrinkage). Let the vertices be denoted Θ^(1), Θ^(2), ..., Θ^(n+1), where n is the number of parameters to be estimated.
Algorithm for Nelder-Mead's method
Initialization. Choose vertices Θ(1), Θ(2), ..., Θ(n+1) and parameters ρ, χ, γ, σ which
satisfy ρ > 0, χ > 1, χ > ρ, 0 < γ < 1, 0 < σ < 1.
Step 1. Sort the vertices so that f(Θ(1)_k) < f(Θ(2)_k) < ... < f(Θ(n+1)_k).
Step 2. Reflection: Θr_k = Θ̄_k + ρ(Θ̄_k − Θ(n+1)_k), where Θ̄ = (1/n) Σ_{i=1}^{n} Θ(i).
If f(Θ(1)_k) ≤ f(Θr_k) ≤ f(Θ(n)_k), the reflection point is accepted; else go to Step 3.
Step 3. Expansion: Θe_k = Θ̄_k + χ(Θr_k − Θ̄_k). If f(Θe_k) < f(Θr_k), the expansion
point is accepted; else go to Step 4.
Step 4. Contraction: Θc_k = Θ̄_k + γ(Θr_k − Θ̄_k). If f(Θc_k) ≤ f(Θr_k), the contraction
point is accepted; else go to Step 5.
Step 5. Shrinkage: {Θ(1)_k, Θ(2)_k, ..., Θ(n+1)_k} = {Θ(1)_k, v(2)_k, ..., v(n+1)_k},
where v(i)_k = Θ(1)_k + σ(Θ(i)_k − Θ(1)_k).
Step 6. If k > kmax or ||Θ(i)_k − Θ(i)_{k+1}|| < ε(1 + ||Θ(i)_k||), STOP and report
success; else increment k and go to Step 1.
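A compact plain-Python sketch of these simplex steps, applied to a toy two-variable quadratic (a simplified variant that performs only the outside contraction; function and variable names are ours):

```python
def nelder_mead(f, simplex, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5,
                k_max=500, eps=1e-10):
    """Minimal Nelder-Mead iteration following the steps above."""
    n = len(simplex) - 1
    for _ in range(k_max):
        simplex.sort(key=f)                          # Step 1: order the vertices
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        refl = [centroid[i] + rho * (centroid[i] - worst[i]) for i in range(n)]
        if f(best) <= f(refl) < f(simplex[-2]):      # Step 2: reflection
            simplex[-1] = refl
        elif f(refl) < f(best):                      # Step 3: expansion
            exp_ = [centroid[i] + chi * (refl[i] - centroid[i]) for i in range(n)]
            simplex[-1] = exp_ if f(exp_) < f(refl) else refl
        else:
            con = [centroid[i] + gamma * (refl[i] - centroid[i]) for i in range(n)]
            if f(con) <= f(refl):                    # Step 4: contraction
                simplex[-1] = con
            else:                                    # Step 5: shrink toward best
                simplex = [best] + [[best[i] + sigma * (v[i] - best[i])
                                     for i in range(n)] for v in simplex[1:]]
        fs = [f(v) for v in simplex]
        if max(fs) - min(fs) < eps:                  # Step 6: stopping test
            break
    return min(simplex, key=f)

# toy objective with minimum at (1, -2)
opt = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                  [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Note that only function values, never gradients, are evaluated, which is what makes the method fast per iteration.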
4.2.2 Powell’s Method
Powell’s algorithm (Powell, 1964; Press et al., 1992) performs successive line minimizations
along conjugate directions until it converges to a solution. Specifically, the minimization is
performed over groups of mutually conjugate directions, which are selected without using any
gradient information; the initial directions are the n standard basis vectors, where n is the
number of variables. Its main advantages are simplicity and effectiveness in practice.
Algorithm for Powell's method
Initialization. Choose initial guess Θ0, set the directions u_i = e_i (the standard basis
vectors) for i = 1, ..., n, and choose stopping parameters δ and ε > 0.
Step 1. For i = 1, ..., n, find λ_i that minimizes f(Θ(i−1)_k + λ_i u_i) and set
Θ(i)_k = Θ(i−1)_k + λ_i u_i.
Step 2. Set u_j = u_{j+1} for j = 1, ..., n − 1 and u_n = Θ(n)_k − Θ(0)_k.
Step 3. Find λ that minimizes f(Θ(n)_k + λ u_n) and set Θ_{k+1} = Θ(n)_k + λ u_n.
Step 4. If k > kmax or ||Θ_k − Θ_{k+1}|| < ε(1 + ||Θ_k||), STOP and report success;
else increment k and go to Step 1.
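The direction-set idea can be sketched in Python as follows; the golden-section line search, the bracketing interval, and the toy objective are our own illustrative choices, not the paper's implementation:

```python
def line_min(f, x, u, lo=-5.0, hi=5.0, tol=1e-6):
    """Golden-section search for the step length along direction u."""
    gr = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - gr * (b - a), a + gr * (b - a)
        if f([x[i] + c * u[i] for i in range(len(x))]) < \
           f([x[i] + d * u[i] for i in range(len(x))]):
            b = d
        else:
            a = c
    lam = (a + b) / 2
    return [x[i] + lam * u[i] for i in range(len(x))]

def powell(f, x0, k_max=20):
    """Powell's direction-set method (sketch of the steps above)."""
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(k_max):
        x_start = list(x)
        for u in dirs:                     # Step 1: minimize along each direction
            x = line_min(f, x, u)
        new_dir = [x[i] - x_start[i] for i in range(n)]
        if sum(d * d for d in new_dir) ** 0.5 < 1e-10:
            break
        dirs = dirs[1:] + [new_dir]        # Step 2: drop oldest, append new
        x = line_min(f, x, new_dir)        # Step 3: minimize along new direction
    return x

# toy objective with minimum at (1, -3)
x_star = powell(lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 3) ** 2, [0.0, 0.0])
```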
4.3 Gradient Based Algorithms
Gradient based optimization algorithms iteratively search for a minimum by computing (or
approximating) the gradient of the NSS function at each iteration; the functional form of
NSS allows it to be differentiated to infinite order. We use two popular and efficient gradient
based optimization algorithms, namely the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
and the Generalized Reduced Gradient (GRG) algorithm. One key advantage of these algorithms
over global and direct search algorithms is their theoretical ability to exploit the
geometry of the NSS parameter space via gradient information. Thus they have a
better chance of converging to a minimum and providing a better yield curve fit. However, as
they are confined to a small space around the starting point, they are quite sensitive
to the initial values of the NSS parameters. As we discuss in the results section, this
turns out to be one of the key reasons for their poor performance.
4.3.1 Broyden-Fletcher-Goldfarb-Shanno Algorithm
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Judd, 1998; Press et al., 1992) is a
quasi-Newton algorithm: an iterative procedure built on a local quadratic approximation
of the function. An approximation of the Hessian of the NSS function is used instead of the
Hessian itself, which decreases the complexity of the algorithm. As in all variants of Newton's
method, the idea is to start at a point Θ0 and find the quadratic polynomial
J0(Θ) that matches the second-degree Taylor expansion of J(Θ) at Θ0:
J0(Θ) ≡ J(Θ0) + ∇J(Θ0)′(Θ − Θ0) + (1/2)(Θ − Θ0)′∇²J(Θ0)(Θ − Θ0),
where ∇ is the gradient operator, such that:

∇J(Θ0) = (∂J/∂β0(Θ0), ∂J/∂β1(Θ0), ∂J/∂β2(Θ0), ∂J/∂β3(Θ0), ∂J/∂τ1(Θ0), ∂J/∂τ2(Θ0))

and ∇²J(Θ0) is the Hessian matrix of J at Θ0, that is, the symmetric matrix containing the
second-order derivatives of J:

∇²J(Θ0) = (∂²J/∂Θi∂Θj(Θ0))_{i,j},   Θi, Θj ∈ {β0, β1, β2, β3, τ1, τ2}. (21)
Quasi-Newton methods specify the direction vector dk as:

dk = −Hk^(−1) ∇J(Θk), (22)

where the step size αk is obtained by line minimization. Hk is a positive definite
matrix that may be adjusted from one iteration to the next so that dk tends to approximate
the Newton direction.
Algorithm for BFGS method
Initialization. Choose initial guess Θ0, an initial positive definite Hessian approximation
H0 (usually the identity matrix), and stopping parameters δ and ε > 0.
Step 1. Solve Hk dk = −∇J(Θk) for the search direction dk.
Step 2. Solve αk = arg min_α J(Θk + α dk).
Step 3. Θk+1 = Θk + αk dk.
Step 4. Update Hk:
    zk = Θk+1 − Θk,
    sk = ∇J(Θk+1) − ∇J(Θk),
    Hk+1 = Hk + (sk sk′)/(sk′ zk) − (Hk zk zk′ Hk)/(zk′ Hk zk).
Step 5. If ||Θk − Θk+1|| < ε(1 + ||Θk||) go to Step 6; else increment k and go to Step 1.
Step 6. If ||∇J(Θk)|| < δ(1 + |∇J(Θk)|), report success; else report convergence to a
suboptimal point. STOP.
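A minimal NumPy sketch of this iteration on a toy quadratic; a simple backtracking (Armijo) line search stands in for the exact line minimization, and all names are illustrative:

```python
import numpy as np

def bfgs_min(f, grad, theta0, k_max=200, eps=1e-8):
    """BFGS iteration following the steps above; H approximates the Hessian."""
    theta = np.asarray(theta0, dtype=float)
    H = np.eye(len(theta))                       # initial Hessian guess H0 = I
    for _ in range(k_max):
        g = grad(theta)
        if np.linalg.norm(g) < eps:
            break
        d = np.linalg.solve(H, -g)               # Step 1: H_k d_k = -grad J
        alpha, fk = 1.0, f(theta)                # Step 2: backtracking line search
        while f(theta + alpha * d) > fk + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        theta_new = theta + alpha * d            # Step 3: take the step
        z = theta_new - theta                    # Step 4: BFGS update of H
        s = grad(theta_new) - g
        if abs(s @ z) > 1e-12:
            H = H + np.outer(s, s) / (s @ z) \
                  - (H @ np.outer(z, z) @ H) / (z @ H @ z)
        theta = theta_new
    return theta

# toy quadratic J(x) = 0.5 x'Ax - b'x with minimizer A^{-1} b = (0.2, 0.4)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta_star = bfgs_min(lambda x: 0.5 * x @ A @ x - b @ x,
                      lambda x: A @ x - b, [0.0, 0.0])
```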
4.3.2 Generalized Reduced Gradient Algorithm
The Generalized Reduced Gradient (GRG) algorithm (Lasdon et al., 1978) formulates a local
linearization of the nonlinear constraints and performs variable elimination. The optimization
problem is solved by line minimization along a direction obtained from the gradient of the
reduced objective function. GRG algorithms use the following direction at iteration k:

dk = P ∇J(Θk),

where P is the projection matrix defined as

P = Q2′ Q2, satisfying N′ P w = 0,

where the columns of the matrix N are the gradients of the constraints, the matrix Q2 consists of
the last n − r rows of the Q factor in the QR factorization of N, and w is an arbitrary vector.
In GRG, N′ is partitioned as

N′ = [N1  N2],

where N1 is the transpose of r linearly independent rows of N. Once N1 has been identified,
Q2 is easily obtained as

Q2′ = [ −N1^(−1) N2 ]
      [      I       ]
Algorithm for GRG method
Initialization. Choose initial guess Θ0 and stopping parameters δ and ε > 0.
Step 1. Compute dk = P ∇J(Θk).
Step 2. Solve αk = arg min_α J(Θk + α dk).
Step 3. Θk+1 = Θk + αk dk.
Step 4. If ||Θk − Θk+1|| < ε(1 + ||Θk||) go to Step 5; else increment k and go to Step 1.
Step 5. If ||∇J(Θk)|| < δ(1 + |∇J(Θk)|), report success; else report convergence to a
suboptimal point. STOP.
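The construction of the projection matrix P can be illustrated with NumPy. The text above defines Q2 through rows of the Q factor; this sketch uses the equivalent column form, taking the trailing columns of the full Q factor, which span the null space of N′ (the constraint matrix and gradient here are hypothetical):

```python
import numpy as np

# One linear constraint on three variables; its gradient is the column of N.
N = np.array([[1.0], [1.0], [1.0]])       # n = 3 variables, r = 1 constraint
Q, _ = np.linalg.qr(N, mode="complete")   # full QR factorization of N
Z = Q[:, N.shape[1]:]                     # basis of the null space of N'
P = Z @ Z.T                               # projection matrix: N' P w = 0
g = np.array([3.0, 1.0, -1.0])            # a gradient to be projected
d = -P @ g                                # projected (descent) direction
```

Projecting the gradient removes its component along the constraint normals, so line minimization along d stays (to first order) on the constraint surface.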
5 Data and Computational Results
In this section, we present the properties of our dataset, performance measures for our algo-
rithms and details of our computational experiments.
5.1 Data
We have used weekly mid-price data for the fixed coupon paying dollar Eurobonds of Brazil,
Mexico and Turkey in the time period from July 2005 to June 2009, retrieved from Bloomberg
(2010). A Eurobond is a bond issued in a currency other than that of the country in which it
is issued. In our analysis, we focus on these three emerging market countries, which have very
liquid Eurobond markets, owing to their increasing popularity among investors. Note that
bond liquidity is essential for a robust analysis. Bond characteristics are
collected from Reuters 3000 Xtra (Reuters, 2010). We exclude all bonds with special
characteristics (e.g. callable, puttable, structured, convertible, and Brady bonds) in order to
ensure that a homogeneous and reliable sample is used in our analysis.
5.2 Performance Measures
We analyze the performance of the algorithms along four dimensions: goodness-of-fit,
computational time, space scanned, and stability of parameters. Our first performance measure
is goodness-of-fit, for which we use the Mean Absolute Error (MAE) and the Root Mean Square
Error (RMSE); clearly, the method that generates smaller errors is preferred. Secondly, we
record the CPU time of the algorithms, because an algorithm that yields smaller errors
with less computational time is more desirable. Thirdly, we approximate the amount
of parameter space scanned, because algorithms that are able to scan more space have
an obvious advantage in reaching the global optimum. Finally, we present graphical illustrations
of the evolution of the NSS parameters over time, to observe their stability, our fourth measure.
5.2.1 Goodness-of-fit
The performance statistics MAE and RMSE are calculated as:

MAE = (1/N) Σ_{i=1}^{N} |p_i − p̂_i|,

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (p_i − p̂_i)² ),

with p_i the observed and p̂_i the theoretical (fitted) price of bond i,
where N represents the number of bonds. RMSE places a greater weight on larger errors
and therefore gives a greater indication of how well the models fit the data at each particular
observation. A low mean value is assumed to indicate that the model is flexible and, on
average, able to fit the yield curve fairly accurately. MAE is the average distance
between the theoretical bond prices and observed bond prices in absolute value terms. This
measure is not as easily influenced by extreme observations as RMSE; the two
measures are therefore complementary.
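Both measures are straightforward to compute; a short sketch with made-up prices:

```python
import math

def mae_rmse(observed, fitted):
    """Goodness-of-fit measures of Section 5.2.1 over N bond prices."""
    n = len(observed)
    errs = [o - f for o, f in zip(observed, fitted)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    return mae, rmse

# illustrative observed vs fitted bond prices
mae, rmse = mae_rmse([100.0, 101.5, 99.0], [100.2, 101.0, 99.3])
```

Note that RMSE ≥ MAE always holds, with equality only when all errors have the same magnitude, which is why the pair of measures together signals the presence of outliers.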
5.2.2 Computation Time and Space Scanned
Computing time can be costly for long time series of bond data when NSS parameters are
estimated day-to-day. We therefore record the CPU time of each optimization algorithm over
the given time period. The average distance to the initial parameter set is also an important
measure, as it approximates the amount of space scanned. Scanning a large space around the
initial choice of parameters indicates the quality of the algorithm, although it should be kept
in mind that a large distance may also result from outlier values. For each day, the distance
to the initial point is calculated as ||Θmin − Θ0||, where Θmin is the solution obtained in the
optimization process.
5.2.3 Stability
Past researchers have mainly focused on the performance of the yield curve fit and have
paid scant attention to the parameter stability of the NSS model. Given that the NSS
parameters are estimated on a daily basis and each parameter has a specific financial
interpretation, any robust optimization algorithm is expected to generate smooth parameters
over time. Consider, for instance, the first parameter of the NSS model (β0), which is
interpreted as the long-run level of interest rates. From one day to the next, jumps of several
percentage points in the estimates of this parameter would be totally unacceptable, even if
the yield curve fit is quite good. To this end, we depict the parameters obtained by the NSS
optimization procedures together with the day-to-day values of the parameters computed
according to the study of Diebold and Li (2006).
5.3 Computational Results
The algorithms are coded in MATLAB following Press et al. (1992), except for the GRG
algorithm, for which we use the Solver module bundled with Microsoft Excel
(Fylstra, 1998). To obtain the starting parameters, following the study of Diebold and Li
(2006), we use least squares after linearizing the problem by fixing τ1 and τ2. For the Nelder-
Mead, Powell, SA, and PSO algorithms, the constraints are embedded in the objective
function with a penalty constant C = 1000, whereas they are explicitly stated for
BFGS. The stopping criterion for the algorithms is ε = 10^(−8). The parameter set for
the Nelder-Mead algorithm is ρ = 1, χ = 2, γ = 0.5, σ = 0.5; for the SA algorithm, the
initial value of T is 1.
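The penalty embedding can be sketched as follows; the zero-coupon yield is the standard Svensson formula, while the specific constraints penalized here (β0 > 0, β0 + β1 > 0, τ1, τ2 > 0) are common illustrative choices, not necessarily the exact set used in the paper:

```python
import math

def nss_yield(tau, b0, b1, b2, b3, t1, t2):
    """Standard Svensson (NSS) zero-coupon yield at maturity tau (years)."""
    g1 = (1 - math.exp(-tau / t1)) / (tau / t1)
    g2 = (1 - math.exp(-tau / t2)) / (tau / t2)
    return (b0 + b1 * g1 + b2 * (g1 - math.exp(-tau / t1))
            + b3 * (g2 - math.exp(-tau / t2)))

def penalized_objective(params, maturities, observed_yields, C=1000.0):
    """Sum of squared yield errors plus C times the constraint violations."""
    b0, b1, b2, b3, t1, t2 = params
    violations = (max(0.0, -b0) + max(0.0, -(b0 + b1))
                  + max(0.0, -t1) + max(0.0, -t2))
    if t1 <= 0 or t2 <= 0:          # yields undefined; return penalty only
        return C * (1.0 + violations)
    sse = sum((nss_yield(m, *params) - y) ** 2
              for m, y in zip(maturities, observed_yields))
    return sse + C * violations

# a flat 5% curve is fit exactly by (b0, 0, 0, 0, t1, t2)
val = penalized_objective([0.05, 0, 0, 0, 1.0, 1.0],
                          [1.0, 5.0, 10.0], [0.05, 0.05, 0.05])
```

Any unconstrained minimizer (Nelder-Mead, Powell, SA, PSO) can then be pointed at `penalized_objective` directly, since infeasible parameter sets simply receive a large value.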
Our experimentation showed that both PSO-W and PSO-G were able to generate high
quality results. However, the performance of PSO-G is not as robust as PSO-W, being too
dependent on the initial locations of the particles. PSO-W/G combines the best features of
these algorithms and returns the best results.
Goodness-of-fit statistics, given in Table 1, show that global optimization algorithms
clearly outperform gradient based and direct search algorithms. Among the global
optimization algorithms, the PSO-W/G algorithm achieves the smallest errors both in terms of RMSE
and MAE. This might be due to the fact that global
optimization algorithms are better equipped to find the global minimum of the NSS function
in the (possible) presence of multiple local minima. As we discuss in section 4.1 the SA and
the PSO algorithms involve downhill moves in order to decrease the objective function, but
also allow random (possibly uphill) moves in order to escape from local minima. Table 1 also
shows that, after the global optimization algorithms, the direct search algorithms achieve smaller
errors than the gradient based ones. Among the direct search algorithms, Nelder-Mead performs
worse than Powell's method. Among the gradient based algorithms, BFGS yields larger errors
than GRG. In Figure 2, we have also provided
sample yield curve fits for Brazil, which are estimated with the PSO-W/G algorithm. As it
can be observed from the figures, the NSS model is capable of replicating a variety of yield
curve shapes.
Figure 2: Selected fitted yield curves for Brazil.
Computation time results are heterogeneous across the classes of optimization
algorithms. When all the algorithms are compared, Table 2 clearly shows that on average
direct search performs quite well, and Nelder-Mead emerges as the fastest algorithm of all.
This might be expected given the nature of direct search algorithms: as can be seen from
the steps defined in section 4.2.1, the Nelder-Mead algorithm applies simple operations to
the simplex (reflection, contraction, expansion) and does not evaluate gradient information,
which makes it quite fast, especially compared to gradient based algorithms. In
terms of computational speed, the PSO algorithms perform second to the Nelder-Mead algorithm.
When Table 1 and Table 2 are considered together, it can be observed that, although direct
search methods are quite fast, their goodness-of-fit measures are not as good as those of the
PSO algorithms. Also, as expected, gradient based methods are more time consuming than the
other algorithms, since the gradient of the NSS function must be computed and a line
minimization procedure applied at each iteration.
Since the solution quality also depends on the initial starting point, in Table 2 we report the
average distance of the solution from the initial point for each algorithm. It can be observed
that the direct search algorithms find solutions closer to the starting point than the
gradient based and global optimization algorithms. This may be the very reason why direct
search methods produce the larger errors and worse fits seen in Table 1: since
these algorithms explore a smaller space around the starting point, their convergence to the true
solution is very sensitive to the starting values of the parameters. We can also see why the PSO-
W/G algorithm outperforms the others in terms of goodness of fit, as it explores more space.
Interestingly, although Table 2 shows that the distance from the initial points for the BFGS
algorithm is much higher than for the PSO algorithms, it produces poor goodness-of-fit results,
owing to its failure to converge on certain days. We would like to stress the fact that
our results are robust with respect to different starting points.
                               Total RMSE                  Total MAE
Method                   Brazil  Mexico  Turkey      Brazil  Mexico  Turkey
Gradient Based Algorithms
BFGS                      0.507   0.523   0.491       0.440   0.407   0.810
GRG                       0.239   0.234   0.248       0.281   0.146   0.187
Direct Search Algorithms
Nelder-Mead               0.282   0.241   0.271       0.275   0.198   0.222
Powell                    0.234   0.198   0.236       0.282   0.232   0.220
Global Optimization Algorithms
PSO-W                     0.214   0.179   0.204       0.166   0.145   0.153
PSO-G                     0.236   0.201   0.238       0.186   0.164   0.185
PSO-W/G                   0.212   0.178   0.203       0.165   0.143   0.149
SA                        0.224   0.215   0.230       0.171   0.171   0.178
Table 1: Comparison of optimization algorithms in terms of total RMSE and MAE
Graphical representations of the level, slope, and curvature factors (β0, β1, β2) of the best four
optimization algorithms for Brazil are depicted in Figures 3, 4, and 5. The potential problem
of parameter instability for the highly parameterized NSS functional form is obvious from the
                           CPU Time (seconds)           Average Distance
Method                   Brazil  Mexico  Turkey      Brazil  Mexico  Turkey
Gradient Based Algorithms
BFGS                     3156.2  2450.3  2747.9       2.842   7.112   5.210
GRG                       312.7   320.7   290.6       2.012   2.563   2.466
Direct Search Algorithms
Nelder-Mead                 6.5     5.6     6.4       1.732   1.950   1.916
Powell                    207.8   212.5   269.7       1.139  19.505   2.420
Global Optimization Algorithms
PSO-W                     117.1    76.9   107.3       2.665   3.122   2.810
PSO-G                     106.4    68.8    97.5       3.065   3.248   2.588
PSO-W/G                   164.7   107.2   151.7       2.722   3.217   3.024
SA                        451.4   398.5   438.5       0.263   0.249   0.238
Table 2: Comparison of optimization algorithms in terms of CPU time and average distance
to initial points
figures. Deviations from the empirical proxies (RMSEs) for the three factors are given in Table 3.
The optimization algorithms tested differ greatly in their degree of parsimony.
As can be observed from these figures and tables, global optimization algorithms are
capable of generating smoother and more realistic level, slope, and curvature factors than
gradient based and direct search algorithms. Among the global optimization algorithms,
SA produces smooth parameters similar to those of the PSO-W & PSO-G algorithm for the
level and slope factors. However, the PSO-W & PSO-G algorithm outperforms the SA algorithm
in terms of the stability of the curvature parameter (β2). This might be due to one of the
attractive characteristics of PSO: it has memory, so knowledge of good solutions is retained by
all the particles. As discussed in section 4.1, in our PSO setting the yield curve initially has a
population of random solutions. Each potential solution, called a particle, is given a random
velocity and is flown through the NSS parameter space. Each particle has memory and keeps
track of its previous best position and the corresponding fitness for the yield curve. This leads
to a number of local bests for the respective particles in the swarm, and the one with the
greatest fitness to the yield curve (the global best of the swarm) is assigned to the NSS
parameter set. This constructive cooperation between particles, that is, the sharing of
information within the swarm, leads to stable parameters for the yield curve. This paper
demonstrates that the PSO algorithm can differ markedly from other optimization algorithms
in terms of the smoothness of the NSS parameters. Among the gradient based methods, GRG
produces a quite smooth level (β0) factor, but its slope and curvature parameters are quite
unstable. The BFGS algorithm produces extremely unstable results for all three parameters.
Similarly, among the direct search algorithms, Powell's method produces erratic results for all
three parameters, whereas Nelder-Mead produces relatively smoother NSS parameters. Finally,
we would like to note that our conclusions hold for the Mexico and Turkey Eurobond portfolios,
as can be seen in Table 3.
                                β0                           β1                              β2
Method                Brazil  Mexico  Turkey     Brazil  Mexico  Turkey     Brazil    Mexico    Turkey
Gradient Based Algorithms
BFGS                   2.596  69.609  11.593     76.137  76.491  46.624     66.093   114.659    73.855
GRG                    0.283   1.529   0.252      2.309   3.050   3.877    849.774     5.379   220.904
Direct Search Algorithms
Nelder-Mead            1.120   1.263   0.818      2.414   2.491   2.629     30.748    34.618    35.094
Powell                 3.108   6.235   3.559      5.656   7.785  43.478    122.262  11312.76   160.978
Global Optimization Algorithms
PSO-W                  1.037   1.117   0.879      2.398   1.795   2.035      4.450     5.178     6.261
PSO-G                  1.143   1.061   0.863      1.746   1.708   1.628      4.258     4.254     4.290
PSO-W & PSO-G          0.949   1.110   0.911      1.676   1.705   1.682      4.232     4.104     4.341
SA                     0.915   0.455   0.403      1.261   1.565   1.200      6.461     3.883     4.755
Table 3: Deviation of three factors from empirical proxies
Figure 3: Evolution for level (β0) in Brazil
Figure 4: Evolution for slope (β1) in Brazil
6 Robustness
In this section, we test the robustness of the PSO-W & PSO-G algorithm for NSS estimation
along several dimensions. First, we account for the perturbation of bond prices by adding to
each bond price a random term drawn uniformly from the bid-ask spread interval.
Second, we compare in-sample and out-of-sample results. Third, we test the sensitivity
of the optimization algorithms to the initial values by randomly selecting them.
Finally, we compare the emerging market results with U.S. bond results. The qualitative
results remain unchanged, and for the sake of brevity, we omit the relevant data from the paper.6
Perturbation of Bond Prices: We applied the fitted-price error formula discussed in Bliss
(1997) to perturb the estimates within the bid-ask band, in which the error is defined as
follows:

ε = p − p_Ask,   if p > p_Ask
  = p_Bid − p,   if p < p_Bid        (23)
  = 0,           otherwise
6The results can be obtained from the authors.
Figure 5: Evolution for curvature (β2) in Brazil
The results show that when the bond price data is perturbed randomly in the bid-ask inter-
val, the algorithm shows similar results in terms of goodness-of-fit, parameter space scan, and
stability of parameters.
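The clamping rule of equation (23) is easily expressed in code (illustrative function and argument names):

```python
def fitted_price_error(p, p_bid, p_ask):
    """Equation (23): the error is the distance outside the bid-ask band,
    and zero whenever the fitted price lies inside the band."""
    if p > p_ask:
        return p - p_ask
    if p < p_bid:
        return p_bid - p
    return 0.0
```

The effect is that a fitted price anywhere inside the quoted spread is treated as error-free, so only violations of the band contribute to the robustness check.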
In-Sample and Out-of-Sample Results: We randomly chose one half of the bonds for each day
in the time period and fit the NSS curve to this data. Afterwards, we applied the fitted curve
to the other half of the bond data for each day, for each of the three emerging market Eurobond
datasets. The change in performance measures from in-sample to out-of-sample tests reveals
the reliability and robustness of the in-sample performance results. The experiments show
that the in-sample and out-of-sample fitting results exhibit similar characteristics.
Sensitivity to Initial Values: Another robustness measure is the sensitivity of an optimization
algorithm to changes in the initial values. We randomly changed the initial value of each NSS
parameter within the appropriate intervals and then applied each optimization algorithm to
the NSS model for the Eurobond data of the three emerging markets. We observed that the
algorithms other than PSO-W & PSO-G could not find a solution for certain initial parameter
sets, whereas the PSO-W & PSO-G algorithm converges to a solution for every random
parameter set.
Developed vs Emerging Markets: We applied the algorithm to the term structure estimation
of U.S. Treasury bonds. The data, obtained from the CRSP Government Bond files, consist
of daily quotes for Treasury issues from July 2005 to June 2009. The results of the optimization
algorithms are similar to those for the emerging markets in terms of goodness-of-fit,
computational time, space scanned, and stability of the estimated parameters.
7 Conclusion
In this study, we successfully applied a variant of the PSO algorithm to solve the optimization
problems associated with the estimation of the parameters of the NSS yield curve model. We
compared the PSO method with some well known non-linear optimization algorithms from the
literature. We performed a computational comparison among these algorithms and estimated
the day-to-day parameters of NSS model on the liquid bond portfolios of Turkey, Brazil, and
Mexico for a five-year period. We have applied four performance measures, goodness-of-fit,
computational time, space scanned, and stability of estimated parameters, to evaluate
the algorithms. The PSO-W & PSO-G algorithm is observed to improve goodness of fit and
computational time, as well as to generate a stable, economically interpretable parameter
set. Our computational experiments also show that the PSO-W & PSO-G algorithm is quite
robust with respect to the choice of initial starting values, as it explores a larger search space.
Overall, we would like to point out that the PSO-W & PSO-G algorithm can provide a robust
framework to solve the numerical problems reported in the estimation of NSS parameters.
References
Abd-El-Wahed, W., Mousa, A., and El-Shorbagy, M. (2011). Integrating particle swarm
optimization with genetic algorithms for solving nonlinear optimization problems. Journal
of Computational and Applied Mathematics, 235(5):1446 – 1453.
Anderson, N. and Murphy, G. (1996). Estimating and Interpreting the Yield Curve. John
Wiley and Sons, New York.
Bliss, R. (1997). Testing term structure estimation methods. Advances in Futures and Options
Research, 9:97–231.
Bliss, R. and Ritchken, P. (1996). Empirical tests of two state-variable Heath-Jarrow-Morton
models. Journal of Money, Credit and Banking, 28(3):452–476.
Bloomberg (2010). Bloomberg. http://www.bloomberg.com.
Bolder, D. and Streliski, D. (1999). Yield curve modelling at the Bank of Canada. Technical
Report 84, Bank of Canada.
Clerc, M. and Kennedy, J. (2002). The particle swarm - explosion, stability, and convergence
in a multidimensional complex space. IEEE Transactions on Evolutionary Computation,
pages 58–73.
Cox, J., Ingersoll, J., and Ross, S. (1985). A theory of the term structure of interest rates.
Econometrica, 53(2):385–407.
Csajbok, A. (1998). Zero-coupon yield curve estimation from a central bank perspective. MNB
Working Papers.
De Pooter, M. (2007). Examining the Nelson-Siegel class of term structure models: In-sample
fit versus out-of-sample forecasting performance. Federal Reserve Board.
Diebold, F. and Li, C. (2006). Forecasting the term structure of government bond yields.
Journal of Econometrics, 130:337–364.
Fylstra, D. (1998). Design and use of the Microsoft Excel Solver. Interfaces, pages 29–55.
Gurkaynak, R., Sack, B., and Wright, J. (2007). The U.S. Treasury yield curve: 1961 to the
present. Journal of Monetary Economics, pages 2291–2304.
Heath, D., Jarrow, R., and Morton, A. (1992). Bond pricing and the term structure of interest
rates: A new methodology for contingent claims. Econometrica, 60(1):77–105.
Ho, T. and Lee, S. (1986). Term structure movements and pricing interest rate contingent
claims. Journal of Finance, 41(5):1011–1029.
Ioannides, M. (2003). A comparison of yield curve estimation techniques using UK data.
Journal of Banking and Finance, 27:1–26.
Judd, K. (1998). Numerical Methods in Economics. MIT Press, Cambridge, MA.
Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by simulated annealing.
Science, 220:671–680.
Lagarias, J., Reeds, J., Wright, M., and Wright, P. (1998). Convergence properties of the
Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9:112–147.
Lasdon, L., Waren, A., Jain, A., and Ratner, M. (1978). Design and testing of a generalized
reduced gradient code for nonlinear programming. ACM Transactions on Mathematical
Software, 4:34–50.
Manousopoulos, P. and Michalopoulos, M. (2007). Comparison of non-linear optimization
algorithms for yield curve estimation. European Journal of Operational Research, 192:594–602.
Martellini, L., Priaulet, P., and Priaulet, S. (2003). Fixed Income Securities: Valuation, Risk
Management and Portfolio Strategies. John Wiley and Sons, New York.
McCulloch, J. H. (1971). Measuring the term structure of interest rates. The Journal of
Business, 44:19–31.
Nelder, J. and Mead, R. (1965). A simplex method for function minimization. Computer
Journal, 7:308–313.
Nelson, C. and Siegel, A. (1987). Parsimonious modelling of yield curves. The Journal of
Business, 60:473–489.
Pedersen, M. and Chipperfield, A. (2010). Simplifying particle swarm optimization. Applied
Soft Computing, pages 618–628.
Powell, M. (1964). An efficient method for finding the minimum of a function of several
variables without calculating derivatives. Computer Journal, 7:155–162.
Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (1992). Numerical recipes in C.
The Art of Scientific Computing. Cambridge University Press.
Reuters (2010). Reuters. http://www.reuters.com.
Seppala, J. and Viertio, P. (1996). The term structure of interest rates: estimation and
interpretation. Technical Report 19, Bank of Finland.
Shea, G. S. (1985). Interest rate term structure estimation with exponential splines: A note.
Journal of Finance, 40:319–325.
Shi, Y. and Eberhart, R. (1998). A modified particle swarm optimizer. Evolutionary Compu-
tation Proceedings, IEEE World Congress on Computational Intelligence, pages 69–73.
Svensson, L. (1994). Estimating and interpreting forward interest rates: Sweden 1992–1994.
NBER Working Paper 4871, National Bureau of Economic Research.
Trelea, I. C. (2003). The particle swarm optimization algorithm: convergence analysis and
parameter selection. Information Processing Letters, 85:317–325.
Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial
Economics, 5(2):177–188.
Yang, X., Yuan, J., Yuan, J., and Mao, H. (2007). A modified particle swarm optimizer with
dynamic adaptation. Applied Mathematics and Computation, 189(2):1205 – 1213.