Robust Term Structure Estimation
Robert Bliss ∗ Emrah Sener † Gunes Erdogan ‡ Emrah Ahi §
November 30, 2011
Abstract
Despite important advances in interest rate curve modeling in the last 30 years,
little attention has been paid to the key practical problem of robust estimation of the
associated parameters. In order to address this issue, we propose a hybrid Particle
Swarm Optimization (PSO) method to estimate the Nelson-Siegel-Svensson (NSS) yield
curve. We compare the results to those of several commonly used optimization algorithms, each applied to bond portfolios of emerging markets (Brazil, Mexico, and Turkey).
Our results show that the proposed hybrid PSO algorithm not only allows for a robust
estimation of the NSS parameters, but also provides a better yield curve fit in a short
period of time, and in turn, generates an economically interpretable parameter set.
Keywords: Term structure, yield curve, Nelson-Siegel-Svensson, Particle Swarm Opti-
mization
∗[email protected], Robert Bliss is Professor and F.M. Kirby Chair in Business Excellence, Wake Forest University
†[email protected], Emrah Sener is Director of the Center for Computational Finance, Ozyegin University
‡[email protected], Gunes Erdogan is a Lecturer at the School of Management, Southampton University
§[email protected], Emrah Ahi is a PhD Student, Ozyegin University
1 Introduction
The literature is rich in models for yield curves, but little attention has been paid to the robust estimation of the associated model parameters. To address this issue, we propose a hybrid Particle Swarm Optimization (PSO) method and estimate the Nelson-Siegel-Svensson (NSS) yield curve for three emerging markets: Brazil, Mexico, and Turkey.
The yield curve (the relationship between interest rates and term to maturity) is a fundamental concept not only in economic and finance theory but also in the pricing and risk management of interest rate contingent claims. It has proved to be critical to policy makers and market practitioners alike. Unfortunately, rarely in financial economics is the contrast between theory and reality more severely visible than in yield curve estimation. Slight differences in the estimated yield curves may result in significant differences when pricing bonds or fixed income portfolios. Therefore, the estimation problem should be thoroughly examined despite its apparent simplicity. However, little attention has been paid to resolving conflicting parameter estimates from the various optimization algorithms. This paper seeks to address this gap by proposing the use of a hybrid PSO method to estimate the parameters of the Nelson-Siegel-Svensson exponential forward rate models. Our computational experiments
show that PSO outperforms the traditional methods in terms of goodness-of-fit, and provides
a stable set of parameters. PSO is also quite robust to the choice of initial starting values.
There are two distinct approaches to modelling the term structure of interest rates.1 The
first approach is based on curve-fitting techniques — a direct specification of the bond prices
as a function of some parameters and the time to maturity (the cross-sectional dimension).
The second approach is based on models which make explicit assumptions about the dynamics
of state variables (the time-series dimension) and asset pricing methods using either equilib-
rium or arbitrage arguments, which in turn result in cross-sectional models for bond prices.2
Furthermore, the first type of models can be classified into two groups: (i) spline-based models and (ii) function-based models. The choice between them is dictated by the
trade-off between the closeness of fit to the set of observed government coupon prices and the
smoothness of the corresponding zero-coupon rate function.
Within the class of function-based models, Nelson and Siegel (1987) were the first to use an exponential polynomial functional form for the forward rate curve. Svensson (1994) extended
1For an excellent comparison of yield curve estimation techniques, see Ioannides (2003). Further discussion on the yield curve can be found in Anderson and Murphy (1996).
2Within the second approach, interest rate models based on short rates either imply a structure for the shape of the term structure, such as Cox et al. (1985) or Vasicek (1977), or take the entire term structure as given and provide structure for its evolution, such as Ho and Lee (1986) and Heath et al. (1992).
Nelson and Siegel's function by adding a fourth term with two additional parameters. This fourth term aims to increase the flexibility of the model and improve the fit. From a technical point of view, the extension creates a second hump-shape (or U-shape). For
a number of reasons this approach has become one of the most popular yield curve models.
Firstly, a number of studies such as Bliss and Ritchken (1996); Bliss (1997) and Seppala and
Viertio (1996) conclude that practitioners requiring a reliable and parsimonious representation
of the yield curve should use an exponential polynomial approach, in preference to the spline-
based approaches. The latter results in the over-fitting problem described in Bliss (1997): "...If the term structure being used is 'too flexible' the model will almost certainly be incorporating unwanted measurement error or idiosyncratic, bond-specific factors of no relevance to pricing securities in other markets..." It has been shown that fitting the zero-coupon yield curve
is better than fitting the discount function as this will eliminate coupon effects that exist
in the latter case. Secondly, since asymptotically flat exponential approximating functions
are used, the resulting forward rate function is infinitely differentiable.3 Thirdly, financial
economists have argued that the functional form chosen for the forward rate curve of Nelson
and Siegel (1987) is effectively a dynamic three-factor model of level, slope, and curvature
(Diebold and Li, 2006). In this respect, since it requires fewer parameters, it allows for a
clearer interpretation of the parameters.
The NSS models are popular among central bankers and practitioners. They not only give a good estimate of the yield curve but also provide parameter estimates that can be interpreted as level, slope, and curvature. In constructing bond portfolios such as butterflies and ladders, these parameters are invaluable for practitioners hedging the risk of bond portfolios.
Also, parametric bond relative-value models are becoming increasingly popular in the investment community because they are more suitable for cheap/rich analysis, since this class
of models focuses on the actual cash flows of the underlying securities rather than using the
yield-to-maturity measure, which is subject to a number of shortcomings (Martellini et al.,
2003).
However, estimating the parameters of the NSS models is a difficult task because of the non-linear nature of the fitting problem, which manifests itself in complex objective surfaces with multiple local minima (or maxima) in addition to the global minimum (or maximum).
A number of studies (Bolder and Streliski, 1999; Gurkaynak et al., 2007; De Pooter, 2007),
3The Nelson and Siegel (1987) and Svensson (1994) models imply that forward rates smoothly gravitate towards a flattened long end, whereas the McCulloch (1971) yield curve allows forward rates to fluctuate with maturity and rise steeply as the term to maturity lengthens. But what if the forward rates reflect expected future rates? In other words, the NSS models, by constraining the implied forward rate curve, are implicitly promoting a more stable yield curve with reference to how the 'true' term structure is expected to behave. The main question is at what cost in terms of accuracy of fit this stability is achieved. For this discussion, see Shea (1985).
have reported numerical difficulties when working with the NSS model. The traditional
methods of direct search, gradient based, and quasi-Newton algorithms, which may be used
for solving the associated optimization problem, carry the risk of numerical problems of false
convergence and severe suboptimality (Bolder and Streliski, 1999). As explained in Bolder
and Streliski (1999), using the standard estimation methods to obtain the global optimum,
it is necessary to estimate the model for many different sets of starting values for the model
parameters. For example, estimating the six parameters of the Svensson model with five different starting values for each parameter requires a grid of size 5^6 = 15,625 — all possible combinations of five starting values for each of the six parameters. The
time required to increase the robustness of an estimated curve, or the confidence of having a
global optimum, increases exponentially with the number of different starting values chosen
for each parameter. Time-wise instability of the NSS parameters due to the existence of
multiple local minima (known as false convergence) in the associated optimization problem
is one such issue that arises frequently. These numerical problems may also arise from the
high sensitivity of the optimization algorithm to the initial starting values which can cause
great variability in the estimated parameters. Since NSS parameters are estimated on a daily basis and carry economic meaning, wild variations in them are difficult to justify economically. However, little attention has been given to the algorithms used to estimate the parameters of the NSS model.
In the original work of Nelson and Siegel (1987), it is recommended to solve the optimization problem over an empirically predetermined set of values for the parameters causing nonlinearity. Ronn (1987) uses linear programming, Csajbok (1998) uses the Gauss-Newton algorithm, whereas Bolder and Streliski (1999) adopt sequential quadratic programming as well as the Nelder-Mead simplex method to estimate the NSS parameters. Ioannides (2003) has
compared various yield curve models using the BFGS quasi-Newton algorithm. Diebold and
Li (2006) have tackled the estimation problem by fixing the parameters causing nonlinearity,
and solving the resulting linear subproblem by least squares. Recently, Manousopoulos and
Michalopoulosa (2007) have compared a diversity of non-linear optimization algorithms for
estimating the yield curve for the Greek bond market and suggested a two-step optimization
process to estimate the NSS model parameters: first estimate the NSS parameters with direct search and global optimization algorithms, and then refine the estimated parameters with
gradient based methods. In this study, we propose a hybrid Particle Swarm Optimization
(PSO) method for estimating the parameters of the NSS yield curve model.
Four performance criteria are used for evaluating and comparing the optimization algo-
rithms tested. The first is goodness of fit, which shows how well the suggested algorithm fits the empirically observed yield curve. Since NSS parameters need to be calculated on a daily basis
and this computation can be quite expensive for long time series, we use computational time as a second performance measure. Thirdly, we investigate the parameter space scanned by each algorithm around the starting point. An algorithm that scans a large space is preferred, because algorithms that explore only a small space around the starting point are very sensitive to the starting values of the parameters. Finally, and most importantly, we check the parameter
stability over time. Past research has mainly focused on the performance of the yield curve fit and has paid scant attention to the parameter stability of the NSS model. Given that NSS parameters have a specific financial interpretation, a robust optimization algorithm is expected to generate smooth parameters over time. Our computational comparisons
with the existing methods show that the PSO algorithm can provide a robust framework
that can handle the numerical problems reported in the estimation of NSS parameters. PSO
yields smaller errors when fitting the yield curve, with low computational time and smoother, more stable parameter estimates. Also, in contrast to the traditional optimization algorithms employed in the finance literature, the PSO algorithm has the advantage of not depending on the initial values, which minimizes the risk of non-convergence.
The rest of the paper is organized as follows: In Section 2, we discuss the framework of
the NSS model. Section 3 describes the estimation procedure and the related optimization
algorithm. Section 4 provides the description of the optimization algorithms we use in our
computational experiments. In Section 5, we give the details of our computational experiments
and results. Our concluding remarks are provided in Section 6.
2 Yield Curves
The yield curve is not directly observable in the market and it needs to be estimated from the
bond prices. In this section we first discuss the basic yield curve concepts, then we show an
important relationship between spot and forward rates, which is the underlying rationale of the NSS model.
2.1 Yield curve concepts
Consider a bond that is sold now, at time t, and is due to mature at time T. Let t ≤ T < T̄, where T̄ is the trading horizon, much longer than the maturity date of any bond. If the bond is a zero-coupon bond with maturity M = T − t, its price at time t, p(t, T), is calculated using the yield to maturity y(t, T) as

p(t, T) = e^{−y(t,T)(T−t)},

with the properties p(T, T) = 1 and p(t, T) < 1 for t < T.
It is trivial to show that the bond price can be transformed to give the yield curve, y(t, T) = −ln(p(t, T))/(T − t).
Since many governments do not issue longer-term zero-coupon bonds, constructing yield curves from market data requires coupon bond prices (or yields). A coupon bond may be priced in a number of ways. The traditional procedure is to discount all cash flows of the bond by y(t, T):
p(t, T) = Σ_{m=1}^{T−t} c_{t+m} e^{−y(t,T) m},

where p(t, T) is the price of a T-period bond when the yield to maturity is y(t, T), c_{t+m} is the coupon payment at time t + m, with c_T = F being the bond's face value, and d(t, T) is the discount factor that gives the price of a default-free zero-coupon bond that pays one unit at time T.
A coupon bond is, in effect, a bundle of zero-coupon bonds, with each coupon payment constituting a single zero-coupon bond. Consequently, if the prices of discount bonds maturing at each coupon date are known, then it is easy to find the price of a coupon-paying bond using the constituent zero-coupon rates:
p(t, T) = Σ_{m=t}^{T} c_m d(t, m) = Σ_{m=t}^{T} c_m e^{−r(t,m)(m−t)},  (1)

where r(t, m) is the spot rate applicable to a term of m periods. The yield curve at a given
date t is represented by a graph of the spot rate r(t, m) for different times to maturity m − t.
The discount function4 can easily be transformed into a forward rate. Moreover, the transformation between the spot interest rate and the forward rate is unique for a given yield curve. Let f(t, m, T) be the continuously compounded forward rate on a forward contract concluded at time t (the trade date), for an investment that starts at time m > t (the settlement date) and ends at time T > m (the maturity date). Then the forward rate is
related to the spot rates by:

f(t, m, T) = [(T − t) r(t, T) − (m − t) r(t, m)] / (T − m).  (2)
This forward rate Eq.(2) can easily be amended to produce an instantaneous forward rate by
4The Fundamental Theorem of Asset Pricing (see Dybvig and Ross (1989)) states that the absence of arbitrage implies the existence of a linear pricing rule, which corresponds to a unique 'discount function' in the term structure literature. However, as Dermody and Prisman (1988) have argued, because bond prices are quoted with a bid-ask spread, multiple discount functions are possible.
taking the settlement and maturity dates infinitesimally close together:

f(t, m) = lim_{h→0} f(t, m, m + h) = r(t, m) + (m − t) ∂r(t, m)/∂m.
Integrating this function results in the spot rate:

r(t, T) = (1/(T − t)) ∫_t^T f(t, m) dm,  (3)
which implies that the spot rate is an average of forward rates. This relationship between
spot rate and forward rate lies at the heart of the NSS model.
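The averaging relation in Eq. (3) can be verified numerically; the sketch below integrates a hypothetical linear forward curve on a grid (the curve and horizon are illustrative only) and recovers the implied spot rate.

```python
import numpy as np

def spot_from_forward(f, t, T, n=10_001):
    """Spot rate r(t, T) as the average of instantaneous forward
    rates f(t, m) over m in [t, T], per Eq. (3)."""
    m = np.linspace(t, T, n)
    v = f(m)
    # Trapezoidal rule for the integral in Eq. (3).
    integral = np.sum((v[:-1] + v[1:]) / 2.0 * np.diff(m))
    return integral / (T - t)

# Hypothetical forward curve f(t, m) = 5% + 1% * m over [0, 2] years:
r = spot_from_forward(lambda m: 0.05 + 0.01 * m, 0.0, 2.0)
print(round(r, 6))  # the average of a linear curve is its midpoint value, 0.06
```

For a linear forward curve the trapezoidal rule is exact, so the computed spot rate equals the midpoint forward rate.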
2.2 Nelson-Siegel and Svensson models
Nelson and Siegel (1987) assume that the instantaneous forward rate is the solution of a
second-order differential equation with two equal roots. Nelson and Siegel’s forward rate
function, for some fixed time t (i.e. cross-sectional) is given by
f(m, Θ) = β0 + β1 e^{−m/τ1} + β2 (m/τ1) e^{−m/τ1},  (4)

where Θ = [β0, β1, β2, τ1] are parameters to be estimated. This function can be divided into two functional components: (i) a simple exponential function, g(m) = β0 + β1 e^{−m/τ1} for β0, β1, τ1 ∈ R with β0 > 0, and (ii) a hump-shaped function, h(m) = β2 (m/τ1) e^{−m/τ1} for β2, τ1 ∈ R with τ1 > 0.
The β0 parameter anchors g at a given level, while the sign of β1 determines the slope of the instantaneous forward curve. The parameter β1 generally takes a negative value, producing an upward-sloping forward rate curve. A large (small) value of τ1 means that this exponential effect decays more slowly (quickly). The function h(m) adds flexibility, permitting the instantaneous forward-rate curve to take a number of different shapes. This component creates a hump shape when β2 is positive and a U-shape when it is negative, as can be seen in Figure 1. The parameter τ1 controls the speed of convergence of both the second and third terms in Eq. (4). Finally, an important feature
of Eq. (4) is that the limits of forward and spot rates when maturity approaches zero and
infinity, respectively, are equal to β0 + β1 and β0. The parameters of this model have clear
financial meanings: the long term forward rate is represented by β0 (level), the short term
rate is modulated by β1 (slope), and medium term rate is governed by β2 (curvature).
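The level/slope/curvature interpretation is easy to see in code; the sketch below evaluates Eq. (4) on a maturity grid, with purely illustrative parameter values (they are not estimates from this paper).

```python
import numpy as np

def ns_forward(m, beta0, beta1, beta2, tau1):
    """Nelson-Siegel instantaneous forward rate, Eq. (4):
    f(m) = beta0 + beta1*exp(-m/tau1) + beta2*(m/tau1)*exp(-m/tau1)."""
    x = np.exp(-m / tau1)
    return beta0 + beta1 * x + beta2 * (m / tau1) * x

# Illustrative parameters: level 6%, slope -2%, curvature 1%, decay 2 years.
m = np.linspace(0.01, 30.0, 300)
f = ns_forward(m, beta0=0.06, beta1=-0.02, beta2=0.01, tau1=2.0)
print(round(float(f[0]), 4), round(float(f[-1]), 4))  # near beta0+beta1 and beta0
```

The printed endpoints illustrate the limits noted above: the short end approaches β0 + β1 and the long end approaches β0.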
Svensson (1994) extends the work of Nelson and Siegel (1987) by repeating the function
h(m) twice, with different parameters, and linearly combining the functions g(m) and h(m)
into a single function for the instantaneous forward-rate curve:
f(m, Θ) = β0 + β1 e^{−m/τ1} + β2 (m/τ1) e^{−m/τ1} + β3 (m/τ2) e^{−m/τ2},  (5)

where Θ = [β0, β1, β2, β3, τ1, τ2] are parameters to be estimated. The term β3 (m/τ2) e^{−m/τ2}, for β3, τ2 ∈ R with τ2 > 0, now gives rise to a second hump-shape (or U-shape), increasing the
flexibility and improving the fit. Figure 1 shows the decomposition of the NSS curve into its
exponential components.
Figure 1: NSS Components
3 Estimation and Optimization Problem
We now focus on the problem of determining the parameters of the NSS model. First we need to transform the forward-rate curve of Eq. (5) into a discount function to price the set of coupon
bonds. Once we have a theoretical price vector, we can optimize over the parameter set to
minimize the distance between the observed bond prices (obtained from the market) and the
theoretical bond prices (estimated using the yield curve).
3.1 Estimation
To transform f(m,Θ) into a discount function, we use the result from Eq. (3). In other
words, the spot rate required in the discount function, Eq. (1) can be derived by integrating
the forward rate according to Eq. (3):
r(m, Θ) = β0 + β1 [(1 − e^{−m/τ1})/(m/τ1)]
        + β2 [(1 − e^{−m/τ1})/(m/τ1) − e^{−m/τ1}]
        + β3 [(1 − e^{−m/τ2})/(m/τ2) − e^{−m/τ2}],  (6)
where Θ is the set of the model parameters [β0, β1, β2, β3, τ1, τ2] and m is the time to maturity.
Consequently, the present value relation is specified using the continuous form of the discount
function given by:
d(m, Θ) = e^{−r(m,Θ) m}.
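Eq. (6) and the discount function transcribe directly; the parameter vector in the example below is hypothetical, not a fitted value from our data.

```python
import numpy as np

def nss_spot(m, theta):
    """NSS spot rate r(m, Theta) of Eq. (6)."""
    b0, b1, b2, b3, t1, t2 = theta
    x1, x2 = m / t1, m / t2
    g1 = (1.0 - np.exp(-x1)) / x1
    g2 = (1.0 - np.exp(-x2)) / x2
    return b0 + b1 * g1 + b2 * (g1 - np.exp(-x1)) + b3 * (g2 - np.exp(-x2))

def nss_discount(m, theta):
    """Discount factor d(m, Theta) = exp(-r(m, Theta) * m)."""
    return np.exp(-nss_spot(m, theta) * m)

# Hypothetical parameter vector [b0, b1, b2, b3, tau1, tau2]:
theta = [0.08, -0.03, 0.02, 0.01, 1.5, 6.0]
print(nss_spot(np.array([0.25, 2.0, 10.0]), theta))
print(nss_discount(np.array([0.25, 2.0, 10.0]), theta))
```

As m approaches zero the spot rate tends to β0 + β1, consistent with the limits of the forward curve discussed in Section 2.2.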
Consider a set of N bonds. Let pi(t,m) be the estimated price of bond i at time t, d(mij,Θ)
be the discount factor that gives the price of zero-coupon bond that pays one unit at time
m, cij be its jth payment, paid at time mij (i.e., cij = ci(mij)), and mi be the number of
payments. Then, Eq. (1) is rewritten for bond i as
p_i(m, Θ) = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ).  (7)
To handle bond-specific errors due to market frictions (such as tax effects and liquidity problems), an error term is added to Eq. (7), and the resulting expression is estimated as a cross-sectional regression over all the bonds outstanding at a particular date:

p̂_i = p_i + u_i = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ) + u_i,  i ∈ {1, ..., N},  (8)

where p̂_i is the observed price of bond i. The regressors are the coupon payments at different dates, and the coefficients are the discount zero-coupon bond prices d(m_{ij}, Θ), j ∈ {1, ..., M}, where M = max_i m_i is the longest coupon bond maturity.
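As a concrete check of Eq. (7), the sketch below prices a toy coupon bond off a flat spot curve; the bond and the 4% curve are hypothetical.

```python
import numpy as np

def bond_price(cash_times, cash_flows, discount):
    """Price a coupon bond as the sum of its discounted cash flows,
    per Eq. (7): p_i = sum_j c_ij * d(m_ij)."""
    return sum(c * discount(m) for m, c in zip(cash_times, cash_flows))

# Hypothetical 2-year annual 5% coupon bond, face value 100,
# priced off a flat 4% continuously compounded spot curve.
times = [1.0, 2.0]
flows = [5.0, 105.0]
d = lambda m: np.exp(-0.04 * m)
print(round(bond_price(times, flows, d), 4))
```

In the estimation, `discount` would be d(m, Θ) of the NSS model rather than a flat curve.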
3.2 Optimization Problem
The parameter vector Θ must be chosen to minimize the weighted sum of squared price errors u_i = p̂_i − p_i:

min_Θ J(Θ) = min_Θ [Σ_{i=1}^{N} (w_i u_i)^2],  (9)

where J(Θ) is the objective function and w_i is the weighting scheme

w_i = (1/D_i) / [Σ_{m=1}^{M} (1/D_m)],

where D_i is the duration of the bond.5
Note that in this case the squared errors are weighted with the inverse of the bond's duration D_i. Parameters estimated under a purely quadratic loss function are sensitive to outliers. To reduce the impact of outliers on the parameter estimates, we specify a more robust loss function by weighting each price error by the inverse of its duration. This approach, suggested by Bolder and Streliski (1999), downweights large errors in the objective function.
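The weighting scheme and the loss of Eq. (9) can be sketched compactly; the prices and durations below are hypothetical.

```python
import numpy as np

def duration_weights(durations):
    """Inverse-duration weights: w_i = (1/D_i) / sum_m (1/D_m)."""
    inv = 1.0 / np.asarray(durations, dtype=float)
    return inv / inv.sum()

def loss(observed, model, durations):
    """Duration-weighted sum of squared price errors, Eq. (9)."""
    u = np.asarray(observed, dtype=float) - np.asarray(model, dtype=float)
    w = duration_weights(durations)
    return float(np.sum((w * u) ** 2))

# Hypothetical observed vs. model prices and bond durations:
print(loss([99.5, 101.2, 97.8], [99.4, 101.5, 98.1], [1.2, 4.5, 8.0]))
```

Note that shorter-duration bonds, whose prices are less sensitive to yield errors, receive the larger weights.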
With this definition of the associated optimization problem, the estimation procedure
followed can be described as follows.
5Nelson and Schaefer (1983) tested various duration-based models, found that duration captures most of the interest rate risk, and suggested using duration weighting rather than maturity weighting.
Estimation Procedure
The steps followed in the estimation procedure are as follows:
1. Starting parameters Θ^(0) = (β0^(0), β1^(0), β2^(0), β3^(0), τ1^(0), τ2^(0)) are computed via least squares, as suggested by Diebold and Li (2006).
2. The discount factor function is determined using the starting parameters Θ^(0), i.e., d(m, Θ^(0)) = e^{−r(m,Θ^(0)) m}.
3. The discount function is utilized to determine the present value of the bond cash flows and thereby a vector of starting 'model' bond prices, i.e., p_i^(0) = Σ_{j=1}^{m_i} c_{ij} d(m_{ij}, Θ^(0)), i ∈ {1, ..., N}.
4. Numerical optimization procedures are used for estimating a set of parameters that minimizes the specified loss function, i.e., min_Θ J(Θ), subject to the constraints

c1(β0) = −β0 ≤ 0  (10)
c2(β0, β1) = −(β0 + β1) ≤ 0  (11)
c3(τ1) = −τ1 ≤ 0  (12)
c4(τ2) = −τ2 ≤ 0  (13)

5. The estimated set of parameters is used to determine the spot rate function and therefore the 'model' prices.
The objective function J(Θ) is non-linear and has been observed to have multiple local
minima (Manousopoulos and Michalopoulosa, 2007). Thus, we need a robust algorithm that
will converge to a high quality solution, regardless of the initial solution values. In the next
section we describe well-known optimization algorithms from the literature as well as the PSO,
which we will utilize to solve the minimization problem stated above.
4 Optimization Algorithms
To comparatively evaluate the performance of the PSO algorithm, we implemented well-known
numerical optimization algorithms from the literature. Our focus in this paper is not on which optimization algorithm wins the horse race, but on how one could resolve the numerical problems reported in the literature. This paper demonstrates that the methodological differences among the algorithms may well lead to material errors in the fit of the yield curve and material instability in the estimated parameters. The selected algorithms are therefore classified into three categories: global optimization algorithms, direct search algorithms, and gradient-based algorithms. We discuss all these algorithms in detail to explain why the PSO algorithm is better suited to overcoming the numerical difficulties reported for the NSS model.
4.1 Global Optimization Algorithms
The objective of global optimization is to find the globally best solution of (possibly non-linear)
models, in the presence of multiple local optima.
4.1.1 Particle Swarm Optimization
PSO is a population-based metaheuristic technique in which the gradient of the objective function is not required. It converges quickly, is easy to implement, and has recently been successfully applied to optimizing various continuous nonlinear functions in practice (Clerc and Kennedy, 2002; Trelea, 2003; Pedersen and A., 2010; Yang et al., 2007; Abd-El-Wahed et al., 2011). PSO is inspired by the social behaviour of individuals. In a simple social setting, the decision process of each individual (particle) is affected by its own experiences and the experiences of other individuals. From its own experience, each individual knows how good each choice it has tried so far is. From the experience of others, it knows not only which choices are most positive but also how positive the best pattern of choices was. In our particle swarm optimization setting, a set of particles searches for good solutions to the continuous NSS optimization problem described in Section 3.2. Each particle is a solution of the NSS optimization problem and uses its own experience and the experience of neighbouring particles to choose how to move in the search space to find a better yield curve fit.
The PSO algorithm is initialized with a population of random candidate solutions, called
particles. Each particle is assigned a random location and a random velocity, and is iteratively
moved through the problem space. Every particle is attracted towards the location of the best
solution achieved by the particle itself and towards the location of the best solution achieved
across the whole population. At each iteration, the velocity vector (v_i) and position vector (Θ_i) of each particle are updated as follows:
vi ← vi + U(0, φ1)⊗ (pbi −Θi) + U(0, φ2)⊗ (gb−Θi) (14)
Θi ← Θi + vi (15)
where U(0, φ_i) is a vector of uniformly distributed random numbers in the interval [0, φ_i], ⊗ is the entry-wise product, pb_i is the best known position of particle i, and gb is the best known position of the entire population. The parameters φ1 and φ2 denote the magnitude of the random forces in the direction of the personal best pb_i and the swarm best gb. The components U(0, φ1) ⊗ (pb_i − Θ_i) and U(0, φ2) ⊗ (gb − Θ_i) can be interpreted as attractive forces produced by springs of random stiffness. There are several variants of the original PSO algorithm that have been reported to perform better on a number of problems. For one such variant, which we call PSO-W, Shi and Eberhart (1998) proposed the following update:
vi ← ωvi + U(0, φ1)⊗ (pbi −Θi) + U(0, φ2)⊗ (gb−Θi) (16)
Θi ← Θi + vi (17)
ω ← ωp × ω (18)
where ω is termed the inertia weight and 0 < ω_p < 1 is the decay factor. Effectively, the inertia weight supplies a random search direction whose magnitude decreases over iterations. Another variant of PSO was proposed by Pedersen and A. (2010). This variant, named PSO-G, disregards the personal best values and focuses on the neighborhood of the population best. PSO-G has been reported to perform better on a number of problems (Pedersen and A., 2010). Our computational experiments show that the best results were obtained when PSO-W and PSO-G were applied in conjunction. This method is referred to as PSO-W/G for the rest of the study.
PSO was designed as an unconstrained optimization algorithm, which requires us to handle the constraints in the objective function. To this end, we add to the objective a penalty function equal to a scalar times the squared violation of constraint c_i, i.e.,
P_i(Θ) = 0 if c_i(Θ) ≤ 0, and P_i(Θ) = C (c_i(Θ))^2 if c_i(Θ) > 0,  (19)
where C is a large scalar value. Thus, the optimization problem becomes:

min_Θ f(Θ) = J(Θ) + Σ_{k=1}^{4} P_k(Θ).  (20)
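A minimal sketch of the penalized objective of Eqs. (19)-(20) together with a PSO-W-style update in the spirit of Eqs. (16)-(18); the swarm size, coefficient values, and the toy sphere objective are illustrative choices, not the settings used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def penalized(f, constraints, C=1e6):
    """Wrap objective f with the quadratic penalty of Eqs. (19)-(20)."""
    def g(theta):
        pen = sum(C * max(c(theta), 0.0) ** 2 for c in constraints)
        return f(theta) + pen
    return g

def pso_w(f, dim, n=30, iters=200, phi1=1.49445, phi2=1.49445,
          omega=0.729, omega_p=0.99):
    """Minimal PSO with a decaying inertia weight (PSO-W style).
    A sketch, not the paper's exact hybrid PSO-W/G implementation."""
    pos = rng.uniform(-1.0, 1.0, (n, dim))
    vel = rng.uniform(-1.0, 1.0, (n, dim))
    pb = pos.copy()
    pb_val = np.array([f(p) for p in pos])
    gb = pb[pb_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.uniform(0.0, phi1, (n, dim))
        r2 = rng.uniform(0.0, phi2, (n, dim))
        vel = omega * vel + r1 * (pb - pos) + r2 * (gb - pos)  # Eq. (16)
        pos = pos + vel                                        # Eq. (17)
        omega *= omega_p                                       # Eq. (18)
        val = np.array([f(p) for p in pos])
        better = val < pb_val
        pb[better], pb_val[better] = pos[better], val[better]
        gb = pb[pb_val.argmin()].copy()
    return gb, f(gb)

# Toy sphere objective with a positivity constraint on x[0] (illustrative):
obj = penalized(lambda x: np.sum((x - 0.5) ** 2), [lambda x: -x[0]])
best, val = pso_w(obj, dim=3)
print(best, val)
```

In our setting the objective would be the penalized J(Θ) of Eq. (20) over the six NSS parameters.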
Hybrid Particle Swarm Optimization Algorithm
Initialization. Initialize every particle i s.t. Θ_0^(i) = U(0, φ_i) and v_0^(i) = U(0, φ_i), as well as ω_p, k_g, k_max and the stopping criterion ε.
Step 1. Update Θ_k^(i) and v_k^(i):
Θ_k^(i) ← Θ_{k−1}^(i) + v_{k−1}^(i)
ω ← ω_p × ω
If k < k_g, the algorithm is PSO-W:
v_k^(i) ← ω v_{k−1}^(i) + U(0, φ1) ⊗ (pb_i − Θ_i) + U(0, φ2) ⊗ (gb − Θ_i)
If k ≥ k_g, the algorithm is PSO-G:
v_k^(i) ← ω v_{k−1}^(i) + U(0, φ2) ⊗ (gb − Θ_i)
Step 2. If f(Θ_k^(i)) < f(pb_i), set pb_i ← Θ_k^(i). If f(pb_i) < f(gb), set gb ← pb_i.
Step 3. If k > k_max or ||Θ_k^(i) − Θ_{k+1}^(i)|| < ε(1 + ||Θ_k^(i)||), STOP and report success; else increment k and go to Step 1.
4.1.2 Simulated Annealing
The Simulated Annealing (SA) algorithm, proposed by Kirkpatrick et al. (1983), is a probabilistic
local search algorithm that looks for the minimum of an objective function using the neighbor-
hood information of a point in the search space. The name and inspiration of the algorithm
come from annealing in metallurgy, a technique involving heating and controlled cooling of a
material to increase the size of its crystals and reduce their defects. The heat causes the atoms
to become unstuck from their initial positions (a local minimum of the internal energy) and
wander randomly through states of higher energy; the slow cooling gives them more chances
of finding configurations with lower internal energy than the initial one. By analogy with this
physical process, each step of the SA algorithm replaces the current solution by a random
”nearby” solution, chosen with a probability that depends both on the difference between the
corresponding function values and also on a global parameter T (called the temperature), that
is gradually decreased during the process.
Local search algorithms usually start with a random initial solution. A neighbour of this
solution is then generated by some suitable mechanism and the change in cost is calculated.
If a reduction in cost is found, the current solution is replaced by the generated neighbour,
otherwise the current solution is retained. The process is then repeated until no further
improvement can be found in the neighbourhood of the current solution and the algorithm
terminates at a local minimum. In SA, sometimes a neighbour that increases the cost is
accepted, to avoid becoming trapped in a local optimum. A nonimproving move is accepted
with a probability that decreases with iterations. Usually, the probability is selected as e−δ/T
where δ is the increase in the objective function value at each iteration and T is a control
parameter.
Simulated Annealing Algorithm
Initialization. Choose an initial guess Θ_0, set f_best = f(Θ_0), and choose T.
Step 1. Generate a candidate Θ′ = neighbor(Θ_{k−1}) and set δ = f(Θ′) − f(Θ_{k−1}).
Step 2. If δ < 0 or random() < e^{−δ/T}, set Θ_k = Θ′; otherwise set Θ_k = Θ_{k−1}.
Step 3. If f(Θ_k) < f_best, set f_best = f(Θ_k).
Step 4. If k > k_max STOP; else increment k and go to Step 1.
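The loop above can be sketched in a few lines; the toy one-dimensional objective, the neighborhood, and the cooling schedule below are illustrative choices only, not the paper's implementation.

```python
import math, random

random.seed(0)

def simulated_annealing(f, x0, neighbor, T=1.0, cooling=0.995, iters=5000):
    """Minimal SA loop: accept a worse neighbor with probability
    exp(-delta/T) and cool T geometrically."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(iters):
        y = neighbor(x)
        delta = f(y) - fx
        if delta < 0 or random.random() < math.exp(-delta / T):
            x, fx = y, fx + delta          # accept the move
            if fx < fbest:
                best, fbest = x, fx        # track the best solution seen
        T *= cooling
    return best, fbest

# Toy 1-D objective with several local minima (illustrative only):
f = lambda x: x * x + 2 * math.sin(5 * x)
best, fbest = simulated_annealing(f, x0=3.0,
                                  neighbor=lambda x: x + random.uniform(-0.5, 0.5))
print(round(best, 3), round(fbest, 3))
```

Early on, when T is large, uphill moves are frequently accepted, letting the search escape the local minima that trap a greedy descent.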
4.2 Direct Search Algorithms
Direct search is a method for solving optimization problems that does not require any infor-
mation about the gradient of the objective function. Direct search algorithms search a set of
points around the current point, looking for one where the value of the objective function is
lower than the value at the current point.
4.2.1 Nelder-Mead Method
The Nelder-Mead method is a simplex-based direct search method that begins with a set of points considered as the vertices of a simplex. At each step, Nelder-Mead generates a new test position by extrapolating the behavior of the objective function measured at the test points arranged as a simplex. The algorithm then chooses to replace one of these test points with the new test point, and so the technique progresses.
The Nelder-Mead method (Nelder and Mead, 1965; Lagarias et al., 1998) uses four scalar parameters: ρ (reflection), χ (expansion), γ (contraction), and σ (shrinkage). Let the vertices be denoted Θ^(1), Θ^(2), ..., Θ^(n+1), where n is the number of parameters to be estimated.
Algorithm for Nelder-Mead's method
Initialization. Choose vertices Θ(1), Θ(2), ..., Θ(n+1) and parameters ρ, χ, γ, σ which
satisfy ρ > 0, χ > 1, χ > ρ, 0 < γ < 1, 0 < σ < 1.
Step 1. Sort the vertices so that f(Θ(1)_k) < f(Θ(2)_k) < ... < f(Θ(n+1)_k).
Step 2. Reflection: Θr_k = Θ̄_k + ρ(Θ̄_k − Θ(n+1)_k), where Θ̄ = (1/n) Σ_{i=1}^{n} Θ(i).
If f(Θ(1)_k) ≤ f(Θr_k) ≤ f(Θ(n)_k), the reflection point is accepted; else go to Step 3.
Step 3. Expansion: Θe_k = Θ̄_k + χ(Θr_k − Θ̄_k). If f(Θe_k) < f(Θr_k), the expansion
point is accepted; else go to Step 4.
Step 4. Contraction: Θc_k = Θ̄_k + γ(Θr_k − Θ̄_k). If f(Θc_k) ≤ f(Θr_k), the contraction
point is accepted; else go to Step 5.
Step 5. Shrinkage: {Θ(1)_k, Θ(2)_k, ..., Θ(n+1)_k} = {Θ(1)_k, v(2)_k, ..., v(n+1)_k},
where v(i)_k = Θ(1)_k + σ(Θ(i)_k − Θ(1)_k).
Step 6. If k > kmax or ||Θ(i)_k − Θ(i)_{k+1}|| < ε(1 + ||Θ(i)_k||), STOP and report
success; else increment k and go to Step 1.
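A compact plain-Python sketch of these simplex steps, applied to a toy two-variable quadratic (a simplified variant that performs only the outside contraction; function and variable names are ours):

```python
def nelder_mead(f, simplex, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5,
                k_max=500, eps=1e-10):
    """Minimal Nelder-Mead iteration following the steps above."""
    n = len(simplex) - 1
    for _ in range(k_max):
        simplex.sort(key=f)                          # Step 1: order the vertices
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        refl = [centroid[i] + rho * (centroid[i] - worst[i]) for i in range(n)]
        if f(best) <= f(refl) < f(simplex[-2]):      # Step 2: reflection
            simplex[-1] = refl
        elif f(refl) < f(best):                      # Step 3: expansion
            exp_ = [centroid[i] + chi * (refl[i] - centroid[i]) for i in range(n)]
            simplex[-1] = exp_ if f(exp_) < f(refl) else refl
        else:
            con = [centroid[i] + gamma * (refl[i] - centroid[i]) for i in range(n)]
            if f(con) <= f(refl):                    # Step 4: contraction
                simplex[-1] = con
            else:                                    # Step 5: shrink toward best
                simplex = [best] + [[best[i] + sigma * (v[i] - best[i])
                                     for i in range(n)] for v in simplex[1:]]
        fs = [f(v) for v in simplex]
        if max(fs) - min(fs) < eps:                  # Step 6: stopping test
            break
    return min(simplex, key=f)

# toy objective with minimum at (1, -2)
opt = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                  [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Note that only function values, never gradients, are evaluated, which is what makes the method fast per iteration.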
4.2.2 Powell’s Method
Powell’s algorithm (Powell, 1964; Press et al., 1992) performs successive line minimizations
along conjugate directions until it converges to a solution. Specifically, the minimization is
performed over groups of mutually conjugate directions, which are selected without using any
gradient information; the initial directions are the n standard basis vectors, where n is the
number of variables. Its main advantages are simplicity and effectiveness in practice.
Algorithm for Powell's method
Initialization. Choose initial guess Θ0, set the directions u_i = e_i (the standard basis
vectors) for i = 1, ..., n, and choose stopping parameters δ and ε > 0.
Step 1. For i = 1, ..., n, find λ_i that minimizes f(Θ(i−1)_k + λ_i u_i) and set
Θ(i)_k = Θ(i−1)_k + λ_i u_i.
Step 2. Set u_j = u_{j+1} for j = 1, ..., n − 1 and u_n = Θ(n)_k − Θ(0)_k.
Step 3. Find λ that minimizes f(Θ(n)_k + λ u_n) and set Θ_{k+1} = Θ(n)_k + λ u_n.
Step 4. If k > kmax or ||Θ_k − Θ_{k+1}|| < ε(1 + ||Θ_k||), STOP and report success;
else increment k and go to Step 1.
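The direction-set idea can be sketched in Python as follows; the golden-section line search, the bracketing interval, and the toy objective are our own illustrative choices, not the paper's implementation:

```python
def line_min(f, x, u, lo=-5.0, hi=5.0, tol=1e-6):
    """Golden-section search for the step length along direction u."""
    gr = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - gr * (b - a), a + gr * (b - a)
        if f([x[i] + c * u[i] for i in range(len(x))]) < \
           f([x[i] + d * u[i] for i in range(len(x))]):
            b = d
        else:
            a = c
    lam = (a + b) / 2
    return [x[i] + lam * u[i] for i in range(len(x))]

def powell(f, x0, k_max=20):
    """Powell's direction-set method (sketch of the steps above)."""
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(k_max):
        x_start = list(x)
        for u in dirs:                     # Step 1: minimize along each direction
            x = line_min(f, x, u)
        new_dir = [x[i] - x_start[i] for i in range(n)]
        if sum(d * d for d in new_dir) ** 0.5 < 1e-10:
            break
        dirs = dirs[1:] + [new_dir]        # Step 2: drop oldest, append new
        x = line_min(f, x, new_dir)        # Step 3: minimize along new direction
    return x

# toy objective with minimum at (1, -3)
x_star = powell(lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 3) ** 2, [0.0, 0.0])
```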
4.3 Gradient Based Algorithms
Gradient based optimization algorithms iteratively search for a minimum by computing (or
approximating) the gradient of the NSS function at each iteration; the functional form of
NSS allows it to be differentiated to infinite order. We use two popular and efficient gradient
based optimization algorithms, namely the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
and the Generalized Reduced Gradient (GRG) algorithm. One key advantage of these algorithms
over global and direct search algorithms is their theoretical ability to exploit the
geometry of the NSS parameter space via gradient information. Thus they have a
better chance of converging to a minimum and providing a better yield curve fit. However, as
they are confined to a small space around the starting point, they are quite sensitive
to the initial values of the NSS parameters. As we discuss in the results section, this
turns out to be one of the key reasons for their poor performance.
4.3.1 Broyden-Fletcher-Goldfarb-Shanno Algorithm
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Judd, 1998; Press et al., 1992) is a
quasi-Newton algorithm: an iterative procedure built on a local quadratic approximation
of the function. An approximation of the Hessian of the NSS function is used instead of the
Hessian itself, which decreases the complexity of the algorithm. As in all variants of Newton's
method, the idea is to start at a point Θ0 and find the quadratic polynomial
J0(Θ) that matches the second-degree Taylor expansion of J(Θ) at Θ0:
J0(Θ) ≡ J(Θ0) + ∇J(Θ0)′(Θ − Θ0) + (1/2)(Θ − Θ0)′∇²J(Θ0)(Θ − Θ0),
where ∇ is the gradient operator, such that:

∇J(Θ0) = (∂J/∂β0(Θ0), ∂J/∂β1(Θ0), ∂J/∂β2(Θ0), ∂J/∂β3(Θ0), ∂J/∂τ1(Θ0), ∂J/∂τ2(Θ0))

and ∇²J(Θ0) is the Hessian matrix of J at Θ0, that is, the symmetric matrix containing the
second-order derivatives of J:

∇²J(Θ0) = (∂²J/∂Θi∂Θj(Θ0))_{i,j},   Θi, Θj ∈ {β0, β1, β2, β3, τ1, τ2}. (21)
Quasi-Newton methods specify the direction vector dk as:

dk = −Hk^(−1) ∇J(Θk), (22)

where the step size αk is obtained by line minimization. Hk is a positive definite
matrix that may be adjusted from one iteration to the next so that dk tends to approximate
the Newton direction.
Algorithm for BFGS method
Initialization. Choose initial guess Θ0, an initial positive definite Hessian approximation
H0 (usually the identity matrix), and stopping parameters δ and ε > 0.
Step 1. Solve Hk dk = −∇J(Θk) for the search direction dk.
Step 2. Solve αk = arg min_α J(Θk + α dk).
Step 3. Θk+1 = Θk + αk dk.
Step 4. Update Hk:
    zk = Θk+1 − Θk,
    sk = ∇J(Θk+1) − ∇J(Θk),
    Hk+1 = Hk + (sk sk′)/(sk′ zk) − (Hk zk zk′ Hk)/(zk′ Hk zk).
Step 5. If ||Θk − Θk+1|| < ε(1 + ||Θk||) go to Step 6; else increment k and go to Step 1.
Step 6. If ||∇J(Θk)|| < δ(1 + |∇J(Θk)|), report success; else report convergence to a
suboptimal point. STOP.
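A minimal NumPy sketch of this iteration on a toy quadratic; a simple backtracking (Armijo) line search stands in for the exact line minimization, and all names are illustrative:

```python
import numpy as np

def bfgs_min(f, grad, theta0, k_max=200, eps=1e-8):
    """BFGS iteration following the steps above; H approximates the Hessian."""
    theta = np.asarray(theta0, dtype=float)
    H = np.eye(len(theta))                       # initial Hessian guess H0 = I
    for _ in range(k_max):
        g = grad(theta)
        if np.linalg.norm(g) < eps:
            break
        d = np.linalg.solve(H, -g)               # Step 1: H_k d_k = -grad J
        alpha, fk = 1.0, f(theta)                # Step 2: backtracking line search
        while f(theta + alpha * d) > fk + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        theta_new = theta + alpha * d            # Step 3: take the step
        z = theta_new - theta                    # Step 4: BFGS update of H
        s = grad(theta_new) - g
        if abs(s @ z) > 1e-12:
            H = H + np.outer(s, s) / (s @ z) \
                  - (H @ np.outer(z, z) @ H) / (z @ H @ z)
        theta = theta_new
    return theta

# toy quadratic J(x) = 0.5 x'Ax - b'x with minimizer A^{-1} b = (0.2, 0.4)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta_star = bfgs_min(lambda x: 0.5 * x @ A @ x - b @ x,
                      lambda x: A @ x - b, [0.0, 0.0])
```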
4.3.2 Generalized Reduced Gradient Algorithm
The Generalized Reduced Gradient (GRG) algorithm (Lasdon et al., 1978) formulates a local
linearization of the nonlinear constraints and performs variable elimination. The optimization
problem is solved by line minimization along a direction obtained from the gradient of the
reduced objective function. GRG algorithms use the following direction at iteration k:

dk = P ∇J(Θk),

where P is the projection matrix defined as

P = Q2′ Q2, satisfying N′ P w = 0,

where the columns of the matrix N are the gradients of the constraints, the matrix Q2 consists of
the last n − r rows of the Q factor in the QR factorization of N, and w is an arbitrary vector.
In GRG, N′ is partitioned as

N′ = [N1  N2],

where N1 is the transpose of r linearly independent rows of N. Once N1 has been identified,
Q2 is easily obtained as

Q2′ = [ −N1^(−1) N2 ]
      [      I       ]
Algorithm for GRG method
Initialization. Choose initial guess Θ0 and stopping parameters δ and ε > 0.
Step 1. Compute dk = P ∇J(Θk).
Step 2. Solve αk = arg min_α J(Θk + α dk).
Step 3. Θk+1 = Θk + αk dk.
Step 4. If ||Θk − Θk+1|| < ε(1 + ||Θk||) go to Step 5; else increment k and go to Step 1.
Step 5. If ||∇J(Θk)|| < δ(1 + |∇J(Θk)|), report success; else report convergence to a
suboptimal point. STOP.
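The construction of the projection matrix P can be illustrated with NumPy. The text above defines Q2 through rows of the Q factor; this sketch uses the equivalent column form, taking the trailing columns of the full Q factor, which span the null space of N′ (the constraint matrix and gradient here are hypothetical):

```python
import numpy as np

# One linear constraint on three variables; its gradient is the column of N.
N = np.array([[1.0], [1.0], [1.0]])       # n = 3 variables, r = 1 constraint
Q, _ = np.linalg.qr(N, mode="complete")   # full QR factorization of N
Z = Q[:, N.shape[1]:]                     # basis of the null space of N'
P = Z @ Z.T                               # projection matrix: N' P w = 0
g = np.array([3.0, 1.0, -1.0])            # a gradient to be projected
d = -P @ g                                # projected (descent) direction
```

Projecting the gradient removes its component along the constraint normals, so line minimization along d stays (to first order) on the constraint surface.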
5 Data and Computational Results
In this section, we present the properties of our dataset, performance measures for our algo-
rithms and details of our computational experiments.
5.1 Data
We have used weekly mid-price data for the fixed coupon paying dollar Eurobonds of Brazil,
Mexico and Turkey in the time period from July 2005 to June 2009, retrieved from Bloomberg
(2010). A Eurobond is a bond issued in a currency other than that of the country in which it
is issued. In our analysis, we focus on these three emerging market countries, which have very
liquid Eurobond markets, owing to their increasing popularity among investors. Note that
bond liquidity is essential for a robust analysis. Bond characteristics are
collected from Reuters 3000 Xtra (Reuters, 2010). We exclude all bonds with special
characteristics (e.g. callable, puttable, structured, convertible, and Brady bonds) in order to
ensure that a homogeneous and reliable sample is used in our analysis.
5.2 Performance Measures
We analyze the performance of the algorithms along four dimensions: goodness-of-fit,
computational time, space scanned, and stability of parameters. Our first performance measure
is goodness-of-fit, for which we use the Mean Absolute Error (MAE) and the Root Mean Square
Error (RMSE); clearly, the method that generates smaller errors is preferred. Secondly, we
record the CPU time of the algorithms, because an algorithm that yields smaller errors
with less computational time is more desirable. Thirdly, we approximate the amount
of parameter space scanned, because algorithms that are able to scan more space have
an obvious advantage in reaching the global optimum. Finally, we present graphical illustrations
of the evolution of the NSS parameters over time, to observe their stability, our fourth measure.
5.2.1 Goodness-of-fit
The performance statistics MAE and RMSE are calculated as:

MAE = (1/N) Σ_{i=1}^{N} |p_i − p̂_i|,

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (p_i − p̂_i)² ),

with p_i the observed and p̂_i the theoretical (fitted) price of bond i,
where N represents the number of bonds. RMSE places a greater weight on larger errors
and therefore gives a greater indication of how well the models fit the data at each particular
observation. A low mean value is assumed to indicate that the model is flexible and, on
average, able to fit the yield curve fairly accurately. MAE is the average distance
between the theoretical bond prices and observed bond prices in absolute value terms. This
measure is not as easily influenced by extreme observations as RMSE; the two
measures are therefore complementary.
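Both measures are straightforward to compute; a short sketch with made-up prices:

```python
import math

def mae_rmse(observed, fitted):
    """Goodness-of-fit measures of Section 5.2.1 over N bond prices."""
    n = len(observed)
    errs = [o - f for o, f in zip(observed, fitted)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    return mae, rmse

# illustrative observed vs fitted bond prices
mae, rmse = mae_rmse([100.0, 101.5, 99.0], [100.2, 101.0, 99.3])
```

Note that RMSE ≥ MAE always holds, with equality only when all errors have the same magnitude, which is why the pair of measures together signals the presence of outliers.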
5.2.2 Computation Time and Space Scanned
Computing time can be costly for long time series of bond data when NSS parameters are
estimated day-to-day. We therefore record the CPU time of each optimization algorithm over
the given time period. The average distance to the initial parameter set is also an important
measure, as it approximates the amount of space scanned. Scanning a large space around the
initial choice of parameters indicates the quality of the algorithm, although it should be kept
in mind that a large distance may also result from outlier values. For each day, the distance
to the initial point is calculated as ||Θmin − Θ0||, where Θmin is the solution obtained in the
optimization process.
5.2.3 Stability
Past researchers have mainly focused on the performance of the yield curve fit and have
paid scant attention to the parameter stability of the NSS model. Given that the NSS
parameters are estimated on a daily basis and each parameter has a specific financial
interpretation, any robust optimization algorithm is expected to generate smooth parameters
over time. Consider, for instance, the first parameter of the NSS model (β0), which is
interpreted as the long-run level of interest rates. From one day to the next, jumps of several
percentage points in the estimates of this parameter would be totally unacceptable, even if
the yield curve fit is quite good. To this end, we depict the parameters obtained by the NSS
optimization procedures together with the day-to-day values of the parameters computed
according to the study of Diebold and Li (2006).
5.3 Computational Results
The algorithms are coded in MATLAB following Press et al. (1992), except for the GRG
algorithm, for which we use the Solver module bundled with Microsoft Excel
(Fylstra, 1998). To obtain the starting parameters, following the study of Diebold and Li
(2006), we use least squares after linearizing the problem by fixing τ1 and τ2. For the Nelder-
Mead, Powell, SA, and PSO algorithms, the constraints are embedded in the objective
function with a penalty constant C = 1000, whereas they are explicitly stated for
BFGS. The stopping criterion for the algorithms is ε = 10^(−8). The parameter set for
the Nelder-Mead algorithm is ρ = 1, χ = 2, γ = 0.5, σ = 0.5; for the SA algorithm, the
initial value of T is 1.
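The penalty embedding can be sketched as follows; the zero-coupon yield is the standard Svensson formula, while the specific constraints penalized here (β0 > 0, β0 + β1 > 0, τ1, τ2 > 0) are common illustrative choices, not necessarily the exact set used in the paper:

```python
import math

def nss_yield(tau, b0, b1, b2, b3, t1, t2):
    """Standard Svensson (NSS) zero-coupon yield at maturity tau (years)."""
    g1 = (1 - math.exp(-tau / t1)) / (tau / t1)
    g2 = (1 - math.exp(-tau / t2)) / (tau / t2)
    return (b0 + b1 * g1 + b2 * (g1 - math.exp(-tau / t1))
            + b3 * (g2 - math.exp(-tau / t2)))

def penalized_objective(params, maturities, observed_yields, C=1000.0):
    """Sum of squared yield errors plus C times the constraint violations."""
    b0, b1, b2, b3, t1, t2 = params
    violations = (max(0.0, -b0) + max(0.0, -(b0 + b1))
                  + max(0.0, -t1) + max(0.0, -t2))
    if t1 <= 0 or t2 <= 0:          # yields undefined; return penalty only
        return C * (1.0 + violations)
    sse = sum((nss_yield(m, *params) - y) ** 2
              for m, y in zip(maturities, observed_yields))
    return sse + C * violations

# a flat 5% curve is fit exactly by (b0, 0, 0, 0, t1, t2)
val = penalized_objective([0.05, 0, 0, 0, 1.0, 1.0],
                          [1.0, 5.0, 10.0], [0.05, 0.05, 0.05])
```

Any unconstrained minimizer (Nelder-Mead, Powell, SA, PSO) can then be pointed at `penalized_objective` directly, since infeasible parameter sets simply receive a large value.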
Our experimentation showed that both PSO-W and PSO-G were able to generate high
quality results. However, the performance of PSO-G is not as robust as PSO-W, being too
dependent on the initial locations of the particles. PSO-W/G combines the best features of
these algorithms and returns the best results.
Goodness-of-fit statistics, given in Table 1, show that global optimization algorithms
clearly outperform gradient based and direct search algorithms. Among the global
optimization algorithms, the PSO-W/G algorithm achieves the smallest errors both in terms of RMSE
and MAE. This might be due to the fact that global
optimization algorithms are better equipped to find the global minimum of the NSS function
in the (possible) presence of multiple local minima. As we discuss in section 4.1 the SA and
the PSO algorithms involve downhill moves in order to decrease the objective function, but
also allow random (possibly uphill) moves in order to escape from local minima. Table 1 also
shows that, after the global optimization algorithms, the direct search algorithms achieve smaller
errors than the gradient based ones. Among the direct search algorithms, Nelder-Mead performs
worse than Powell's method. Among the gradient based algorithms, BFGS yields larger errors
than GRG. In Figure 2, we have also provided
sample yield curve fits for Brazil, which are estimated with the PSO-W/G algorithm. As it
can be observed from the figures, the NSS model is capable of replicating a variety of yield
curve shapes.
Figure 2: Selected fitted yield curves for Brazil.
Computation time results are heterogeneous across the classes of optimization
algorithms. When all the algorithms are compared, Table 2 clearly shows that on average
direct search performs quite well, and Nelder-Mead emerges as the fastest algorithm of all.
This might be expected given the nature of direct search algorithms: as can be seen from
the steps defined in section 4.2.1, the Nelder-Mead algorithm applies simple operations to
the simplex (reflection, contraction, expansion) and does not evaluate gradient information,
which makes it quite fast, especially compared to gradient based algorithms. In
terms of computational speed, the PSO algorithms perform second to the Nelder-Mead algorithm.
When Table 1 and Table 2 are considered together, it can be observed that, although direct
search methods are quite fast, their goodness-of-fit measures are not as good as those of the
PSO algorithms. Also, as expected, gradient based methods are more time consuming than the
other algorithms, since the gradient of the NSS function must be computed and a line
minimization procedure applied at each iteration.
Since the solution quality also depends on the initial starting point, in Table 2 we report the
average distance of the solution from the initial point for each algorithm. It can be observed
that the direct search algorithms find solutions closer to the starting point than the
gradient based and global optimization algorithms. This may be the very reason why direct
search methods produce the larger errors and worse fits seen in Table 1: since
these algorithms explore a smaller space around the starting point, their convergence to the true
solution is very sensitive to the starting values of the parameters. We can also see why the PSO-
W/G algorithm outperforms the others in terms of goodness of fit, as it explores more space.
Interestingly, although Table 2 shows that the distance from the initial points for the BFGS
algorithm is much higher than for the PSO algorithms, it produces poor goodness-of-fit results,
owing to its failure to converge on certain days. We would like to stress the fact that
our results are robust with respect to different starting points.
                               Total RMSE                  Total MAE
Method                   Brazil  Mexico  Turkey      Brazil  Mexico  Turkey
Gradient Based Algorithms
BFGS                      0.507   0.523   0.491       0.440   0.407   0.810
GRG                       0.239   0.234   0.248       0.281   0.146   0.187
Direct Search Algorithms
Nelder-Mead               0.282   0.241   0.271       0.275   0.198   0.222
Powell                    0.234   0.198   0.236       0.282   0.232   0.220
Global Optimization Algorithms
PSO-W                     0.214   0.179   0.204       0.166   0.145   0.153
PSO-G                     0.236   0.201   0.238       0.186   0.164   0.185
PSO-W/G                   0.212   0.178   0.203       0.165   0.143   0.149
SA                        0.224   0.215   0.230       0.171   0.171   0.178
Table 1: Comparison of optimization algorithms in terms of total RMSE and MAE
Graphical representations of the level, slope, and curvature factors (β0, β1, β2) of the best four
optimization algorithms for Brazil are depicted in Figures 3, 4, and 5. The potential problem
of parameter instability for the highly parameterized NSS functional form is obvious from the
                           CPU Time (seconds)           Average Distance
Method                   Brazil  Mexico  Turkey      Brazil  Mexico  Turkey
Gradient Based Algorithms
BFGS                     3156.2  2450.3  2747.9       2.842   7.112   5.210
GRG                       312.7   320.7   290.6       2.012   2.563   2.466
Direct Search Algorithms
Nelder-Mead                 6.5     5.6     6.4       1.732   1.950   1.916
Powell                    207.8   212.5   269.7       1.139  19.505   2.420
Global Optimization Algorithms
PSO-W                     117.1    76.9   107.3       2.665   3.122   2.810
PSO-G                     106.4    68.8    97.5       3.065   3.248   2.588
PSO-W/G                   164.7   107.2   151.7       2.722   3.217   3.024
SA                        451.4   398.5   438.5       0.263   0.249   0.238
Table 2: Comparison of optimization algorithms in terms of CPU time and average distance
to initial points
figures. Deviations from the empirical proxies (RMSEs) for the three factors are given in Table 3.
The optimization algorithms tested differ greatly in their degree of parsimony.
As can be observed from these figures and tables, global optimization algorithms are
capable of generating smoother and more realistic level, slope, and curvature factors than
gradient based and direct search algorithms. Among the global optimization algorithms,
SA produces smooth parameters similar to those of the PSO-W & PSO-G algorithm for the
level and slope factors. However, the PSO-W & PSO-G algorithm outperforms the SA algorithm
in terms of the stability of the curvature parameter (β2). This might be due to one of the
attractive characteristics of PSO: it has memory, so knowledge of good solutions is retained by
all the particles. As discussed in section 4.1, in our PSO setting the yield curve initially has a
population of random solutions. Each potential solution, called a particle, is given a random
velocity and is flown through the NSS parameter space. Each particle has memory and keeps
track of its previous best position and the corresponding fitness for the yield curve. This leads
to a number of local bests for the respective particles in the swarm, and the one with the
greatest fitness to the yield curve (the global best of the swarm) is assigned to the NSS
parameter set. This constructive cooperation between particles, that is, the sharing of
information within the swarm, leads to stable parameters for the yield curve. This paper
demonstrates that the PSO algorithm can differ markedly from other optimization algorithms
in terms of the smoothness of the NSS parameters. Among the gradient based methods, GRG
produces a quite smooth level (β0) factor, but its slope and curvature parameters are quite
unstable. The BFGS algorithm produces extremely unstable results for all three parameters.
Similarly, among the direct search algorithms, Powell's method produces erratic results for all
three parameters, whereas Nelder-Mead produces relatively smoother NSS parameters. Finally,
we would like to note that our conclusions hold for the Mexico and Turkey Eurobond portfolios,
as can be seen in Table 3.
                                β0                           β1                              β2
Method                Brazil  Mexico  Turkey     Brazil  Mexico  Turkey     Brazil    Mexico    Turkey
Gradient Based Algorithms
BFGS                   2.596  69.609  11.593     76.137  76.491  46.624     66.093   114.659    73.855
GRG                    0.283   1.529   0.252      2.309   3.050   3.877    849.774     5.379   220.904
Direct Search Algorithms
Nelder-Mead            1.120   1.263   0.818      2.414   2.491   2.629     30.748    34.618    35.094
Powell                 3.108   6.235   3.559      5.656   7.785  43.478    122.262  11312.76   160.978
Global Optimization Algorithms
PSO-W                  1.037   1.117   0.879      2.398   1.795   2.035      4.450     5.178     6.261
PSO-G                  1.143   1.061   0.863      1.746   1.708   1.628      4.258     4.254     4.290
PSO-W & PSO-G          0.949   1.110   0.911      1.676   1.705   1.682      4.232     4.104     4.341
SA                     0.915   0.455   0.403      1.261   1.565   1.200      6.461     3.883     4.755
Table 3: Deviation of three factors from empirical proxies
Figure 3: Evolution for level (β0) in Brazil
Figure 4: Evolution for slope (β1) in Brazil
6 Robustness
In this section, we test the robustness of the PSO-W & PSO-G algorithm for NSS estimation
along several dimensions. First, we account for the perturbation of bond prices by adding to
each bond price a random term drawn uniformly from the bid-ask spread interval.
Second, we compare in-sample and out-of-sample results. Third, we test the sensitivity
of the optimization algorithms to the initial values by randomly selecting them.
Finally, we compare the emerging market results with U.S. bond results. The qualitative
results remain unchanged, and for the sake of brevity, we omit the relevant data from the paper.6
Perturbation of Bond Prices: We applied the fitted-price error formula discussed in Bliss
(1997) to perturb the estimates within the bid-ask band, in which the error is defined as
follows:

ε = p − p_Ask,   if p > p_Ask
  = p_Bid − p,   if p < p_Bid        (23)
  = 0,           otherwise
6The results can be obtained from the authors.
Figure 5: Evolution for curvature (β2) in Brazil
The results show that when the bond price data is perturbed randomly in the bid-ask inter-
val, the algorithm shows similar results in terms of goodness-of-fit, parameter space scan, and
stability of parameters.
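The clamping rule of equation (23) is easily expressed in code (illustrative function and argument names):

```python
def fitted_price_error(p, p_bid, p_ask):
    """Equation (23): the error is the distance outside the bid-ask band,
    and zero whenever the fitted price lies inside the band."""
    if p > p_ask:
        return p - p_ask
    if p < p_bid:
        return p_bid - p
    return 0.0
```

The effect is that a fitted price anywhere inside the quoted spread is treated as error-free, so only violations of the band contribute to the robustness check.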
In-Sample and Out-of-Sample Results: We randomly chose one half of the bonds for each day
in the time period and fit the NSS curve to this data. Afterwards, we applied the fitted curve
to the other half of the bond data for each day, for each of the three emerging market Eurobond
datasets. The change in performance measures from in-sample to out-of-sample tests reveals
the reliability and robustness of the in-sample performance results. The experiments show
that the in-sample and out-of-sample fitting results exhibit similar characteristics.
Sensitivity to Initial Values: Another robustness measure is the sensitivity of an optimization
algorithm to changes in the initial values. We randomly changed the initial value of each NSS
parameter within the appropriate intervals and then applied each optimization algorithm to
the NSS model for the Eurobond data of the three emerging markets. We observed that the
algorithms other than PSO-W & PSO-G could not find a solution for certain initial parameter
sets, whereas the PSO-W & PSO-G algorithm converges to a solution for every random
parameter set.
Developed vs Emerging Markets: We applied the algorithm to the term structure estimation
of U.S. Treasury bonds. The data, obtained from the CRSP Government Bond files, consist
of daily quotes for Treasury issues from July 2005 to June 2009. The results of the optimization
algorithms are similar to those for the emerging markets in terms of goodness-of-fit,
computational time, space scanned, and stability of the estimated parameters.
7 Conclusion
In this study, we successfully applied a variant of the PSO algorithm to solve the optimization
problems associated with the estimation of the parameters of the NSS yield curve model. We
compared the PSO method with some well known non-linear optimization algorithms from the
literature. We performed a computational comparison among these algorithms and estimated
the day-to-day parameters of NSS model on the liquid bond portfolios of Turkey, Brazil, and
Mexico for a five-year period. We have applied four performance measures, goodness-of-fit,
computational time, space scanned, and stability of estimated parameters, to evaluate
the algorithms. The PSO-W & PSO-G algorithm is observed to improve goodness of fit and
computational time, as well as to generate a stable, economically interpretable parameter
set. Our computational experiments also show that the PSO-W & PSO-G algorithm is quite
robust with respect to the choice of initial starting values, as it explores a larger search space.
Overall, we would like to point out that the PSO-W & PSO-G algorithm can provide a robust
framework to solve the numerical problems reported in the estimation of NSS parameters.
References
Abd-El-Wahed, W., Mousa, A., and El-Shorbagy, M. (2011). Integrating particle swarm
optimization with genetic algorithms for solving nonlinear optimization problems. Journal
of Computational and Applied Mathematics, 235(5):1446 – 1453.
Anderson, N. and Murphy, G. (1996). Estimating and Interpreting the Yield Curve. John
Wiley and Sons, New York.
Bliss, R. (1997). Testing term structure estimation methods. Advances in Futures and Options
Research, 9:97–231.
Bliss, R. and Ritchken, P. (1996). Empirical tests of two state-variable Heath-Jarrow-Morton
models. Journal of Money, Credit and Banking, 28(3):452–476.
Bloomberg (2010). Bloomberg. http://www.bloomberg.com.
Bolder, D. and Streliski, D. (1999). Yield curve modelling at the Bank of Canada. Technical
Report 84, Bank of Canada.
Clerc, M. and Kennedy, J. (2002). The particle swarm - explosion, stability, and convergence
in a multidimensional complex space. IEEE Transactions on Evolutionary Computation,
pages 58–73.
Cox, J., Ingersoll, J., and Ross, S. (1985). A theory of the term structure of interest rates.
Econometrica, 53(2):385–407.
Csajbok, A. (1998). Zero-coupon yield curve estimation from a central bank perspective. MNB
Working Papers.
De Pooter, M. (2007). Examining the Nelson-Siegel class of term structure models: In-sample
fit versus out-of-sample forecasting performance. Federal Reserve Board.
Diebold, F. and Li, C. (2006). Forecasting the term structure of government bond yields.
Journal of Econometrics, 130:337–364.
Fylstra, D. (1998). Design and use of the Microsoft Excel Solver. Interfaces, pages 29–55.
Gurkaynak, R., Sack, B., and Wright, J. (2007). The U.S. Treasury yield curve: 1961 to the
present. Journal of Monetary Economics, pages 2291–2304.
Heath, D., Jarrow, R., and Morton, A. (1992). Bond pricing and the term structure of interest
rates: A new methodology for contingent claims. Econometrica, 60(1):77–105.
Ho, T. and Lee, S. (1986). Term structure movements and pricing interest rate contingent
claims. Journal of Finance, 41(5):1011–1029.
Ioannides, M. (2003). A comparison of yield curve estimation techniques using UK data.
Journal of Banking and Finance, 27:1–26.
Judd, K. (1998). Numerical Methods in Economics. MIT Press, Cambridge, MA.
Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by simulated annealing.
Science, 220:671–680.
Lagarias, J., Reeds, J., Wright, M., and Wright, P. (1998). Convergence properties of the
Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9:112–147.
Lasdon, L., Waren, A., Jain, A., and Ratner, M. (1978). Design and testing of a generalized
reduced gradient code for nonlinear programming. ACM Transactions on Mathematical
Software, 4:34–50.
Manousopoulos, P. and Michalopoulos, M. (2007). Comparison of non-linear optimization
algorithms for yield curve estimation. European Journal of Operational Research, 192:594–602.
Martellini, L., Priaulet, P., and Priaulet, S. (2003). Fixed Income Securities: Valuation, Risk
Management and Portfolio Strategies. John Wiley and Sons, New York.
McCulloch, J. H. (1971). Measuring the term structure of interest rates. The Journal of
Business, 44:19–31.
Nelder, J. and Mead, R. (1965). A simplex method for function minimization. Computer
Journal, 7:308–313.
Nelson, C. and Siegel, A. (1987). Parsimonious modelling of yield curves. The Journal of
Business, 60:473–489.
Pedersen, M. and Chipperfield, A. (2010). Simplifying particle swarm optimization. Applied
Soft Computing, pages 618–628.
Powell, M. (1964). An efficient method for finding the minimum of a function of several
variables without calculating derivatives. Computer Journal, 7:155–162.
Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (1992). Numerical recipes in C.
The Art of Scientific Computing. Cambridge University Press.
Reuters (2010). Reuters. http://www.reuters.com.
Seppala, J. and Viertio, P. (1996). The term structure of interest rates: estimation and
interpretation. Technical Report 19, Bank of Finland.
Shea, G. S. (1985). Interest rate term structure estimation with exponential splines: A note.
Journal of Finance, 40:319–325.
Shi, Y. and Eberhart, R. (1998). A modified particle swarm optimizer. Evolutionary Compu-
tation Proceedings, IEEE World Congress on Computational Intelligence, pages 69–73.
Svensson, L. (1994). Estimating and interpreting forward interest rates: Sweden 1992–1994.
NBER Working Paper 4871, National Bureau of Economic Research.
Trelea, I. C. (2003). The particle swarm optimization algorithm: convergence analysis and
parameter selection. Information Processing Letters, 85:317–325.
Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial
Economics, 5(2):177–188.
Yang, X., Yuan, J., Yuan, J., and Mao, H. (2007). A modified particle swarm optimizer with
dynamic adaptation. Applied Mathematics and Computation, 189(2):1205 – 1213.