
A small package of Matlab routines for the estimation of some term structure models

Anh Le and Ken Singleton

August 19, 2018


Contents

1 GetData__TreasuryYields.m: extract Treasury zero yields
  1.1 Data Compilation
  1.2 Examples
  1.3 Matlab header

2 GetData__AngPiazzesiJME2003.m: extract macro variables used in Ang and Piazzesi (2003)
  2.1 Data Compilation
  2.2 Examples
  2.3 Matlab header

3 LS__opt.m: manage numerical optimizations
  3.1 What does this function do?
  3.2 Example 1 - running an OLS
  3.3 Example 2 - running an OLS with constraints
  3.4 Example 3 - running an OLS with analytically concentrated parameters
  3.5 Matlab header

4 A0N__computeBnAn.m: compute yield loadings for a canonical A0(N) model
  4.1 What does this function do?
  4.2 Examples
  4.3 Matlab header

5 QG computeCnBnAn.m: compute yield loadings for a canonical quadratic gaussian term structure model
  5.1 A general setup for a ZLB-consistent quadratic gaussian term structure model
  5.2 Bond Pricing
  5.3 Canonical Form
  5.4 Bond Pricing - Derivations
  5.5 Examples
  5.6 Matlab header

6 FMN Rotate.m: rotate an affine/quadratic-gaussian term structure model
  6.1 Rotation mechanics - state dynamics
  6.2 Rotation mechanics - bond pricing
  6.3 Example - affine rotation under Q
  6.4 Example - affine rotation under both P and Q
  6.5 Matlab header

7 Reg K1Q.m: estimate the $K_1^Q$ matrix through linear regressions
  7.1 A regression-based algorithm to estimate $K_1^Q$
  7.2 Steps to implementing the regression-based algorithm
  7.3 Example 1 - Obtain an estimate of K1Q using yields-only factors
  7.4 Example 2 - Obtain an estimate of K1Q using yields-only and macro factors
  7.5 Matlab header

8 Reg OLS.m: run an OLS regression
  8.1 Matlab header

9 Reg OLSconstrained.m: run an OLS regression with equality constraints imposed on the coefficient matrix
  9.1 Objective function and analytical estimate
  9.2 Examples
  9.3 Matlab header

10 Reg RRR.m: run an OLS regression with rank constraint imposed on the coefficient matrix
  10.1 Objective function and analytical estimate
  10.2 Examples
  10.3 Matlab header

11 Reg GLS.m: run a GLS regression
  11.1 Analytical estimates
  11.2 Matlab header

12 Gaussian.m: computes the Gaussian density
  12.1 Formula
  12.2 Matlab header

13 Kalman linear.m: computes the density for a linear Kalman filtering problem
  13.1 The setup
  13.2 Likelihood computation
  13.3 Conditional moments obtained through linear regressions
  13.4 Speed issues
  13.5 Examples
  13.6 Matlab header

14 mult prod.m: efficiently multiplies arrays of matrices
  14.1 Examples
  14.2 Matlab header

15 mult inv.m: efficiently inverts arrays of matrices
  15.1 Examples
  15.2 Matlab header

16 kron vec.m: efficiently computes Kronecker tensor product of arrays of vectors
  16.1 Examples
  16.2 Matlab header

17 kron sum.m: efficiently computes Kronecker tensor product of arrays of matrices and then takes the sum
  17.1 Examples
  17.2 Matlab header

18 df dx.m: efficiently computes the (numerical) first order derivative of a given function
  18.1 Rough idea
  18.2 Examples
  18.3 Matlab header

1 GetData__TreasuryYields.m: extract Treasury zero yields

%% [R, AllR] = GetData__TreasuryYields(sample, mat, var)

% extracts Treasury zero yields over a sample period.

1.1 Data Compilation

We have five different yields datasets, summarized in Table 1.

Dataset  Smoothing Algorithm                       Source file   Sample Period
-------  ----------------------------------------  ------------  ------------------------------
GSW      Nelson-Siegel Type                        gsw.mat       daily Jun/61:Aug/18
SFB      Smoothed Fama Bliss                       sfb jpm.mat   (end-of) monthly Jan/70:Sep/04
         (original dataset from Bliss)
UFB      Unsmoothed Fama Bliss                     fbtitted.mat  (end-of) monthly Jan/70:Dec/03
ALE      Smoothed Fama Bliss                       yc.mat        daily Jun/61:Dec/17
         (reconstructed and extended by Anh Le)
CRSP     Unsmoothed Fama Bliss
           maturities 1,2,3,4,5 years              CRSP.mat      (end-of) monthly Jun/52:Dec/17
           maturity 3 months                       CRSP3m.mat    (end-of) monthly Dec/25:Dec/17

Table 1: Treasury Yields Data

With the exception of the CRSP dataset, each of the other four datasets comes in 40 maturities: 3m, 6m, 9m, ..., 120m. Since we mostly use end-of-month series, we extract end-of-month data from these datasets and compile them into one mat file, TreasuryYields_endofmonth.mat:

TreasuryYields_endofmonth =

SFB: [417x42 double]

GSW: [687x42 double]

UFB: [408x42 double]

ALE: [679x42 double]

CRSP: [787x42 double]

For each matrix, the first two columns give yyyy and mm of the observations. The next forty columns give zero yields for the 40 maturities. For the CRSP dataset, the last twenty columns receive nan values since maturities greater than five years are not covered. All the compilation steps can be found in this file: TreasuryYields endofmonth compiled.m. This file can be used when the datasets are extended to more recent periods.

In addition, we also compile the daily yields into TreasuryYields daily.mat:

TreasuryYields_daily =


GSW: [14251x43 double]

ALE: [14104x43 double]

For each matrix, the first three columns give yyyy, mm, and dd of the observations. The next forty columns give zero yields for the 40 maturities.
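The compilation script itself is not reproduced here, so the following is only a minimal sketch of how end-of-month rows could be picked out of a daily matrix laid out as described above. The toy panel and the variable names (`daily`, `eom`) are hypothetical, and only one yield column is shown.

```matlab
% Toy daily panel: columns are [yyyy, mm, dd, yield]; the yields are made up.
daily = [2010 11 29 2.80;
         2010 11 30 2.81;
         2010 12 30 3.35;
         2010 12 31 3.30;
         2011  1  3 3.36];

% A row is a month-end observation when the next row belongs to a different
% month, or when it is the last row of the panel.
ym     = daily(:,1)*100 + daily(:,2);               % yyyymm stamp of each row
isLast = [ym(1:end-1) ~= ym(2:end); true];          % last available day of each month
eom    = [daily(isLast,1:2), daily(isLast,4:end)];  % keep yyyy, mm and the yields
```

Note that the final row is treated as a month-end even if its month is incomplete; a production script would probably need to handle that case explicitly.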


1.2 Examples

Example 1

To obtain end-of-month zero yields from the ALE dataset with maturities of 1, 2, ..., 10 years for the sample period Jan/84:Dec/10, we use:

[R, AllR] = GetData__TreasuryYields([198401, 201012], (1:10), 'ALE');

size(R)

ans =

10 324

size(AllR)

ans =

40 324

The first output argument gives the yields of the requested maturities. The second output argument gives the yields of all maturities (40 in total: 3m, 6m, 9m, ..., 120m) for the same sample period. If the requested sample period (here Jan/84:Dec/10) extends beyond the period covered by a dataset, yields outside the covered period receive a nan value.
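The output sizes above can be checked by hand. The sketch below counts the months in the sample and maps maturities in years to rows of the quarterly maturity grid. The row mapping (row 4*m for a maturity of m years) is an assumption about the layout, not something taken from the routine's source, and `AllR` is filled with placeholder numbers.

```matlab
sample = [198401, 201012];         % [yyyymm, yyyymm], as in Example 1
mat    = (1:10);                   % maturities in years

% Number of monthly observations, both endpoints included.
yy = floor(sample/100);
mm = mod(sample, 100);
T  = (yy(2) - yy(1))*12 + (mm(2) - mm(1)) + 1;

% The full grid is 3m, 6m, ..., 120m, so a maturity of m years would sit in
% row 4*m (assumed layout).
rows = round(mat*4);

AllR = rand(40, T);                % placeholder data standing in for real yields
R    = AllR(rows, :);              % [10 x 324]
```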

Example 2

Often we want to compute excess returns and/or forward rates associated with a yields dataset. To compute 12-month excess returns using the GSW dataset, we simply append a ':' and the keyword 'xr12' to the dataset name:

[xr12m, AllR] = GetData__TreasuryYields([198401, 201012], (1:10), 'GSW:xr12');

size(xr12m)

ans =

9 324

size(AllR)

ans =

40 324


Note that the excess returns matrix xr12m has the same number of data points as the yields matrix R above but has one fewer row. There are fewer rows because the 12-month excess return is not well defined for one of the given maturities: one-year. As a result, the (1,1) entry of the excess returns matrix xr12m corresponds to the excess return from holding a two-year zero bond from the end of Jan/84 to the end of Jan/85. In addition, the annual excess returns for the last 12 data points require yields beyond the sample period, so these returns receive a nan value. If we are interested in excess returns over a different horizon, say 3-month, we simply replace the keyword 'xr12' by 'xr3'. If we want 6-month returns, we replace 'xr12' by 'xr6'. In general, the keyword can be any 'xrn' where n is a multiple of 3. 'xr7', for example, will not work.

The AllR matrix is exactly the same as in the previous example, containing yields of all maturities for the same sample period.
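The exact return convention used by the routine is not shown in this section. As a sketch, assuming continuously compounded annualized yields and the standard log excess return definition xr = -(n-1)*y(n-1, t+12) + n*y(n, t) - y(1, t) for a bond with n years to maturity, the 12-month case could look like this (toy data, hypothetical variable names):

```matlab
T   = 36;                                     % number of months
mat = (1:10);                                 % maturities in years
Y   = repmat(0.02 + 0.002*mat', 1, T);        % toy curve, constant over time
h   = 12;                                     % holding period in months (one year)

n  = mat(2:end)';                             % maturities with a defined 12-month return
xr = nan(numel(n), T);                        % dated at the start of the holding period
t  = 1:T-h;                                   % the last h dates need yields beyond the sample
xr(:,t) = diag(n)*Y(2:end,t) ...              %   n * y(n, t)
        - diag(n-1)*Y(1:end-1,t+h) ...        % - (n-1) * y(n-1, t+12)
        - repmat(Y(1,t), numel(n), 1);        % - y(1, t), the one-year rate earned
```

As in the example above, the result has one fewer row than the yields matrix and its last 12 columns are nan.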

Example 3

To obtain forward rates using the SFB dataset, we replace the keyword 'xr12' by 'fw':

fw = GetData__TreasuryYields([198401, 201012], (1:10), 'SFB:fw');

size(fw)

ans =

10 324

As can be seen, the forward matrix fw is of the same size as the yields matrix R. The first row of fw is identical to the first row of R: it simply gives the one-year zero yields. However, the second row of fw is different from the second row of R. This row corresponds to the 1-year-to-2-year forward rates. The third row corresponds to the 2-year-to-3-year forward rates, and so on.
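A minimal sketch of the forward-rate construction described above, assuming continuously compounded annualized zero yields so that the forward rate between maturities m and n is (n*y(n) - m*y(m))/(n - m). The data are made up, and the routine's own compounding convention may differ.

```matlab
mat = (1:10)';                                  % maturities in years
R   = repmat(0.02 + 0.002*mat, 1, 5);           % toy zero yields, [10 x 5]

fw  = R;                                        % row 1 stays the one-year zero yield
num = diag(mat(2:end))*R(2:end,:) - diag(mat(1:end-1))*R(1:end-1,:);
fw(2:end,:) = diag(1./diff(mat))*num;           % (n-1)-year-to-n-year forward rates
```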

Example 4

To compute the realized variances using the GSW dataset, we use the keyword 'RV' followed by the length of the window in months (here 'RV3'):

RV = GetData__TreasuryYields([198401, 201012], (1:10), 'GSW:RV3');

size(RV)

ans =

10 324

The (10,324) entry of the output matrix is simply the realized variance of the 10-year yield computed using the daily differences of the 10-year yields in the months of October, November, and December of 2010:

$$22 \times \frac{1}{T} \sum_t \left(y_{10,t+1} - y_{10,t}\right)^2.$$

T is the number of trading days in the three months: October, November, and December of 2010. The multiplier 22 normalizes every calendar month to 22 trading days.

Note that the first two columns of RV are filled with nan values because each variance requires three months of daily data.
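The formula translates directly into a few lines. The sketch below uses a made-up five-day window and takes T to be the number of daily observations in the window; whether the routine counts observations or differences is not stated, so treat that choice as an assumption.

```matlab
% Toy daily series of one yield over the window (made-up numbers).
y10 = [3.00; 3.02; 2.99; 3.05; 3.04];

d  = diff(y10);              % daily changes y(t+1) - y(t)
T  = numel(y10);             % number of trading days in the window (assumed convention)
RV = 22 * sum(d.^2) / T;     % realized variance scaled to a 22-day month
```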

Example 5

To compute the realized variances of the principal components (PCs) of yields using the ALE dataset, we use the keyword 'PCRV' followed by the length of the window in months (here 'PCRV6'):

RV = GetData__TreasuryYields([198401, 201012], [1,3,5,7,10], 'ALE:PCRV6');

size(RV)

ans =

5 324

The (2,324) entry of the output matrix is simply the realized variance of the 2nd PC of daily yields (with maturities 1, 3, 5, 7, and 10 years) in the months of July, August, September, October, November, and December of 2010:

$$22 \times \frac{1}{T} \sum_t \left(PC_{2,t+1} - PC_{2,t}\right)^2.$$

T is the number of trading days in the six months: July, August, September, October, November, and December of 2010. The multiplier 22 normalizes every calendar month to 22 trading days.

Note that the first five columns of RV are filled with nan values because each variance requires six months of daily data.

Example 6

Suppose now that, for some reason, we'd like to load monthly data from the GSW dataset where the monthly series is defined as the last day before a specified date, say the 23rd of the month. Then we can do:

[R, AllR] = GetData__TreasuryYields([198401, 201012], (1:10), 'GSW23');

size(R)

ans =

10 324

size(AllR)


ans =

40 324

To obtain the forward rates or excess returns with this month-end convention, we use the same syntax as before:

fw = GetData__TreasuryYields([198401, 201012], (1:10), 'GSW23:fw');

xr3m = GetData__TreasuryYields([198401, 201012], (1:10), 'GSW23:xr3');

All of these commands can be applied to the ALE dataset as well.


1.3 Matlab header

%% [R, AllR] = GetData__TreasuryYields(sample, mat, var)
% extracts Treasury zero yields over a sample period.
%
%% INPUTS:
%   sample: vector    sample period with format [yyyymm, yyyymm],
%                     e.g. [198401, 200712].
%                     The data will be extracted end of month. So in this
%                     example, the first observation will be as of end of Jan/84.
%   mat:    J-vector  maturities in years, e.g. [0.5, 2] means the
%                     6-month and 2-year zero yields will be extracted
%   var:    text      specifies the source of data:
%                       'UFB':  Unsmoothed Fama Bliss
%                       'SFB':  Smoothed Fama Bliss
%                       'GSW':  GSW
%                       'ALE':  new UFB data constructed by Anh Le
%                       'CRSP': Fama zero discount yields downloaded from CRSP
%
%                     Additionally, this code can be followed by a colon (:) and:
%                       'xr6':   to compute 6-month excess returns of the
%                                maturities included, as long as they are
%                                longer than 6 months in maturity. Similarly,
%                                'xr12' will compute 12-month excess returns.
%                       'fw':    to compute forward rates
%
%                       'RV6':   to compute 6-month realized variances of the
%                                maturities included using daily data.
%                                Therefore this only works with the GSW and
%                                ALE datasets. Unlike 'xr6', this measure is
%                                backward looking. That is, the RV at time t
%                                is calculated using daily data from the 6
%                                months from t-5 to t. Similarly, we can
%                                specify 'RV3', 'RV12', ... to compute
%                                3-month, 12-month realized variances.
%
%                       'PCRV6': to compute 6-month realized variances of the
%                                PCs constructed from the maturities included
%                                using daily data. Therefore this only works
%                                with the GSW and ALE datasets. Unlike 'xr6',
%                                this measure is backward looking. That is,
%                                the RV at time t is calculated using daily
%                                data from the 6 months from t-5 to t.
%                                Similarly, we can specify 'PCRV3', 'PCRV12',
%                                ... to compute 3-month, 12-month realized
%                                variances.
%
%% OUTPUTS:
%   R:    [J x T]   matrix of data. J is the number of maturities desired.
%                   Missing data receive a nan value.
%   AllR: [40 x T]  yields data over the same sample period for all
%                   available maturities with 3-monthly span: 3m, 6m, 9m,
%                   ... 120m.

12

2 GetData__AngPiazzesiJME2003.m: extract macro variables used in Ang and Piazzesi (2003)

%% y = GetData__AngPiazzesiJME2003(sample, var)

% extracts the variables used in Ang and Piazzesi (2003) for a given sample period

2.1 Data Compilation

This dataset contains variables used in Ang and Piazzesi (JME 2003):

AP_Data2 =

dates: [667x1 double]

HELP: [667x1 double]

UE: [667x1 double]

EMPLOY: [667x1 double]

IP: [667x1 double]

CPI: [667x1 double]

PPI: [667x1 double]

PCOM: [667x1 double]

The data are monthly from Jan/55 through Jul/10:

datestr(AP_Data2.dates(1))

31-Jan-1955

datestr(AP_Data2.dates(end))

31-Jul-2010

The variables are listed in Table 2.


Variable  Content
--------  ------------------------------------------------------------
HELP      Index of help wanted from advertisements in newspapers.
          This series has now been cancelled and replaced by an online
          help index.
UE        unemployment rate
EMPLOY    employment index
IP        real IP growth
CPI       CPI inflation
PPI       producer price index inflation
PCOM      commodity futures price index

Table 2: Variables in Ang and Piazzesi (2003)

2.2 Examples

Example 1

To obtain the standardized time series of the unemployment rate from 1960-Jan through 2009-Dec, we do

UE = GetData__AngPiazzesiJME2003([196001, 200912], 'UE');

>> mean(UE)

ans =

0.0272

>> std(UE)

ans =

0.9788

Note that the mean and standard deviation of the series are not exactly zero and one because the series is normalized over the entire sample of available data (from 1955-Jan through 2010-Jul).
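The effect is easy to reproduce on toy data. In the sketch below, a stand-in series is standardized once over its full length and then subsampled, so the subsample mean and standard deviation are no longer exactly zero and one. The series and variable names are hypothetical.

```matlab
x_full = (1:20)';                                % stand-in for a series over the full sample
z_full = (x_full - mean(x_full)) / std(x_full);  % standardized once, over all available data

z_sub  = z_full(11:20);                          % the requested subsample
m_sub  = mean(z_sub);                            % no longer zero
s_sub  = std(z_sub);                             % no longer one
```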

Example 2

To obtain the raw help-wanted index, without standardizing, we need to add ':raw' after the variable name (HELP):

help_raw = GetData__AngPiazzesiJME2003([196001, 200912], 'HELP:raw');


Example 3

To obtain the first principal component (PC) of the (standardized) inflation variables and of the (standardized) growth variables, we do:

INFPC = GetData__AngPiazzesiJME2003([196001, 200912], 'INFPC');

REALPC = GetData__AngPiazzesiJME2003([196001, 200912], 'REALPC');

>> size(INFPC)

ans =

1 600

>> size(REALPC)

ans =

1 600
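For readers unfamiliar with the construction, the sketch below extracts a first principal component from a small panel of standardized series using an eigen-decomposition of their covariance matrix. It illustrates the general technique only: the data are made up, and details such as the sign of the loadings or any rescaling of the resulting PC inside the routine are not documented here.

```matlab
% Toy panel: K = 3 series in rows, T = 6 observations in columns.
Z = [ 1.2  0.3 -0.5 -1.1  0.4 -0.3;
      0.9  0.5 -0.2 -1.3  0.6 -0.5;
      1.0 -0.1 -0.7 -0.8  0.2  0.4];
Z = Z - repmat(mean(Z,2), 1, size(Z,2));      % demean each series
Z = diag(1./std(Z,0,2)) * Z;                  % scale each series to unit standard deviation

[V, D]   = eig(cov(Z'));                      % eigen-decomposition of the K x K covariance
[lam, k] = max(diag(D));                      % largest eigenvalue and its position
w        = V(:,k);                            % loadings of the first PC (sign is arbitrary)
PC1      = w' * Z;                            % [1 x T] time series of the first PC
```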


2.3 Matlab header

%% y = GetData__AngPiazzesiJME2003(sample, var)
% extracts the variables used in AP (2003) for a given sample period.
%
% All variables are standardized (demeaned and scaled to unit standard deviation).
% To extract raw variables instead, add ':raw' after the requested variable
% name.
%
%% INPUTS
%   sample: vector  sample period with format [yyyymm, yyyymm],
%                   e.g. [198401, 200712].
%                   The data will be extracted end of month. So in this
%                   example, the first observation will be as of end of Jan/84.
%   var:    text    requested variable name
%                   + HELP:   Index of help wanted from advertisements in newspapers.
%                   + UE:     unemployment rate
%                   + EMPLOY: employment index
%                   + IP:     real IP growth
%                   + CPI:    inflation
%                   + PPI:    producer price index inflation
%                   + PCOM:   commodity futures price index
%
%                   + INFPC:  first pc of inflation variables (CPI, PPI, PCOM)
%                   + REALPC: first pc of growth variables (HELP, UE, EMPLOY, IP)
%
%                   + HELP:raw, UE:raw,...,INFPC:raw, REALPC:raw
%                     will yield raw, non-standardized, variables
%
%% OUTPUTS
%   y: [1 x T]  time series of the variable requested


3 LS__opt.m: manage numerical optimizations

%% [out, x] = LS__opt(f, tol, varargin)

% manages the minimization of mean(f)

3.1 What does this function do?

This function supervises the minimization of a given function f, using both fminunc.m and fminsearch.m repeatedly until convergence, based on a specified tolerance level, is achieved. Because the numerical algorithms used by fminunc.m (derivative-based) and fminsearch.m (derivative-free) are very different, using both often results in a much more robust estimation outcome.

Additionally, the objective function f can take multiple matrix arguments, and certain constraints can be imposed on the input arguments. Analytical concentration of parameters is also allowed.
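The alternating scheme can be illustrated with a short loop. This is a sketch of the idea rather than the code inside LS__opt.m: the toy objective, the starting value, the cap of 50 passes, and the stopping rule (compare the objective before and after a full pass) are all assumptions. fminunc.m requires the Optimization Toolbox in Matlab.

```matlab
f    = @(x) (x(1) - 1)^2 + 10*(x(2) - x(1)^2)^2;   % toy objective with minimum at (1, 1)
x    = [-1; 2];                                     % starting value
tol  = 1e-4;                                        % convergence tolerance
opts = optimset('Display', 'off');                  % suppress solver printouts

fval = f(x);
for pass = 1:50
    x    = fminunc(f, x, opts);       % derivative-based step
    x    = fminsearch(f, x, opts);    % derivative-free step
    fnew = f(x);
    if abs(fval - fnew) < tol         % no meaningful improvement over a full pass
        break;
    end
    fval = fnew;
end
```

Restarting each solver from the other's end point is what gives the combination its robustness: a simplex search can escape regions where numerical gradients are unreliable, while the gradient-based step converges quickly near a smooth optimum.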


3.2 Example 1 - running an OLS

Consider the following OLS regression:

$Y = BX + e$ where $e \sim N(0, SS)$

where we want to obtain an estimate for both B and SS.

Generate/specify data

Let’s generate some random data series for X and Y by:

N = 2; T = 500;

Y = randn(N,T);

X = randn(N,T);

Define objective function

Given the randomly generated data series above, we now specify the objective function underlying our OLS regression:

f = @(B, SS) (-Gaussian(Y - B*X, SS));

Here we use the routine Gaussian.m (also provided in the same library) to compute the Gaussian density of the errors, Y - BX, given the covariance matrix SS.

Note that the objective function takes two inputs, B and SS, each of size 2 x 2. Also, SS must be a psd matrix.
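Gaussian.m is documented in its own section. As a rough illustration of what the objective evaluates, the sketch below computes a per-observation Gaussian log-density for residuals stored in columns and averages its negative. That the routine works with log-densities, one value per column, is an assumption here; it is consistent with LS__opt.m minimizing mean(f) and with the objective values of about 2.8 reported in the examples of this section.

```matlab
N  = 2;                             % dimension of each residual vector
e  = [0.5 -1.0  0.2  0.0;
      0.1  0.3 -0.4  0.0];          % toy residuals, one column per observation
SS = [1.0 0.2; 0.2 0.5];            % covariance matrix (positive definite here)

% Log-density of each column of e under N(0, SS).
q    = sum(e .* (SS \ e), 1);                     % quadratic forms e_t' * inv(SS) * e_t
logp = -0.5*(N*log(2*pi) + log(det(SS)) + q);     % [1 x T]

obj = mean(-logp);                  % the quantity a minimizer of mean(f) would see
```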

Specify starting values

We need to prime the optimization process with some starting values:

x0.B = randn(N);

x0.SS = eye(N);

Running the optimization using LS__opt.m

tic;
[out, x1] = LS__opt(f, 1e-4, ...
    x0.B,  'B: bounded', [], [], ...
    x0.SS, 'SS: psd',    [], [], ...
    'iter off');
toc

2.7764
2.7764
Elapsed time is 0.303089 seconds.

Several remarks:

1. The first two inputs to the LS__opt.m routine, f and 1e-4, are the objective function and the tolerance level.

2. Next, each of the two input parameters of the objective function requires a set of four inputs. For example, for the first parameter, B, these four inputs are:

x0.B, 'B: bounded', [], [], ...

The first input, x0.B, provides the starting value for B. The second input, 'B: bounded', tells the LS__opt.m routine the name of the variable (B) and its type (bounded). The next two inputs give the lower bound and the upper bound. Since they are empty, B is effectively unbounded in this case.

3. For the second parameter, SS, the four inputs are:

x0.SS, 'SS: psd', [], [], ...

Again, the first input, x0.SS, provides the starting value for SS. The second input, 'SS: psd', tells the LS__opt.m routine the name of the variable (SS) and its type (psd, a positive semi-definite matrix). The next two inputs give the lower bound and the upper bound. Again, since they are empty, we effectively impose no bounds on SS.

4. The last input, 'iter off', suppresses any printout from the Matlab optimization routines. The only output printed is the two instances of:

2.7764

These are the objective values obtained after one iteration of fminunc.m and one iteration of fminsearch.m. Since there is no improvement in the objective value after two iterations, convergence is achieved.

Outputs of the LS__opt.m routine

The first output of the LS__opt.m routine gives us any output the objective function might produce at the optimal parameter estimates. This output must be specified as a second output of the objective function f. Since our (in-line) definition of the objective function does not define a second output, the first output is an empty matrix in this case.


>> out

out =

[]

The second output of the LS__opt.m routine gives us the final parameter estimates:

x1 =

struct with fields:

vec_: [7x1 double]

B: [2x2 double]

SS: [2x2 double]

The first field, vec_, gives the values of some auxiliary parameters (used to guarantee that constraints, such as the psd constraint, are respected). Most often, we don't have to worry about this field.
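How LS__opt.m maps its auxiliary parameters into constrained matrices is not shown in this section. One common device, given here purely as an illustration, is to parameterize a psd matrix by the free entries of a lower-triangular factor. Under that assumption a 2 x 2 psd matrix uses three free parameters, which together with the four entries of B would account for the seven elements of vec_ above.

```matlab
theta = [0.8; -0.3; 1.1];               % three unconstrained parameters

L  = [theta(1) 0; theta(2) theta(3)];   % lower-triangular factor built from theta
SS = L * L';                            % symmetric and psd for any real theta
```

An optimizer can then search over theta without any explicit constraint, since every value of theta maps to a valid covariance matrix.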

Comparing our estimates to analytical estimates

Comparing our numerical estimates to the true analytical estimates:

xtrue.B = Y/X;

e = Y - (Y/X)*X;

xtrue.SS = (e*e'/T);

%% comparing estimates of B:

[x1.B-xtrue.B]

%% comparing estimates of SS:

[x1.SS-xtrue.SS]

ans =

1.0e-07 *

0.0702 -0.2795

0.3792 0.0592

ans =


1.0e-07 *

0.0013 0.0074

0.0074 0.3028


3.3 Example 2 - running an OLS with constraints

Consider the OLS regression:

$Y_{t+1} = B Y_t + C X_t + e$ where $e \sim N(0, SS)$

where we want to impose that:

• B: be diagonal

• C: all elements of C be non-negative



Generate/specify data

N = 2; M=3; T = 500;

Y = randn(N,T);

X = randn(M,T);

Specify the objective function

Given the randomly generated data, we specify the objective function by:

t = (1:T-1);

f = @(B, C, SS) (-Gaussian(Y(:,t+1) - B*Y(:,t) - C*X(:,t), SS));

Note that the objective function takes three inputs: B, C, and SS. B must be diagonal and SS must be a psd matrix. Also, each element of C must be non-negative.



Specify starting values

x0.B = diag(randn(N,1));

x0.C = rand(N, M);

x0.SS = eye(N);

Running the optimization using LS__opt.m

tic;
[out, x1] = LS__opt(f, 1e-4, ...
    x0.B,  'B: diag',    [], [], ...
    x0.C,  'C: bounded', 0,  [], ...
    x0.SS, 'SS: psd',    [], []);
toc


Several remarks:

1. The first two inputs to the LS__opt.m routine, f and 1e-4, are the objective function and the tolerance level.

2. Next, each of the three input parameters of the objective function requires a set of four inputs. For example, for the first parameter, B, these four inputs are:

x0.B, 'B: diag', [], [], ...

The first input, x0.B, provides the starting value for B. The second input, 'B: diag', tells the LS__opt.m routine the name of the variable (B) and its type (diag, a diagonal matrix). The next two inputs give the lower bound and the upper bound. Since they are empty, B is effectively unbounded in this case.

3. For the second parameter, C, the four inputs are:

x0.C, 'C: bounded', 0, [], ...

The first input, x0.C, provides the starting value for C. The second input, 'C: bounded', tells the LS__opt.m routine the name of the variable (C) and its type (bounded). The next two inputs give the lower bound and the upper bound. A lower bound of 0 guarantees that each element of C is non-negative. Since the upper bound is empty, C is not bounded from above.

4. For the third parameter, SS, the four inputs are:

x0.SS, 'SS: psd', [], [], ...

Again, the first input, x0.SS, provides the starting value for SS. The second input, 'SS: psd', tells the LS__opt.m routine the name of the variable (SS) and its type (psd, a positive semi-definite matrix). The next two inputs give the lower bound and the upper bound. Again, since they are empty, we effectively impose no bounds on SS.

5. Notice that we no longer include the keyword 'iter off' at the end, so LS__opt.m prints the output of every fminunc.m and fminsearch.m run.

Outputs of the LS__opt.m routine

>> out

out =

[]


>> x1

x1 =

struct with fields:

vec_: [11x1 double]

B: [2x2 double]

C: [2x3 double]

SS: [2x2 double]

%% print estimate of B:

x1.B

ans =

-0.0431 0

0 0.0225

%% print estimate of C:

x1.C

ans =

0.0000 0.0584 0

0.0000 0.0000 0

%% print estimate of SS:

x1.SS

ans =

1.1120 0.0455

0.0455 1.0446


3.4 Example 3 - running an OLS with analytically concentrated parameters

Consider again the OLS regression:

$Y_{t+1} = B Y_t + C X_t + e$ where $e \sim N(0, SS)$

where we want to impose that:

• B: be diagonal

• C: all elements of C be non-negative

This time, we want to build in the fact that for each value of B and C, the optimal estimate of SS can be obtained analytically as:

$$SS = ee'/(T - 1)$$

where $e_t = Y_{t+1} - (B Y_t + C X_t)$.
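The closed form follows from the first-order condition of the average Gaussian log-likelihood. A short derivation, assuming the residuals $e_t$ are stacked as the columns of $e$ and writing $N$ for the dimension of $Y$ and $S$ for the sample second moment of the residuals:

```latex
\begin{aligned}
\frac{1}{T-1}\sum_{t=1}^{T-1}\log p(e_t)
  &= -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log|SS|
     - \frac{1}{2}\,\mathrm{trace}\!\left(SS^{-1} S\right),
  \qquad S = \frac{ee'}{T-1},\\[4pt]
\frac{\partial}{\partial\, SS^{-1}}\;:\quad
  & \frac{1}{2}\,SS - \frac{1}{2}\,S = 0
  \;\;\Longrightarrow\;\; SS = S .
\end{aligned}
```

Substituting $SS = S$ back in leaves an objective that depends on B and C only, which is why the numerical search no longer needs to cover SS.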



Generate/specify data

N = 2; M=3; T = 500;

Y = randn(N,T);

X = randn(M,T);

Specify the objective function

Given the randomly generated data, we specify the objective function through the function LS__opt__Example3_f.m:

f = @(B,C,SS) LS__opt__Example3_f(B,C,SS,Y,X);

The function LS__opt__Example3_f.m is defined as follows:

function [y, out] = LS__opt__Example3_f(B,C,SS,Y,X)
%% [y, out] = LS__opt__Example3_f(B,C,SS,Y,X)
% Compute ML density for the following OLS regression:
%   Y_t+1 = B*Y_t + C*X_t + e   where e ~ N(0, SS)
%
%% INPUTS
%   B:  [N x N]
%   C:  [N x M]
%   SS: [N x N]  (pass [] to concentrate SS out analytically)
%   Y:  [N x T]  data
%   X:  [M x T]  data
%% OUTPUTs
%   y:   [1 x T]
%   out: struct

T = size(Y,2);
t = (1:T-1);
e = Y(:,t+1) - B*Y(:,t) - C*X(:,t);

if isempty(SS)  % if SS is empty -> work out the optimal estimate of SS analytically
    SS = e*e'/(T-1);
end

y = -Gaussian(e, SS);

if nargout>1
    out.input = struct('Y', Y, 'X', X);
    out.ests  = struct('B', B, 'C', C, 'SS', SS);
end

Two remarks:

• The analytical concentration of SS is triggered whenever the input parameter SS is empty. Since this involves a conditional statement (if) that checks whether SS is empty, we cannot use an inline function to define the objective.

• In addition to the objective value, the objective function also returns a second output, out. This output stores the values of the data (X and Y) used in the estimation. Additionally, the output also stores estimates for all parameters, including the concentrated-out parameter SS.



Specify starting values

x0.B = diag(randn(N,1));

x0.C = rand(N, M);

x0.SS = eye(N);

Running the optimization using LS__opt.m, concentrating out SS

tic;
[out, x1a] = LS__opt(f, 1e-4, ...
    x0.B, 'B: diag',    [], [], ...
    x0.C, 'C: bounded', 0,  [], ...
    [],   '@SS: psd',   [], [], ...
    'iter off')
toc


2.8099

2.8099

out =

struct with fields:

input: [1x1 struct]

ests: [1x1 struct]

x1a =

struct with fields:

vec_: [11x1 double]

B: [2x2 double]

C: [2x3 double]

SS: [2x2 double]


Several remarks:

• Note that the analytical concentration is triggered by:

– the “@” character put in front of the “SS” label

– an empty starting value for SS

• The first output of LS__opt.m is no longer empty because we have specified a secondary output in our specification of the objective function.

Running the optimization using LS__opt.m, without concentrating out SS

tic;
[out, x1] = LS__opt(f, 1e-4, ...
    x0.B,  'B: diag',    [], [], ...
    x0.C,  'C: bounded', 0,  [], ...
    x0.SS, 'SS: psd',    [], [], ...
    'iter off')
toc

2.8099


2.8099

out =

struct with fields:

input: [1x1 struct]

ests: [1x1 struct]

x1 =

struct with fields:

vec_: [11x1 double]

B: [2x2 double]

C: [2x3 double]

SS: [2x2 double]


Several remarks:

• Since we no longer concentrate out SS, the search space is larger. Hence it takes a bit longer to reach convergence (2.77 as opposed to 1.90 seconds).

• The objective value is still the same (at 2.8099) in both cases.


3.5 Matlab header

%% [out, x] = LS__opt(f, tol, varargin)
% manages the minimization of mean(f)
%
%% INPUTS:
%   f:        function  vector-valued objective function
%                       + the routine minimizes mean(f)
%                       + f can take multiple matrix-valued input arguments
%   tol:      scalar    convergence tolerance
%                       + for ML estimation, a reasonable value = 1e-4
%   varargin: array     + starting values and constraints:
%                         for each input argument K (of f), we need four
%                         inputs that look like:
%                             K0, 'K0: bounded', lb, ub
%
%                         1) a starting value: K0
%                         2) a variable label ('K0') followed by a ':'
%                            followed by a type of constraint.
%                            the constraint can be:
%                              + 'bounded': bounded matrix
%                              + 'diag':    diagonal matrix
%                              + 'Jordan':  a matrix of Jordan type
%                              + 'psd':     psd matrix
%                         3) a lower bound lb (lb=[] -> no lower bound)
%                         4) an upper bound ub (ub=[] -> no upper bound)
%
%                       + If a variable name starts with a '@', it means
%                         that that parameter will be analytically
%                         concentrated out in the specification of f. In
%                         this case, no starting value is needed for this
%                         particular parameter. An empty matrix can be
%                         provided as a starting value.
%
%                       + After all starting values/constraints are
%                         specified for all parameters, the last item of
%                         varargin specifies how the optimization is run
%                           + 'iter off': suppresses all the printouts of
%                             the numerical optimization routines used
%                           + 'fminunc only': only uses fminunc.m
%                           + 'fminsearch only': only uses fminsearch.m
%
%                       See examples below for more details on how to
%                       specify varargin.
%% OUTPUTS:
%   out: struct  second output produced by f
%                (the first output of f must be the objective
%                value to be minimized)
%   x:   struct  contains parameter estimates (x.K etc)


4 A0N__computeBnAn.m: compute yield loadings for a canonical A0(N) model

[BnX, AnX, betan] = A0N__computeBnAn(mat, K1XQ, dX, r0, SSX)

% computes loadings for a canonical A_0(N) model.

4.1 What does this function do?

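This function computes the bond loadings implied by the following risk-neutral dynamics:

\[ X_{t+1} = K^Q_{1X} X_t + \varepsilon_{t+1}, \]

where $\varepsilon_{t+1} \sim N(0, \Sigma_X)$ and $r_t = r_0 + \delta_X X_t$.

Note that $r_t$ is per unit of time interval, not per annum. In other words, $e^{-r_t}$ gives the price of a one-period zero coupon bond with a face value of \$1. Now, let $A_n$ and $B_n$ be such that the price of the $n$-period zero coupon bond can be written as:

\[ e^{-A_n - B_n X_t}. \]

No-arbitrage pricing requires:

\[ e^{-A_n - B_n X_t} = e^{-r_0 - \delta_X X_t} E^Q_t\left[ e^{-A_{n-1} - B_{n-1} X_{t+1}} \right], \]

which leads to the standard recursions (starting from $A_0 = 0$ and $B_0 = 0$):

\begin{align*}
B_n &= \delta_X + B_{n-1} K^Q_{1X}, \\
A_n &= r_0 + A_{n-1} - \frac{1}{2} B_{n-1} \Sigma_X B'_{n-1}.
\end{align*}

Importantly, note that the $B_n$'s, for every $n$, depend only on $K^Q_{1X}$ and $\delta_X$, not on $r_0$ or $\Sigma_X$. To see the nature of the dependence of $A_n$ on the parameters ($K^Q_{1X}$, $\delta_X$, $r_0$, and $\Sigma_X$), we can write:

\[ A_n = n r_0 - \frac{1}{2} \sum_{i=1}^{n-1} B_i \Sigma_X B'_i = n r_0 - \frac{1}{2} \operatorname{trace}\left( \Sigma_X \left( \sum_{i=1}^{n-1} B'_i B_i \right) \right). \]

Therefore

\[ A_n / n = r_0 + \beta_n, \qquad \beta_n = -\frac{1}{2n} \operatorname{trace}\left( \Sigma_X \left( \sum_{i=1}^{n-1} B'_i B_i \right) \right). \]

The inputs of this routine, K1XQ, dX, r0, and SSX, correspond to $K^Q_{1X}$, $\delta_X$, $r_0$, and $\Sigma_X$. The outputs of the routine, BnX, AnX, and betan, correspond to $B_n/n$, $A_n/n$, and $\beta_n$. The vector mat determines which maturities (which values of $n$) are returned.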
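As a rough illustration, the recursions above can be sketched in a few lines of MATLAB. This is only a sketch under the stated dynamics, not the code of A0N__computeBnAn.m; the variable names nmax, Bprev, Aprev, sumBB and A_closed are introduced here for illustration, and the parameter values mirror those used in the examples of Section 4.2.

```matlab
% Sketch of the A0(N) loading recursions (illustrative; not the package code).
K1XQ = diag([0.99, 0.96, 0.90]);   % risk-neutral feedback matrix
dX   = ones(1, 3);                 % short-rate loadings (row vector)
r0   = 0.005;
SSX  = eye(3);
nmax = 120;                        % longest maturity, in model periods

B = zeros(nmax, 3);                % row n holds B_n
A = zeros(nmax, 1);                % entry n holds A_n
Bprev = zeros(1, 3);               % B_0 = 0
Aprev = 0;                         % A_0 = 0
for n = 1:nmax
    A(n)   = r0 + Aprev - 0.5 * Bprev * SSX * Bprev';
    B(n,:) = dX + Bprev * K1XQ;
    Aprev  = A(n);
    Bprev  = B(n,:);
end

% Closed form: A_n = n*r0 - 0.5*trace(SSX * sum_{i=1}^{n-1} B_i'*B_i)
n = 12;
sumBB    = B(1:n-1,:)' * B(1:n-1,:);
A_closed = n*r0 - 0.5 * trace(SSX * sumBB);

% Per-period yield loadings, matching the description of BnX, AnX, betan
BnX   = B(n,:) / n;
AnX   = A(n) / n;
betan = AnX - r0;
```

With these inputs, the recursion and the closed form agree, and AnX for n = 12 comes out at about -45.6, in line with the first row of the output shown in Example 2 below.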


4.2 Examples

Example 1

Suppose that we work with a term structure model with monthly intervals (∆t = 1/12), and the risk-neutral feedback matrix is already given by the 3 x 3 matrix K1XQ. To compute the loadings of the yields with maturities of 1, 2, ..., 10 years, we use the following commands:

K1XQ = diag([0.99, 0.96, 0.90]);

mat = (1:10);

dt = 1/12;

BnX = A0N__computeBnAn(mat/dt, K1XQ)/dt;

size(BnX)

ans =

10 3

The second and third lines specify the desired maturities (in years) and the monthly interval. Importantly, in the fourth line, the first argument is scaled so that the maturities vector is in multiples of a month, the interval unit of the model. Also note that since we are only interested in the loadings BnX (which do not depend on r0 or SSX), the last two inputs (r0 and SSX) are not required. Additionally, since dX is omitted, it defaults to a vector of ones.

It can be seen that the size of BnX is 10 by 3. Each row corresponds to one maturity, and each column corresponds to one of the three state variables. Remember that the output of this routine is for the interest rate per unit of time interval, which in this case is a month. The "/dt" annualizes the loadings, in effect multiplying them by 12.

Example 2

If we specify r0 as well as SSX, then the following command gives us the loadings BnX and AnX, as well as betan (where betan is the r0-free component of AnX such that AnX = r0 + betan).

r0 = 0.005;

SSX = eye(3);

[BnX, AnX, betan] = A0N__computeBnAn(mat/dt, K1XQ, [], r0, SSX);

>> [AnX, r0 + betan]

ans =

1.0e+03 *

-0.0456 -0.0456


-0.1476 -0.1476

-0.2748 -0.2748

-0.4158 -0.4158

-0.5648 -0.5648

-0.7179 -0.7179

-0.8725 -0.8725

-1.0266 -1.0266

-1.1788 -1.1788

-1.3280 -1.3280

Once again, dX is left empty, so it defaults to a vector of ones. Also, the loadings are not annualized. For loadings on annualized yields, we need to do:

BnX = BnX/dt;

AnX = AnX/dt;

betan = betan/dt;

With these loadings, the $n$-period bond price is given by:

\[ P_{n,t} = \exp\left( -n \Delta t \, (A_{n,X} + B_{n,X} X_t) \right). \]


4.3 Matlab header

%% [BnX, AnX, betan] = A0N__computeBnAn(mat, K1XQ, dX, r0, SSX)

% computes loadings for a canonical A_0(N) model.

%

%

%% INPUTS:

% mat: [J x 1] vector of maturities. NOTE that the

% maturities are in multiples of the

% discrete interval used in the

% model NOT in multiple of years.

% K1XQ: [N x N] the risk neutral feedback matrix

% dX: [1 x N] state loadings for the one-period rate

% (if not provided, defaulted to be a vector of

% ones)

% r0: scalar the long run risk neutral mean of

% the short rate

% SSX: [N x N] the covariance matrix of the errors

%

%% OUTPUTS:

% BnX: [J x N] yield loadings

% AnX: [J x 1] intercepts

% betan: [J x 1] the part of the intercepts that is not

% related to the long run risk neutral

% mean r0.


5 QG__computeCnBnAn.m: compute yield loadings for a canonical quadratic gaussian term structure model

[CnX, BnX, AnX] = QG__computeCnBnAn(mat, K1XQ, r0, SSX)

% computes loadings for a canonical quadratic gaussian term structure model

% that respects the ZLB.

5.1 A general setup for a ZLB-consistent quadratic gaussian term structure model

Let's consider the following $N$-factor model:

\begin{align*}
X_{t+1} &= K^P_{0X} + K^P_{1X} X_t + \varepsilon^P_{t+1}, \tag{1} \\
X_{t+1} &= K^Q_{0X} + K^Q_{1X} X_t + \varepsilon^Q_{t+1}, \tag{2} \\
r_t &= (\delta_X X_t + r_0)^2, \tag{3}
\end{align*}

where the innovations, $\varepsilon^P_{t+1}$ and $\varepsilon^Q_{t+1}$, are conditionally Gaussian under the historical measure P and the risk-neutral measure Q, respectively, with zero means and a constant covariance matrix given by $\Sigma_X$.

5.2 Bond Pricing

The current model is a special case of the family of quadratic gaussian term structure models (QGTS). In the general case, the short rate takes the form $r_t = X_t' \delta_2 X_t + \delta_1 X_t + \delta_0$. We choose the simplified formulation in (3) so that 1) it is clear that $r_t$ is bounded below by zero; and 2) it is relatively straightforward to develop a canonical form of the model that parallels that of JSZ.

Bond prices for a QGTS model are available in closed form. Working with the general formulation, $r_t = X_t' \delta_2 X_t + \delta_1 X_t + \delta_0$, for each maturity $n$, one can write the $n$-period bond price, $P_{n,t}$, as:

\[ P_{n,t} = e^{-A_n - B_n X_t - X_t' C_n X_t}, \tag{4} \]

where the loadings are obtained recursively:

\begin{align*}
C_n &= \delta_2 - \frac{1}{2} K^{Q\prime}_{1X} \Omega_{n-1} K^Q_{1X}, \tag{5} \\
B_n &= \delta_1 + B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{1X} - K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{1X}, \tag{6} \\
A_n &= A_{n-1} + \delta_0 - \frac{1}{2} B_{n-1} S_{n-1} B'_{n-1} + \frac{1}{2} \log\left| 2 C_{n-1} \Sigma_X + I_N \right| + B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{0X} - \frac{1}{2} K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{0X}, \tag{7}
\end{align*}

where $S_{n-1} = (2 C_{n-1} + \Sigma_X^{-1})^{-1}$ and $\Omega_{n-1} = \Sigma_X^{-1} S_{n-1} \Sigma_X^{-1} - \Sigma_X^{-1}$. The recursions start from $A_0 = 0$, $B_0 = 0$, and $C_0 = 0$. Detailed derivations are given in Section 5.4.
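The recursions (5)-(7) translate almost line by line into code. The following is a minimal sketch, assuming made-up parameter values and a symmetric $\delta_2$; it is not the code of QG__computeCnBnAn.m, and all variable names (d0, d1, d2, Ap, Bp, Cp, Om, and so on) are chosen here purely for illustration.

```matlab
% Sketch of the general QGTS recursions (5)-(7); all numerical values are made up.
N    = 2;
K0   = [0.01; -0.02];              % K^Q_{0X}
K1   = [0.95 0.02; 0.00 0.90];     % K^Q_{1X}
SS   = [0.04 0.01; 0.01 0.09];     % Sigma_X
d0   = 0.001;                      % delta_0
d1   = [0.5 0.3];                  % delta_1 (row vector)
d2   = [0.2 0.05; 0.05 0.1];       % delta_2 (symmetric)
nmax = 24;                         % longest maturity, in model periods

iSS = inv(SS);
A = zeros(nmax, 1);
B = zeros(nmax, N);
C = zeros(N, N, nmax);
Ap = 0; Bp = zeros(1, N); Cp = zeros(N);      % A_0, B_0, C_0
for n = 1:nmax
    S  = inv(2*Cp + iSS);                     % S_{n-1}
    Om = iSS*S*iSS - iSS;                     % Omega_{n-1}
    A(n)     = Ap + d0 - 0.5*Bp*S*Bp' + 0.5*log(det(2*Cp*SS + eye(N))) ...
               + Bp*S*iSS*K0 - 0.5*K0'*Om*K0;
    B(n,:)   = d1 + Bp*S*iSS*K1 - K0'*Om*K1;
    C(:,:,n) = d2 - 0.5*K1'*Om*K1;
    Ap = A(n); Bp = B(n,:); Cp = C(:,:,n);
end
```

Because $B_0 = 0$ and $C_0 = 0$ imply $S_0 = \Sigma_X$ and $\Omega_0 = 0$, the first step returns $A_1 = \delta_0$, $B_1 = \delta_1$ and $C_1 = \delta_2$, which is a quick sanity check on any implementation.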

5.3 Canonical Form

Following JSZ, we can shift the states such that $X_t$ has zero mean under the risk-neutral measure. We can also rotate the states such that the risk-neutral feedback matrix becomes diagonal (or of some comparable Jordan form). In addition, we can scale the states in such a way that the loading vector $\delta_X$ in the short rate pricing equation becomes a row vector of ones, denoted by $\iota$. With this canonical form, our setup becomes:

\begin{align*}
X_{t+1} &= K^P_{0X} + K^P_{1X} X_t + \varepsilon^P_{t+1}, \tag{8} \\
X_{t+1} &= \lambda^Q X_t + \varepsilon^Q_{t+1}, \tag{9} \\
r_t &= (\iota X_t + r_0)^2, \tag{10}
\end{align*}

where $\lambda^Q$ is diagonal or of some comparable Jordan form. Similar to the JSZ setup, the full set of parameters for our model is $\Theta = (K^P_{0X}, K^P_{1X}, \Sigma_X, \lambda^Q, r_0)$. These parameters are fully identified econometrically.

Taking into account the above canonical form, and our particular short rate formulation, the bond pricing recursions are somewhat simplified:

\begin{align*}
C_n &= \iota' \iota - \frac{1}{2} \lambda^{Q\prime} \Omega_{n-1} \lambda^Q, \tag{11} \\
B_n &= 2 \iota r_0 + B_{n-1} S_{n-1} \Sigma_X^{-1} \lambda^Q, \tag{12} \\
A_n &= A_{n-1} + r_0^2 - \frac{1}{2} B_{n-1} S_{n-1} B'_{n-1} + \frac{1}{2} \log\left| 2 C_{n-1} \Sigma_X + I_N \right|, \tag{13}
\end{align*}

where $S_{n-1} = (2 C_{n-1} + \Sigma_X^{-1})^{-1}$ and $\Omega_{n-1} = \Sigma_X^{-1} S_{n-1} \Sigma_X^{-1} - \Sigma_X^{-1}$.

Annualizing the loadings appropriately, we can write the yield $y_{n,t}$ for each maturity $n$ as:

\[ y_{n,t} = A_{n,X} + B_{n,X} X_t + X_t' C_{n,X} X_t. \tag{14} \]

In particular, for an annualized $n$-period yield $y_{n,t}$, the $n$-period bond price is given by $P_{n,t} = \exp(-y_{n,t} \times n \Delta t)$. This implies that $A_{n,X} \times n \Delta t = A_n$, $B_{n,X} \times n \Delta t = B_n$, and $C_{n,X} \times n \Delta t = C_n$.


5.4 Bond Pricing - Derivations

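The risk-neutral dynamics of the states are obtained from equation (2):

\[ X_{t+1} = \underbrace{K^Q_{0X} + K^Q_{1X} X_t}_{\mu^Q_t} + \varepsilon^Q_{t+1}, \]

where $\varepsilon^Q_{t+1} \sim N(0, \Sigma_X)$. We work with the general formulation of the short rate:

\[ r_t = \delta_0 + \delta_1 X_t + X_t' \delta_2 X_t. \]

Assuming that the $n$-period bond price takes the form $P_{n,t} = e^{-A_n - B_n X_t - X_t' C_n X_t}$, we can obtain the recursions for $A_n$, $B_n$, and $C_n$ from the Euler equation:

\begin{align*}
P_{n,t} &= e^{-r_t} E^Q_t[P_{n-1,t+1}] \\
&= e^{-(\delta_0 + \delta_1 X_t + X_t' \delta_2 X_t) - A_{n-1}} \underbrace{E^Q_t\left[ e^{-(B_{n-1} X_{t+1} + X_{t+1}' C_{n-1} X_{t+1})} \right]}_{\Phi}. \tag{15}
\end{align*}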

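Evaluating $\Phi$, we have:

\begin{align*}
\Phi &= \frac{1}{\sqrt{(2\pi)^N |\Sigma_X|}} \int e^{-(B_{n-1} X_{t+1} + X_{t+1}' C_{n-1} X_{t+1})} e^{-\frac{1}{2} (X_{t+1} - \mu^Q_t)' \Sigma_X^{-1} (X_{t+1} - \mu^Q_t)} \, dX_{t+1} \tag{16} \\
&= \frac{1}{\sqrt{(2\pi)^N |\Sigma_X|}} \int e^{-\frac{1}{2} X_{t+1}' (2 C_{n-1} + \Sigma_X^{-1}) X_{t+1} + (\mu^{Q\prime}_t \Sigma_X^{-1} - B_{n-1}) X_{t+1} - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t} \, dX_{t+1}. \tag{17}
\end{align*}

Let's denote $S_{n-1}$ and $m_t$ such that:

\begin{align*}
S_{n-1} &= (2 C_{n-1} + \Sigma_X^{-1})^{-1}, \tag{18} \\
m_t' S_{n-1}^{-1} &= \mu^{Q\prime}_t \Sigma_X^{-1} - B_{n-1}. \tag{19}
\end{align*}

We can now write:

\begin{align*}
\Phi &= \frac{1}{\sqrt{(2\pi)^N |\Sigma_X|}} \int e^{-\frac{1}{2} X_{t+1}' S_{n-1}^{-1} X_{t+1} + m_t' S_{n-1}^{-1} X_{t+1} - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t} \, dX_{t+1} \tag{20} \\
&= \frac{1}{\sqrt{(2\pi)^N |\Sigma_X|}} \underbrace{\int e^{-\frac{1}{2} (X_{t+1} - m_t)' S_{n-1}^{-1} (X_{t+1} - m_t)} \, dX_{t+1}}_{\sqrt{(2\pi)^N |S_{n-1}|}} \; e^{\frac{1}{2} m_t' S_{n-1}^{-1} m_t - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t} \tag{21} \\
&= \sqrt{\frac{|S_{n-1}|}{|\Sigma_X|}} \; e^{\frac{1}{2} m_t' S_{n-1}^{-1} m_t - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t}. \tag{22}
\end{align*}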
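Equation (22) can be checked numerically in the scalar case ($N = 1$) by comparing the closed form against brute-force quadrature of the integral in (16). The sketch below uses arbitrary illustrative values; the names Phi_closed and Phi_numeric are introduced here and are not part of the package.

```matlab
% Scalar (N = 1) check of equation (22) by numerical integration; values are made up.
SS = 0.5;      % Sigma_X
C  = 0.3;      % C_{n-1}
B  = 0.7;      % B_{n-1}
mu = 0.2;      % mu_t^Q

S = 1 / (2*C + 1/SS);            % equation (18)
m = S * (mu/SS - B);             % equation (19), solved for m_t
Phi_closed = sqrt(S/SS) * exp(0.5*m^2/S - 0.5*mu^2/SS);

% integrand of equation (16): exponential-quadratic payoff times the N(mu, SS) density
x = linspace(-20, 20, 400001);
integrand = exp(-(B*x + C*x.^2)) .* exp(-0.5*(x - mu).^2/SS) / sqrt(2*pi*SS);
Phi_numeric = trapz(x, integrand);
```

The two values should agree to many decimal places, since the integrand is smooth and decays rapidly on the chosen grid.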

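Note that:

\[ |S_{n-1}| = |(2 C_{n-1} + \Sigma_X^{-1})^{-1}| = |\Sigma_X (2 C_{n-1} \Sigma_X + I_N)^{-1}| = |\Sigma_X| \times |(2 C_{n-1} \Sigma_X + I_N)^{-1}|, \]

thus

\begin{align*}
-\log(P_{n,t}) &= \delta_0 + \delta_1 X_t + X_t' \delta_2 X_t + A_{n-1} - \log(\Phi) \\
&= \delta_0 + \delta_1 X_t + X_t' \delta_2 X_t + A_{n-1} + \frac{1}{2} \log|2 C_{n-1} \Sigma_X + I_N| - \left( \frac{1}{2} m_t' S_{n-1}^{-1} m_t - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t \right). \tag{23}
\end{align*}

From (18) and (19), we know that $m_t' S_{n-1}^{-1} m_t = (\mu^{Q\prime}_t \Sigma_X^{-1} - B_{n-1}) S_{n-1} (\Sigma_X^{-1} \mu^Q_t - B'_{n-1})$. Therefore:

\[ \frac{1}{2} m_t' S_{n-1}^{-1} m_t - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t = \frac{1}{2} \mu^{Q\prime}_t \Omega_{n-1} \mu^Q_t - B_{n-1} S_{n-1} \Sigma_X^{-1} \mu^Q_t + \frac{1}{2} B_{n-1} S_{n-1} B'_{n-1}, \tag{24} \]

where $\Omega_{n-1} = \Sigma_X^{-1} S_{n-1} \Sigma_X^{-1} - \Sigma_X^{-1}$. Substituting $\mu^Q_t = K^Q_{0X} + K^Q_{1X} X_t$, we obtain:

\begin{align*}
\frac{1}{2} m_t' S_{n-1}^{-1} m_t - \frac{1}{2} \mu^{Q\prime}_t \Sigma_X^{-1} \mu^Q_t ={}& \frac{1}{2} X_t' K^{Q\prime}_{1X} \Omega_{n-1} K^Q_{1X} X_t + K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{1X} X_t \\
& - B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{1X} X_t + \frac{1}{2} K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{0X} \\
& - B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{0X} + \frac{1}{2} B_{n-1} S_{n-1} B'_{n-1}. \tag{25}
\end{align*}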

Substituting (25) into (23), and using the fact that $-\log(P_{n,t}) = A_n + B_n X_t + X_t' C_n X_t$, we can obtain the loadings $C_n$ by matching the quadratic terms (in $X_t$) on both sides of equation (23):

\[ C_n = \delta_2 - \frac{1}{2} K^{Q\prime}_{1X} \Omega_{n-1} K^Q_{1X}. \]

Likewise, matching the linear terms on both sides of equation (23), we obtain:

\[ B_n = \delta_1 + B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{1X} - K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{1X}. \]

Finally, matching the intercepts, we obtain:

\[ A_n = \delta_0 + A_{n-1} + \frac{1}{2} \log|2 C_{n-1} \Sigma_X + I_N| - \frac{1}{2} B_{n-1} S_{n-1} B'_{n-1} - \frac{1}{2} K^{Q\prime}_{0X} \Omega_{n-1} K^Q_{0X} + B_{n-1} S_{n-1} \Sigma_X^{-1} K^Q_{0X}. \]


5.5 Examples

Suppose that we work with a QG term structure model with monthly intervals (∆t = 1/12). To compute the loadings of the yields with maturities of 1, 2, ..., 10 years as a function of K1XQ, r0, and SSX, we use the following commands:

K1XQ = diag([0.99, 0.96, 0.90]);

r0 = 0.01;

SSX = eye(3);

mat = (1:10);

dt = 1/12;

[CnX, BnX, AnX] = QG__computeCnBnAn(round(mat/dt), K1XQ, r0, SSX);

size(CnX)

size(BnX)

size(AnX)

To annualize the loadings, we do:

CnX = CnX/dt;

BnX = BnX/dt;

AnX = AnX/dt;

With these loadings, the $n$-period bond price is given by:

\[ P_{n,t} = \exp\left( -n \Delta t \, (A_{n,X} + B_{n,X} X_t + X_t' C_{n,X} X_t) \right). \]


5.6 Matlab header

%% [CnX, BnX, AnX] = QG__computeCnBnAn(mat, K1XQ, r0, SSX)

% loadings for a canonical quadratic gaussian term structure model

%

% y_t = AnX + BnX*X_t + X_t’CnX*X_t

%

% The canonical model is as follows:

% rt = (i’X_t + r0)^2

% X_t+1 = K1XQ*X_t + N(0, SSX)

% where K1XQ is of the Jordan form

%

%

%% INPUTS:

% mat: [J x 1] vector of maturities. NOTE that the

% maturities are in multiples of the

% discrete interval used in the

% model, NOT in multiple of years.

% K1XQ: [N x N] the risk neutral feedback matrix

% r0: scalar the long run risk neutral mean of

% the short rate

% SSX: [N x N] the covariance matrix of the errors

%

%% OUTPUTS:

% CnX: [N x N x J] yield loadings: quadratic term

% BnX: [J x N] yield loadings: linear term

% AnX: [J x 1] intercepts


6 FMN__Rotate.m: rotate an affine/quadratic-gaussian term structure model

y1 = FMN__Rotate(y0, U1, U0)

% rotates an affine/quadratic-gaussian TS model where the states dynamics

% are affine under both P and Q, from a model with Z as states to one with

% S = U0 + U1*Z as states.

6.1 Rotation mechanics – state dynamics

Recall that each affine model is described by:

\[ Z_{t+1} \sim N\left( K_0 + K_1 Z_t, \; \Sigma + \sum_{i=1}^{M} \Sigma_i V_{i,t} \right). \]

Therefore, when we replace $Z$ by $S = U_0 + U_1 Z$, we have a new model in terms of $S$:

\[ S_{t+1} \sim N\left( (U_0 + U_1 K_0 - U_1 K_1 U_1^{-1} U_0) + (U_1 K_1 U_1^{-1}) S_t, \; (U_1 \Sigma U_1') + \sum_{i=1}^{M} (U_1 \Sigma_i U_1') V_{i,t} \right). \]

If y0 contains the affine models under both P and Q, then this rotation is applied to each model separately.

6.2 Rotation mechanics – bond pricing

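Additionally, if y0 also contains the yield loadings, they are rotated as follows:

\begin{align*}
Y_t &= A + B Z_t + Z_t' C Z_t \\
&= A + B U_1^{-1} (S_t - U_0) + (S_t - U_0)' \underbrace{(U_1^{-1})' C U_1^{-1}}_{C_S} (S_t - U_0) \\
&= \underbrace{A - B U_1^{-1} U_0 + U_0' C_S U_0}_{A_S} + \underbrace{(B U_1^{-1} - 2 U_0' C_S)}_{B_S} S_t + S_t' C_S S_t.
\end{align*}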
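The rotation formulas of Sections 6.1 and 6.2 can be sketched for a constant-volatility model ($M = 0$) with symmetric quadratic loadings as follows. This is an illustration under those assumptions, not the code of FMN__Rotate.m; the numerical values and the names K0S, K1S, SSS, AS, BS, CS and St are made up for the example.

```matlab
% Sketch of the rotation S = U0 + U1*Z (constant volatility, symmetric C); illustrative only.
N = 3; J = 4;
K0 = [0.1; 0.0; -0.1];
K1 = diag([0.99, 0.96, 0.90]);
SS = eye(N);
A  = (1:J)' / 100;                 % [J x 1] intercepts
B  = reshape(1:J*N, J, N) / 10;    % [J x N] linear loadings
C  = zeros(N, N, J);               % [N x N x J] quadratic loadings
for j = 1:J
    C(:,:,j) = j * 0.01 * eye(N);
end
U1 = [1 0.5 0; 0 1 0.5; 0.2 0 1];  % invertible rotation matrix
U0 = [0.3; -0.2; 0.1];

iU1 = inv(U1);
% state dynamics of S
K0S = U0 + U1*K0 - U1*K1*iU1*U0;
K1S = U1*K1*iU1;
SSS = U1*SS*U1';
% yield loadings in terms of S
AS = zeros(J, 1); BS = zeros(J, N); CS = zeros(N, N, J);
for j = 1:J
    CS(:,:,j) = iU1' * C(:,:,j) * iU1;
    BS(j,:)   = B(j,:)*iU1 - 2*U0'*CS(:,:,j);
    AS(j)     = A(j) - B(j,:)*iU1*U0 + U0'*CS(:,:,j)*U0;
end

% yields computed from Z and from S should agree at any state
Z  = [0.4; -1.2; 0.7];
St = U0 + U1*Z;
yZ = zeros(J, 1); yS = zeros(J, 1);
for j = 1:J
    yZ(j) = A(j)  + B(j,:)*Z   + Z'*C(:,:,j)*Z;
    yS(j) = AS(j) + BS(j,:)*St + St'*CS(:,:,j)*St;
end
```

A useful check on any rotation is that it leaves yields and conditional means unchanged: yZ and yS coincide, and U0 + U1*(K0 + K1*Z) equals K0S + K1S*St.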


6.3 Example - affine rotation under Q

Suppose that we work with a term structure model with monthly intervals (∆t = 1/12), and the risk-neutral dynamics are characterized by the following parameters:

dt = 1/12;

y0=[];

y0.Q.K0 = zeros(3,1);

y0.Q.K1 = diag([0.99, 0.96, 0.90]);

y0.Q.SS = eye(3);

r0 = 0.005;

We can compute bond yield loadings for maturities 1-yr, 2-yr, ..., 10-yr, using the A0N__computeBnAn.m routine:

mat = (1:10);

K1XQ = y0.Q.K1;

SSX = y0.Q.SS;

[BnX, AnX] = A0N__computeBnAn(round(mat/dt), K1XQ, [], r0, SSX);

y0.B = BnX/dt;

y0.A = AnX/dt;

Let's say we would like to rotate our model to one where $P_t = W y_t$ is the pricing factor. For simplicity, we choose $W$ such that $P_t$ corresponds to a vector of the 1-yr, 5-yr, and 10-yr yields stacked together.

W = zeros(3,10);

W(1,1) = 1; % choose the 1-yr yield as the first factor

W(2,5) = 1; % choose the 5-year yield as the second factor

W(3,10) = 1; % choose the 10-year yield as the third factor

Because $P_t = W y_t = W (A_X + B_X X_t) = W A_X + (W B_X) X_t$, we can rotate our model from $X_t$ to $P_t$ by:

U1 = W*y0.B;

U0 = W*y0.A;

y1 = FMN__Rotate(y0, U1, U0);

The Q dynamics of $P_t$ are now a VAR(1) characterized by:

disp('y1.Q = '); disp(y1.Q);

y1.Q =

K1: [3x3 double]

K0: [3x1 double]

SS: [3x3 double]

The yield loadings in terms of $P_t$ are now given by y1.A and y1.B:

disp('y1 = '); disp(y1);

y1 =

Q: [1x1 struct]

B: [10x3 double]

A: [10x1 double]

6.4 Example - affine rotation under both P and Q

If, additionally, we also specify the P dynamics of the latent state such as:

y0.P.K0 = randn(3,1);

y0.P.K1 = randn(3);

y0.P.SS = y0.Q.SS;

then we can rotate our model from Xt to Pt again by:

y1 = FMN__Rotate(y0, U1, U0);

In this case, the P dynamics of $P_t$ are also a VAR(1), characterized by:

disp('y1.P = '); disp(y1.P);

y1.P =

K1: [3x3 double]

K0: [3x1 double]

SS: [3x3 double]

6.5 Matlab header

%% y1 = FMN__Rotate(y0, U1, U0)

% rotates an affine/quadratic-gaussian TS model where the states dynamics

% are affine under both P and Q, from a model with Z as states to one with

% S = U0 + U1*Z as states.

%

% Each affine/QG model is characterized by the following inputs organized in a struct variable:

% .K0: [N x 1] the intercepts

% .K1: [N x N*p] the feedback matrix

% .SS: [N x N*(M+1)] the volatility matrices

%

% To be specific, the state Z follows the dynamics:

% Z_t = N(K0 + K1*[Z_t-1; Z_t-2; ...],

% SSi(:,:,1) + sum_i=1^M SSi(:,:,i+1)*V_i,t)

% where SSi = reshape(SS, [N, N, M+1]);

%

%% INPUTS:

% y0: struct a record of an affine model as described above.

% Alternatively, y0 can take the following form:

% .P: contains the affine model under the P measures

% .Q: contains the affine model under the Q measures

% .A and .B and .C: contain the yield loadings

% Y = A + B*Z + Z’*C*Z

% A: Jx1

% B: JxN

% C: NxNxJ

% The size of the states can be different

% under P and Q (because some state variables may be unspanned).

% U1: [N x N]

% U0: [N x 1] optional (defaulted to be zeros.)

%

%% OUTPUTS:

% y1: struct output record after transformation, the structure

% parallels that of y0.


7 Reg__K1Q.m: estimate the $K^Q_1$ matrix through linear regressions

%% K1Q = Reg__K1Q(R, mat, Z, dt, type)

% uses linear regressions to estimate the risk-neutral feedback matrix K1Q

7.1 A regression-based algorithm to estimate $K^Q_1$

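By no-arbitrage, we must have:

\[ P_{n,t} = e^{-y_{h,t} h \Delta} E^Q_t[P_{n-h,t+h}] \tag{26} \]

or equivalently,

\[ e^{-y_{n,t} n \Delta} = e^{-y_{h,t} h \Delta} E^Q_t\left[ e^{-y_{n-h,t+h} (n-h) \Delta} \right]. \tag{27} \]

Substituting $y_{n,t}$ by $A_n + B_n X_t$ and matching the linear terms, we can write:

\[ n B_n - h B_h = (n - h) B_{n-h} (K^Q_1)^h, \tag{28} \]

which implies

\[ B_n - \frac{h}{n} B_h = \left( 1 - \frac{h}{n} \right) B_{n-h} (K^Q_1)^h. \tag{29} \]

This suggests that we could regress $B_n - \frac{h}{n} B_h$ on $(1 - h/n) B_{n-h}$ to obtain an estimate of $(K^Q_1)^h$ and thus of $K^Q_1$.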

7.2 Steps to implementing the regression-based algorithm

1. Let's denote by $m$ and $M$ the minimum and maximum maturities, in units of the time interval ∆, of the yields included in the estimation. We first interpolate the input yield matrix R so that it covers all maturities that are multiples of ∆ from $m$∆ to $M$∆.

2. Next, we regress these interpolated yields onto the pricing factors Z (an input) to obtain $B_n$ with $n$ ranging from $m$ to $M$.

3. We choose $h$ to be the same as $m$ so that $B_h$ is obtained without the need for extrapolation beyond the current maturity range.

4. Construct the left-hand side by stacking $B_n - h B_h / n$ for $n = 2m, \ldots, M$. This gives an $(M - 2m + 1) \times N$ matrix.

5. Construct the right-hand side by stacking $(1 - h/n) B_{n-h}$ for $n = 2m, \ldots, M$. This gives another $(M - 2m + 1) \times N$ matrix.

6. Regress the left-hand side onto the right-hand side and then take the $h$-th root (the matrix power $1/h$) to obtain an estimate of $K^Q_1$.

7. Convert $K^Q_1$ to a Jordan form, if requested.
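Steps 4 to 6 can be sketched as follows. To keep the example self-contained, the loadings $B_n$ are not estimated from data (steps 1 and 2) but generated from a known feedback matrix using the recursion of Section 4.1 with a short-rate loading vector of ones, so the regression should recover that matrix. This is an illustration rather than the code of Reg__K1Q.m; the names Bcum, nn, LHS, RHS, K1Q_h and K1Q_est are introduced here.

```matlab
% Sketch of steps 4-6 on model-implied loadings (illustrative; not Reg__K1Q itself).
K1Q = diag([0.99, 0.96, 0.90]);    % "true" feedback matrix to be recovered
N = 3; m = 3; M = 120; h = m;      % maturities m..M, in model periods

% per-period yield loadings B_n for n = 1..M
Bn   = zeros(M, N);
Bcum = zeros(1, N);
for n = 1:M
    Bcum    = ones(1, N) + Bcum * K1Q;   % loading of minus the log bond price
    Bn(n,:) = Bcum / n;                  % loading of the n-period yield
end

nn  = (2*m:M)';                                        % maturities used in the regression
LHS = Bn(nn,:) - (h ./ nn) * Bn(h,:);                  % rows: B_n - (h/n) B_h
RHS = repmat(1 - h ./ nn, 1, N) .* Bn(nn - h, :);      % rows: (1 - h/n) B_{n-h}

K1Q_h   = RHS \ LHS;               % least-squares estimate of (K1Q)^h
K1Q_est = real(K1Q_h^(1/h));       % h-th root gives the estimate of K1Q
```

With exact model-implied loadings the fit is perfect and K1Q_est reproduces K1Q up to rounding; with loadings estimated from yields, the regression only holds approximately.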


7.3 Example 1 - Obtain an estimate of K1Q using yields-only factors

First, we obtain yields data and construct yield PCs for a three factor model:

% Get yields data

sample = [197201; 200312];

AllR = GetData__TreasuryYields(sample, (1:40)/4, 'UFB'); % 40 x T

% construct yields PC

mat = [0.25; 0.5; (1:10)'];

R = AllR(round(4*mat), :);

N = 3;

dt = 1/12;

W = pca(R', 'cov');

W = W(1:N,:)*100;

PC = W*R;

Next, we can obtain an estimate of the $K^Q_1$ matrix, normalized to be of a Jordan form, by:

K1Q = Reg__K1Q(R, mat, PC, dt, 'Jordan');

disp('K1Q = '); disp(K1Q);

K1Q =

0.9992 0 0

1.0000 0.9275 -0.0016

0 1.0000 0.9275

7.4 Example 2 - Obtain an estimate of K1Q using yields-only and macro factors

We can also use the Reg__K1Q code with non-yields pricing factors. For example, we can form a mixed set of yields-macro pricing factors by:

% get macro data:

INFPC = GetData__AngPiazzesiJME2003(sample, 'INFPC');

REALPC = GetData__AngPiazzesiJME2003(sample, 'REALPC');

Z = [PC(1,:); INFPC; REALPC];

Next, we can obtain an estimate of the $K^Q_1$ matrix, normalized to be of a Jordan form, by:

% obtain an estimate of K1Q

K1Q = Reg__K1Q(R, mat, Z, dt, 'Jordan');

disp('K1Q = '); disp(K1Q);

K1Q =


1.0183 0 0

1.0000 0.9755 0.0002

0 1.0000 0.9755


7.5 Matlab header

%% K1Q = Reg__K1Q(R, mat, Z, dt, type)

% uses Lee’s regression to estimate the risk-neutral feedback matrix K1Q

%

%% INPUTS:

% R: [J x T] matrix of yields used in estimation

% mat: [J x 1] vector of maturities (in years) of yields used in

% estimation

% Z: [N x T] pricing factors (can be yields-based or non-yields/macro variables)

% dt: scalar time interval

% type: text ’Jordan’ -> K1Q will be of the Jordan type

% ’raw’ --> no adjustment will be made

%% OUTPUTS:

% K1Q: [N x N] risk neutral feedback matrix K1Q.


8 Reg__OLS.m: run an OLS regression

%% [B, hB] = Reg__OLS(Y, X, R2)

% run an OLS regression of Y on X.

8.1 Matlab header

%% [B, hB] = Reg__OLS(Y, X, R2)

% run OLS regression of Y on X.

%

%% INPUTS

% Y: [M x T] left hand side variables

% X: [N x T] regressors

% R2: text If = ’AdjR2’ -> produces AdjR2.

% If = ’AIC’ -> produces AIC.

% If = ’BIC’ -> produces BIC.

% If = ’residuals’ -> produces residuals.

% If = ’fitted’ -> produces fitted.

% If = ’demean’ -> demean the series first

% If = ’include constant’ -> include a row of ones in

% X: [ones(1,T); X]

% Defaulted to be blank.

%% OUTPUTS

% B: [M x N] coefficient matrix

% hB: [M*N x T] moments conditions such that

% vec(B - true(B)) ~ E_T[hB]

%


9 Reg__OLSconstrained.m: run an OLS regression with equality constraints imposed on the coefficient matrix

%% B = Reg__OLSconstrained(Y, X, Bcon, G)

% run an OLS regression of Y on X with the constraint that

% B = Bcon

% for all non-nan entries of Bcon

9.1 Objective function and analytical estimate

Consider a regression of

\[ Y_t = B X_t + e_t, \]

where we want to fix some of the entries of $B$. For simplicity, let's assume we want to fix some of the entries of $B$ to zeros. (The code can take non-zero fixed values though.)

We obtain an estimate of $B$ by minimizing the following least-squares objective:

\[ L = \sum_t (Y_t - B X_t)' G^{-1} (Y_t - B X_t), \]

where $G$ is some psd weighting matrix. Taking the first-order derivative w.r.t. $B$, we obtain:

\[ -\frac{1}{2} \frac{\partial L}{\partial B} = \sum_t G^{-1} (Y_t - B X_t) X_t'. \]

This implies

\[ -\frac{1}{2T} \mathrm{vec}\left( \frac{\partial L}{\partial B} \right) = \underbrace{\mathrm{vec}(G^{-1} E_T[Y_t X_t'])}_{\Gamma_0} - \underbrace{\left( E_T[X_t X_t'] \otimes G^{-1} \right)}_{\Gamma_1} \mathrm{vec}(B). \]

Let id index the active (free) elements of vec(B). The f.o.c. requires:

\[ \Gamma_0(id) - \Gamma_1(id, id) \, \mathrm{vec}(B)(id) = 0, \]

which leads to:

\[ \mathrm{vec}(B)(id) = \Gamma_1(id, id)^{-1} \Gamma_0(id). \]
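The estimate can be sketched directly from $\Gamma_0$ and $\Gamma_1$. The block below is an illustration, not the code of Reg__OLSconstrained.m. It also handles non-zero fixed values by moving their contribution to the right-hand side of the first-order condition, which is our assumption about how such values enter (the derivation above only covers zero restrictions). The names Gamma0, Gamma1, id, b and rhs are chosen here.

```matlab
% Sketch of constrained least squares via the vectorized f.o.c.; illustrative only.
T = 500; M = 2; N = 3;
Y = randn(M, T);
X = randn(N, T);
G = eye(M);                          % weighting matrix

Bcon = nan(M, N);                    % NaN marks a free entry
Bcon(1,1) = 1;                       % fix the (1,1) entry at 1
Bcon(2,3) = 7;                       % fix the (2,3) entry at 7

Ginv   = inv(G);
Gamma0 = reshape(Ginv * (Y*X') / T, [], 1);   % vec(G^-1 E_T[Y X'])
Gamma1 = kron((X*X') / T, Ginv);              % E_T[X X'] kron G^-1
id     = isnan(Bcon(:));                      % free ("active") entries of vec(B)
b      = Bcon(:);

% fixed entries shift the right-hand side (zero restrictions drop out)
rhs   = Gamma0(id) - Gamma1(id, ~id) * b(~id);
b(id) = Gamma1(id, id) \ rhs;
B     = reshape(b, M, N);
```

When G is the identity, the problem separates by rows of B, so each row can be cross-checked against an ordinary regression of the adjusted left-hand side on the free regressors only.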


9.2 Examples

% Generate some random data

T = 500;

Y = randn(2,T);

X = randn(3,T);

% specify the constraint

Bcon = nan(2,3);

Bcon(1,1) = 1; % fix the (1,1) entry at a value of 1

Bcon(2,3) = 7; % fix the (2,3) entry at a value of 7

% run the regression

B = Reg__OLSconstrained(Y, X, Bcon);

disp('B='); disp(B);

B=

1.0000 -0.0433 -0.0350

-0.1418 0.0113 7.0000


9.3 Matlab header

%% B = Reg__OLSconstrained(Y, X, Bcon, G)

% run an OLS regression of Y on X with the constraint that

% B = Bcon

% for all non-nan entries of Bcon

%

% Estimate of B is obtained by minimizing the objective:

% \sum_t (Y_t-B*X_t)’*G^-1*(Y_t-B*X_t)

% subject to the constraint that B = Bcon for all non-nan entries of Bcon

%

%% INPUTS



% Bcon: [M x N] constraints matrix

% if Bcon(i,j) = nan --> B(i,j) is a free

% parameter

% G: [M x M] weighting matrix (psd)

% defaulted to be identity

%

%% OUTPUTS

% B: [M x N] coefficient matrix


10 Reg__RRR.m: run an OLS regression with rank constraint imposed on the coefficient matrix

%% [beta, A, B] = Reg__RRR(Y, X, r, G)

% Run a reduced-rank regression of Y on X

% Y = beta*X + noise

% with the constraint that the coefficient matrix beta be of rank r.

10.1 Objective function and analytical estimate

Consider a regression of

\[ Y_t = \beta X_t + e_t, \]

where we want to restrict the rank of $\beta$ to $r$. We obtain an estimate of $\beta$ by minimizing the following least-squares objective:

\[ L = \sum_t (Y_t - \beta X_t)' G^{-1} (Y_t - \beta X_t), \]

where $G$ is some psd weighting matrix. Let $\beta_{OLS}$ denote the unconstrained OLS estimate obtained by regressing $Y_t$ on $X_t$. The key to obtaining the analytical reduced-rank estimate of $\beta$ is to see that:

\[ L = \sum_t (Y_t - \beta_{OLS} X_t)' G^{-1} (Y_t - \beta_{OLS} X_t) + \sum_t (\beta_{OLS} X_t - \beta X_t)' G^{-1} (\beta_{OLS} X_t - \beta X_t), \]

and that the first component, $\sum_t (Y_t - \beta_{OLS} X_t)' G^{-1} (Y_t - \beta_{OLS} X_t)$, has nothing to do with $\beta$. Thus, estimating $\beta$ boils down to minimizing:

\[ \sum_t (\beta_{OLS} X_t - \beta X_t)' G^{-1} (\beta_{OLS} X_t - \beta X_t). \]

When $G$ is an identity matrix, this objective simplifies to minimizing:

\[ \sum_t (\beta_{OLS} X_t - \beta X_t)' (\beta_{OLS} X_t - \beta X_t). \]

This is a typical PCA problem. When imposing a rank $r$ on $\beta$, with this objective, we basically look for the first $r$ principal components of $\beta_{OLS} X_t$. Thus, the column space of $\beta$ must be spanned by the first $r$ eigenvectors of the matrix $\beta_{OLS} E_T[X_t X_t'] \beta_{OLS}'$, denoted by $V_r$. With some algebra, we can show that:

\[ \beta = A \times B, \qquad \text{where } A = V_r \text{ and } B = V_r' \beta_{OLS}. \]

When $G$ is not an identity matrix, the adjustment is straightforward:

\[ A = G^{1/2} V_r \quad \text{and} \quad B = V_r' G^{-1/2} \beta_{OLS}, \]

where $V_r$ denotes the first $r$ eigenvectors of the matrix $G^{-1/2} \beta_{OLS} E_T[X_t X_t'] \beta_{OLS}' G^{-1/2}$. (Note that the square-root matrix, $G^{1/2}$, must be psd.)
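The analytical estimate can be sketched as follows for a general weighting matrix. This is an illustration of the formulas above rather than the code of Reg__RRR.m; the weighting matrix is an arbitrary positive definite example, and the names bOLS, Sxx, Gh, Q, Vr and obj are introduced here.

```matlab
% Sketch of the reduced-rank estimate with a weighting matrix G; illustrative only.
T = 500; M = 2; N = 3; r = 1;
Y = randn(M, T);
X = randn(N, T);
G = [2 0.5; 0.5 1];                  % positive definite weighting matrix

bOLS = (Y*X') / (X*X');              % unconstrained OLS estimate
Sxx  = (X*X') / T;                   % E_T[X X']
Gh   = sqrtm(G);                     % symmetric square root of G
Q    = (Gh \ bOLS) * Sxx * (Gh \ bOLS)';
Q    = (Q + Q') / 2;                 % remove rounding asymmetry before eig
[V, D] = eig(Q);
[~, order] = sort(diag(D), 'descend');
Vr   = V(:, order(1:r));             % first r eigenvectors
A    = Gh * Vr;
B    = Vr' * (Gh \ bOLS);
beta = A * B;

% weighted least-squares objective, for comparing candidate estimates
obj = @(b) sum(sum((Y - b*X) .* (G \ (Y - b*X))));
```

Any other rank-r candidate, for example A*(B + 0.05), should give an objective value no smaller than obj(beta).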


10.2 Examples

% Generate some random data

T = 500;

Y = randn(2,T);

X = randn(3,T);

% run the regression

r = 1;

[beta, A, B] = Reg__RRR(Y, X, r);

% display estimates:

disp('B='); disp(B);

disp('A='); disp(A);

disp('beta - A*B'); disp(beta - A*B);

B=

-0.0389 -0.0741 -0.0420

A=

-0.1184

0.9930

beta - A*B

0 0 0

0 0 0


10.3 Matlab header

%% [beta, A, B] = Reg__RRR(Y, X, r, G)

% Run a reduced-rank regression of Y on X

% Y = beta*X + noise

% with the constraint that the coefficient matrix beta be of rank r.

%

% Estimate of beta is obtained by minimizing the objective:

% \sum_t (Y_t-beta*X_t)’*G^-1*(Y_t-beta*X_t)

% where G is positive definite and beta of rank r.

%

%% INPUTS



% r: scalar rank (must be smaller than or equal to min(M,N))

% G: [M x M] weighting matrix (psd)

% defaulted to be identity

%

%% OUTPUTS

% beta: [M x N] coefficient matrix

% A: [M x r]

% B: [r x N] such that beta = A*B


11 Reg__GLS.m: run a GLS regression

%% [K, llk, hK] = Reg__GLS(Y, X, S, invS, S_logabsdet)

% Run a generalized least square regression

% by solving

% Min \sum_t (Y_t - K X_t)’S_t^-1(Y_t - K X_t)

11.1 Analytical estimates

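Let:

\[ L = \sum_t (Y_t - K X_t)' S_t^{-1} (Y_t - K X_t). \]

Taking the first-order derivative w.r.t. $K$, we obtain:

\[ \frac{\partial L}{\partial K} = -2 \sum_t S_t^{-1} (Y_t - K X_t) X_t'. \]

Vectorizing both sides, we obtain:

\[ -\frac{1}{2T} \mathrm{vec}\left( \frac{\partial L}{\partial K} \right) = \mathrm{vec}(E_T[S_t^{-1} Y_t X_t']) - E_T[X_t X_t' \otimes S_t^{-1}] \, \mathrm{vec}(K). \]

Solving the f.o.c., we obtain:

\[ \mathrm{vec}(K) = E_T[X_t X_t' \otimes S_t^{-1}]^{-1} \, \mathrm{vec}(E_T[S_t^{-1} Y_t X_t']). \]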
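The closed-form estimate can be sketched on simulated data as follows. This is an illustration of the formula above, not the code of Reg__GLS.m; the data-generating values (Ktrue and the construction of S_t) and the names lhs, rhs and grad are made up for the example.

```matlab
% Sketch of the GLS estimate with time-varying error covariances; illustrative only.
N = 2; M = 3; T = 200;
Ktrue = [1 0 -1; 0.5 2 0];
X = randn(M, T);
S = zeros(N, N, T);
Y = zeros(N, T);
for t = 1:T
    L = 0.3 * randn(N);
    S(:,:,t) = eye(N) + L*L';                          % positive definite S_t
    Y(:,t)   = Ktrue*X(:,t) + chol(S(:,:,t))' * randn(N, 1);
end

lhs = zeros(N*M);        % accumulates sum_t (X_t X_t' kron S_t^-1)
rhs = zeros(N*M, 1);     % accumulates sum_t vec(S_t^-1 Y_t X_t')
for t = 1:T
    iS  = inv(S(:,:,t));
    lhs = lhs + kron(X(:,t)*X(:,t)', iS);
    rhs = rhs + reshape(iS*Y(:,t)*X(:,t)', [], 1);
end
K = reshape(lhs \ rhs, N, M);

% first-order condition: sum_t S_t^-1 (Y_t - K X_t) X_t' should vanish at K
grad = zeros(N, M);
for t = 1:T
    grad = grad + (S(:,:,t) \ (Y(:,t) - K*X(:,t))) * X(:,t)';
end
```

The common factor 1/T cancels between the two sample averages, so sums are used in place of $E_T[\cdot]$.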


11.2 Matlab header

%% [K, llk, hK] = Reg__GLS(Y, X, S, invS, S_logabsdet)

% Run a generalized least square regression

% by solving

% Min \sum_t (Y_t - K X_t)’S_t^-1(Y_t - K X_t)

%

%% INPUTS:

% Y: N x T: LHS variables

% X: M x T regressors

% S: N x N x T errors variance matrices

% invS: N x N x T inversion of errors variance matrices

% (optional). If these are provided -> save time

% inverting the matrices in S.

% S_logabsdet:

% 1 x T vector of log(abs(det(S_t))). This is also optional

% and only needed if the output llk (likelihood

% score) is requested.

%

%% OUTPUTS:

% K: [N x M] matrix of coefficients

% llk: [1 x T] likelihood score:

% -0.5 (Y_t - K X_t)’S_t^-1(Y_t - K X_t)

% -0.5*N*log(2*pi()) -0.5*log(abs(det(S_t)))

% hK: [NM x T] moment conditions related to estimates K such

% that: vec(K - true(K)) ~ ET[hK].

%


12 Gaussian.m: computes the Gaussian density

%% y = Gaussian(res, SS, invSS, logabsdetSS)

% computes the gaussian density given the matrix of residuals and the

% covariance matrix (or array of covariance matrices).

12.1 Formula

For a time series $e_t \sim N(0, \Sigma_t)$, the log likelihood vector is given by

\[ y_t = -\frac{1}{2} e_t' \Sigma_t^{-1} e_t - \frac{1}{2} N \log(2\pi) - \frac{1}{2} \log|\Sigma_t|, \]

where $N$ is the dimensionality of $e_t$.
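For a constant covariance matrix, the formula can be evaluated for all dates at once. The following is a small sketch with made-up residuals, not the code of Gaussian.m.

```matlab
% Sketch of the Gaussian log density for a constant covariance matrix; values are made up.
N   = 2;
res = [0.5 -1.0 0.2 1.5; -0.3 0.8 0.0 -2.0];   % [N x T] residuals
SS  = [1.0 0.3; 0.3 2.0];                       % [N x N] covariance matrix

% column t of (SS \ res) is SS^-1 * e_t, so the column sums give e_t' * SS^-1 * e_t
quad = sum(res .* (SS \ res), 1);
y    = -0.5*quad - 0.5*N*log(2*pi) - 0.5*log(det(SS));   % [1 x T]
```

With an array of covariance matrices, the same expression would be evaluated date by date using the matrix for that date.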

12.2 Matlab header

%% y = Gaussian(res, SS, invSS, logabsdetSS)

% computes the gaussian density given the matrix of residuals and the

% covariance matrix (or array of covariance matrices).

%

%

%% INPUTS:

% res: [N x T] matrix of residuals

% SS: [N x N] or

% [N x N x T] covariance matrice or array of covariance

% matrices

% invSS: [N x N] or

% [N x N x T] optional inverse of SS

% logabsdetSS:[1 x T] optional log(abs(|SS|))

%

%% OUTPUTS:

% y: [1 x T] vector of density


13 Kalman__linear.m: computes the density for a linear Kalman filtering problem

%% [y, Zf] = Kalman__linear(D, K0, K1, A, B, SZZ, SDD, M0, S0, mex)

% computes the Kalman density for the following system:

% state equation:

% Z_t+1 = K0 + K1*Z_t + eZ_t+1

% observation equation:

% D_t+1 = A + B*Z_t+1 + eD_t+1

% where:

% eZ ~IID N(0, SZZ)

% eD ~IID N(0, SDD)

13.1 The setup

Consider the following standard filtering setup:

\begin{align*}
Z_{t+1} &= K_0 + K_1 Z_t + \varepsilon_{t+1}, \tag{30} \\
D^o_t &= A + B Z_t + e_t, \tag{31}
\end{align*}

where $\varepsilon_{t+1} \sim N(0, \Sigma_Z)$ and $e_t \sim N(0, \Sigma_D)$.

We know that the underlying states, $Z_t$, follow a first-order VAR. However, $Z_t$ is latent: we do not observe $Z_t$ directly. What we do observe is the data series $D^o_t$, which we know is linearly linked to $Z_t$ (through $A + B Z_t$) plus some observational noise ($e_t$).

What we want to do is to compute the model-implied (log) likelihood of seeing what we see:

\[ L = \sum_t f(D^o_t | I_{t-1}), \]

where $f$ denotes the conditional (log) density and $I_t$ denotes the information set generated by the observed data up to time $t$: $D^o_t, D^o_{t-1}, \ldots, D^o_2, D^o_1$.

The likelihood can be computed using the standard Kalman approach, which we summarize briefly below. We adopt the common notation:

\begin{align*}
M_{t|t} &= E[Z_t | I_t], \quad \text{and} \quad \Sigma_{t|t} = \mathrm{Cov}[Z_t | I_t], \tag{32} \\
M_{t|t-1} &= E[Z_t | I_{t-1}], \quad \text{and} \quad \Sigma_{t|t-1} = \mathrm{Cov}[Z_t | I_{t-1}]. \tag{33}
\end{align*}

We refer to $M_{t|t-1}$ and $\Sigma_{t|t-1}$ as the one-step-ahead moments, and to $M_{t|t}$ and $\Sigma_{t|t}$ as the filtered moments of the states.

13.2 Likelihood computation

First, it is important to note that once we know the one-step-ahead conditional moments of the states, $M_{t|t-1}$ and $\Sigma_{t|t-1}$, it is straightforward to compute the conditional moments of the observed yields. In particular, it is clear that:

\begin{align*}
E[D^o_t | I_{t-1}] &= A + B M_{t|t-1}, \quad \text{and} \tag{34} \\
\mathrm{cov}[D^o_t | I_{t-1}] &= B \Sigma_{t|t-1} B' + \Sigma_D. \tag{35}
\end{align*}

How do we compute $M_{t|t-1}$ and $\Sigma_{t|t-1}$ though? Note that if we know the filtered moments of the states at time $t-1$, $M_{t-1|t-1}$ and $\Sigma_{t-1|t-1}$, it is straightforward to compute $M_{t|t-1}$ and $\Sigma_{t|t-1}$. In particular, using (30), it is easy to see that:

\begin{align*}
M_{t|t-1} &= E[K_0 + K_1 Z_{t-1} | I_{t-1}] = K_0 + K_1 M_{t-1|t-1}, \tag{36} \\
\Sigma_{t|t-1} &= \mathrm{cov}[K_0 + K_1 Z_{t-1} | I_{t-1}] + \Sigma_Z = K_1 \Sigma_{t-1|t-1} K_1' + \Sigma_Z. \tag{37}
\end{align*}

So far, we have seen that if we know $M_{t-1|t-1}$ and $\Sigma_{t-1|t-1}$, then we can evaluate $M_{t|t-1}$ and $\Sigma_{t|t-1}$, and in turn $E[D^o_t | I_{t-1}]$ and $\mathrm{cov}[D^o_t | I_{t-1}]$. It turns out that as soon as we know the one-step-ahead moments, there are also formulas we can use to obtain the filtered moments at time $t$, $M_{t|t}$ and $\Sigma_{t|t}$, as a function of the one-step-ahead moments: $M_{t|t-1}$, $\Sigma_{t|t-1}$, $E[D^o_t | I_{t-1}]$, and $\mathrm{cov}[D^o_t | I_{t-1}]$.

We will provide these formulas as well as the intuition behind them shortly. For now, taking these formulas as given, it is important to see that starting from some uninformed moments $M_{1|0}$ and $\Sigma_{1|0}$ at time $t = 1$, we can recursively obtain the entire time series of $M_{t|t-1}$, $\Sigma_{t|t-1}$, $E[D^o_t | I_{t-1}]$, $\mathrm{cov}[D^o_t | I_{t-1}]$, $M_{t|t}$, and $\Sigma_{t|t}$ for all $t$. Within each iteration, we execute three steps:

1. Based on $M_{t|t-1}$ and $\Sigma_{t|t-1}$, we compute $E[D^o_t | I_{t-1}]$ and $\mathrm{cov}[D^o_t | I_{t-1}]$ using equations (34)-(35). These moments allow us to compute the log likelihood of the observed data at time $t$.

2. Based on $M_{t|t-1}$, $\Sigma_{t|t-1}$, and the moments $E[D^o_t | I_{t-1}]$ and $\mathrm{cov}[D^o_t | I_{t-1}]$ obtained from step 1, we compute $M_{t|t}$ and $\Sigma_{t|t}$ using the formulas that we will provide and explain below.

3. Based on $M_{t|t}$ and $\Sigma_{t|t}$, we compute $M_{t+1|t}$ and $\Sigma_{t+1|t}$ using equations (36) and (37). This step completes the recursion because it allows us to repeat step 1 at the next point in time, $t + 1$.

How do we evaluate $M_{t|t}$ and $\Sigma_{t|t}$ based on the knowledge of $M_{t|t-1}$, $\Sigma_{t|t-1}$, $E[D^o_t | I_{t-1}]$, and $\mathrm{cov}[D^o_t | I_{t-1}]$? In other words, if we know the best guess for $Z_t$ based on our information set up to time $t-1$, $M_{t|t-1}$, and now we observe some extra information, $D^o_t$, how do we incorporate this new information to update our best guess to $M_{t|t}$? Likewise, if we know that the amount of uncertainty surrounding $Z_t$ based on our information set up to time $t-1$ is given by $\Sigma_{t|t-1}$, then observing some extra information, $D^o_t$, ought to reduce this amount of uncertainty. How do we measure the reduction in uncertainty induced by the knowledge of $D^o_t$ to arrive at $\Sigma_{t|t}$?

It is instructive at this point to recall that for generic random variables $X$ and $Y$, we can obtain the conditional moments $E[Y|X]$ and $\mathrm{cov}(Y|X)$ through a linear regression of $Y$ on $X$. In particular, the expected component of the regression of $Y$ on $X$ gives us $E[Y|X]$ and the variance of the regression residuals gives us $\mathrm{cov}(Y|X)$. More specifically, we can write:

\begin{align*}
E[Y|X] &= E[Y] + \mathrm{cov}(Y, X) \, \mathrm{var}(X)^{-1} (X - E[X]), \tag{38} \\
\mathrm{cov}(Y|X) &= \mathrm{cov}(Y) - \mathrm{cov}(Y, X) \, \mathrm{var}(X)^{-1} \mathrm{cov}(X, Y). \tag{39}
\end{align*}


Section 13.3 shows how we can arrive at these two equations from a regression of $Y$ on $X$.

Applying the above regression framework to our setting, in order to incorporate the extra information, $D^o_t$, to update our best guess of $Z_t$, we simply need to regress $Z_t$ on $D^o_t$:

\[ E[Z_t | D^o_t] = E[Z_t] + \mathrm{cov}(Z_t, D^o_t) \, \mathrm{var}(D^o_t)^{-1} (D^o_t - E[D^o_t]). \]

To incorporate the fact that the new information, $D^o_t$, is added on to $I_{t-1}$, as opposed to an empty prior, we need to replace all the moments in the above equation by conditional moments relative to $I_{t-1}$. We obtain:

\[ \underbrace{E[Z_t | D^o_t, I_{t-1}]}_{M_{t|t}} = \underbrace{E[Z_t | I_{t-1}]}_{M_{t|t-1}} + \underbrace{\mathrm{cov}(Z_t, D^o_t | I_{t-1})}_{\Sigma_{t|t-1} B'} \, \mathrm{var}(D^o_t | I_{t-1})^{-1} (D^o_t - E[D^o_t | I_{t-1}]). \tag{40} \]

$M_{t|t}$ consists of two components. The first component, $M_{t|t-1}$, gives us our best guess of $Z_t$ even without any knowledge of $D^o_t$. The second component, $\mathrm{cov}(Z_t, D^o_t | I_{t-1}) \, \mathrm{var}(D^o_t | I_{t-1})^{-1} (D^o_t - E[D^o_t | I_{t-1}])$, captures the information about $Z_t$ that we gain based on our knowledge of $D^o_t$. This second component consists of three sub-components, each of which is quite intuitive. $\mathrm{cov}(Z_t, D^o_t | I_{t-1})$ says that the more correlated $Z_t$ and $D^o_t$ are, the more we learn about $Z_t$ if we know $D^o_t$. $\mathrm{var}(D^o_t | I_{t-1})^{-1}$ says that the incremental information gain depends on how accurate the signals coming from $D^o_t$ are. Noisy signals (big variances) will, intuitively, diminish the information content. Finally, the sub-component $(D^o_t - E[D^o_t | I_{t-1}])$ says that the more dramatic the $D^o_t$-signal (the more $D^o_t$ deviates from its conditional mean), the more it informs us about $Z_t$.

The loading of $D^o_t$ in (40) is often referred to as the Kalman gain and denoted by $K_t$. Using this notation, we write:

\[ M_{t|t} = M_{t|t-1} + K_t (D^o_t - E[D^o_t | I_{t-1}]), \tag{41} \]

where the Kalman gain is $K_t = \Sigma_{t|t-1} B' \, \mathrm{var}(D^o_t | I_{t-1})^{-1}$. The Kalman gain determines the amount of incremental information we learn about the underlying states from each set of observed data.

Using the same regression framework, we can also write:

\[ \underbrace{\mathrm{cov}[Z_t | D^o_t, I_{t-1}]}_{\Sigma_{t|t}} = \underbrace{\mathrm{cov}[Z_t | I_{t-1}]}_{\Sigma_{t|t-1}} - \underbrace{\mathrm{cov}(Z_t, D^o_t | I_{t-1})}_{\Sigma_{t|t-1} B'} \, \mathrm{var}(D^o_t | I_{t-1})^{-1} \underbrace{\mathrm{cov}(D^o_t, Z_t | I_{t-1})}_{B \Sigma_{t|t-1}}, \tag{42} \]

or, using the Kalman gain notation:

\[ \Sigma_{t|t} = \Sigma_{t|t-1} - K_t \, \mathrm{var}(D^o_t | I_{t-1}) K_t'. \tag{43} \]

These equations are quite intuitive. They say that the remaining uncertainty about $Z_t$ after observing $D^o_t$, $\mathrm{cov}(Z_t | I_t)$, must be equal to the amount of uncertainty before observing $D^o_t$, $\mathrm{cov}(Z_t | I_{t-1})$, minus the amount of uncertainty eliminated thanks to the knowledge of $D^o_t$.

To sum up, starting with a pair of uninformed moments ($M_{1|0}$, $\Sigma_{1|0}$), we can implement our recursions as follows:


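1. Based on $M_{t|t-1}$, $\Sigma_{t|t-1}$, we compute $E[D^o_t|I_{t-1}]$, $\mathrm{var}[D^o_t|I_{t-1}]$:

$$E[D^o_t|I_{t-1}] = A + BM_{t|t-1}, \text{ and} \tag{44}$$

$$\mathrm{var}[D^o_t|I_{t-1}] = B\Sigma_{t|t-1}B' + \Sigma_D. \tag{45}$$

2. Based on these two moments and the observed $D^o_t$, we compute $M_{t|t}$, $\Sigma_{t|t}$:

$$M_{t|t} = M_{t|t-1} + K_t(D^o_t - E[D^o_t|I_{t-1}]), \tag{46}$$

$$\Sigma_{t|t} = \Sigma_{t|t-1} - K_t\,\mathrm{var}(D^o_t|I_{t-1})\,K_t', \tag{47}$$

where the Kalman gain is $K_t = \Sigma_{t|t-1}B'\,\mathrm{var}(D^o_t|I_{t-1})^{-1}$.

3. Based on $M_{t|t}$, $\Sigma_{t|t}$, we compute $M_{t+1|t}$, $\Sigma_{t+1|t}$:

$$M_{t+1|t} = K_0 + K_1M_{t|t},$$

$$\Sigma_{t+1|t} = K_1\Sigma_{t|t}K_1' + \Sigma_Z.$$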
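The three steps above can be sketched in a few lines of Matlab. This is a minimal illustration, not the package's Kalman__linear routine: the simulated data, the variable names, the choice of the unconditional moments of a stationary VAR(1) as the uninformed starting pair, and the Gaussian log density are all assumptions made for this sketch.

```matlab
% Minimal sketch of the Kalman recursions (steps 1-3); all names are illustrative.
rng(0);
N = 2; J = 3; T = 50;
K0 = zeros(N,1); K1 = 0.9*eye(N);
A  = zeros(J,1); B  = randn(J,N);
SZ = eye(N);     SD = 0.1*eye(J);

% Simulate states and observations from the state space system.
D = zeros(J,T); z = zeros(N,1);
for t = 1:T
    z = K0 + K1*z + chol(SZ,'lower')*randn(N,1);
    D(:,t) = A + B*z + chol(SD,'lower')*randn(J,1);
end

% Assumed starting moments (M_{1|0}, Sigma_{1|0}): unconditional mean and covariance.
M = (eye(N) - K1)\K0;
S = reshape((eye(N^2) - kron(K1,K1))\SZ(:), N, N);

Zf = zeros(N,T); logf = zeros(1,T);
for t = 1:T
    % Step 1: predicted mean and variance of the data, equations (44)-(45).
    ED = A + B*M;
    VD = B*S*B' + SD;
    % Step 2: update with the Kalman gain, equations (46)-(47).
    K  = (S*B')/VD;
    e  = D(:,t) - ED;
    Mf = M + K*e;
    Sf = S - K*VD*K';
    Zf(:,t) = Mf;
    % Gaussian log density of D_t given past data (an assumption of this sketch).
    logf(t) = -0.5*(J*log(2*pi) + log(det(VD)) + e'*(VD\e));
    % Step 3: one-step-ahead prediction of the state moments.
    M = K0 + K1*Mf;
    S = K1*Sf*K1' + SZ;
end
```

Using the right division `(S*B')/VD` avoids forming the inverse of the data variance explicitly, which is usually preferable numerically.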

13.3 Conditional moments obtained through linear regressions

Consider a linear regression of a generic random variable Y on a generic random variable X:

Y = α + βX + noise.

In running this regression, we choose the coefficients $\alpha$ and $\beta$ in such a way that the linear expression, $\alpha + \beta X$, gives us the best guess of $Y$ given our knowledge of $X$, or $E[Y|X]$. In other words, with this regression, we try to estimate $E[Y|X]$ by a linear model. Plugging the OLS estimates of $\alpha$ and $\beta$ into the above regression, we obtain:

$$E[Y|X] = E[Y] + \mathrm{cov}(Y,X)\,\mathrm{var}(X)^{-1}(X - E[X]). \tag{48}$$

$E[Y|X]$ consists of two components. The first component, $E[Y]$, gives us our best guess of $Y$ even without any knowledge of $X$. The second component, $\mathrm{cov}(Y,X)\,\mathrm{var}(X)^{-1}(X - E[X])$, captures the information about $Y$ that we gain from our knowledge of $X$. This second component consists of three sub-components, each of which is quite intuitive. $\mathrm{cov}(Y,X)$ says that the more correlated $Y$ and $X$ are, the more we learn about $Y$ if we know $X$. $\mathrm{var}(X)^{-1}$ says that the incremental information gain depends on how accurate the signals coming from $X$ are. Noisy signals (big $\mathrm{var}(X)$) will, intuitively, diminish the information content. Finally, the sub-component $X - E[X]$ says that the more dramatic the $X$-signal (the more $X$ deviates from its unconditional average), the more it informs us about $Y$.

In addition to the conditional first moment $E[Y|X]$, we can also obtain the conditional second moment $\mathrm{cov}(Y|X)$ from the same regression. To see this, start with the identity:

cov(Y ) = cov(E[Y |X]) + cov(Y − E[Y |X]).


This identity holds because, by construction, $Y - E[Y|X]$ is orthogonal to $E[Y|X]$. Note that $\mathrm{cov}(Y - E[Y|X]) = \mathrm{cov}(Y|X)$, thus:

cov(Y |X) = cov(Y )− cov(E[Y |X]).

This equation is intuitive. It says that the remaining uncertainty about $Y$ after observing $X$ ($\mathrm{cov}(Y|X)$) must be equal to the amount of uncertainty before observing $X$ ($\mathrm{cov}(Y)$) minus the amount of uncertainty eliminated thanks to the knowledge of $X$ ($\mathrm{cov}(E[Y|X])$). Using equation (48), we can write:

$$\mathrm{cov}(Y|X) = \mathrm{cov}(Y) - \mathrm{cov}(Y,X)\,\mathrm{var}(X)^{-1}\,\mathrm{cov}(X,Y). \tag{49}$$
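Equations (48) and (49) can be checked numerically on simulated data. The sketch below is only an illustration with made-up scalar data and variable names: it compares the OLS slope with the moment-based loading cov(Y,X)var(X)^{-1}, and the residual variance with the right-hand side of (49), both computed from sample moments.

```matlab
% Sketch: sample analogues of equations (48) and (49) for scalar Y and X.
rng(1);
T = 1000;
X = randn(T,1);
Y = 1 + 2*X + 0.5*randn(T,1);

C = cov([Y X]);                          % 2 x 2 sample covariance matrix
beta_mom  = C(1,2)/C(2,2);               % cov(Y,X) var(X)^{-1}
alpha_mom = mean(Y) - beta_mom*mean(X);  % implied intercept

coef  = [ones(T,1) X]\Y;                 % OLS estimates [alpha; beta]
resid = Y - [ones(T,1) X]*coef;

condvar_mom = C(1,1) - C(1,2)/C(2,2)*C(2,1);  % right-hand side of (49)
condvar_ols = var(resid);                     % variance of Y - E[Y|X]
```

With an intercept in the regression, the two slopes and the two variances agree up to rounding error, because `cov` and `var` use the same T-1 normalization.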

13.4 Speed issues

Implementing the above Kalman recursions in Matlab involves a loop, which can be inefficient. One way to improve the computing speed is to translate the Matlab recursions to C/C++ and then compile the result into a mex file for use in Matlab.

The corresponding cpp code (Kalman_linear_cpp.cpp) is also provided in the same folder. Compiled mex files are also provided for 64-bit Mac and 64-bit Windows operating systems. If these mex files don't work, the cpp files may need to be recompiled.

To compile the cpp script to a mex file, we need the eigen package (for matrices and vectors), which is installed in the following folder on my MacBook:

/Users/anhle/Dropbox/Library_AGmatlab/eigen

To point Matlab to this folder, we need to use the option “-I” together with the mex command in Matlab:

mex -I/Users/anhle/Dropbox/Library_AGmatlab/eigen Kalman_linear_cpp.cpp

Note that there is no space between the “-I” and the path containing the eigen package. On Windows machines, the command will look like:

mex -I"C:\Users\anhle\Dropbox\Library_AGmatlab\eigen" Kalman_linear_cpp.cpp


13.5 Examples

The following example illustrates a simple use of the code. It shows that the mex file allows the code to run about 10 times faster. This improvement is machine dependent, but the speed-up typically ranges between 5 and 15 times.

N = 3; J = 5; T = 500;

D = randn(J, T);

K0 = randn(N,1);

K1 = 0.9*eye(N);

A = randn(J,1);

B = randn(J,N);

SZZ = eye(N);

SDD = 0.1*eye(J);

M0 = [];

S0 = [];

tic; [y, Zf] = Kalman__linear(D, K0, K1, A, B, SZZ, SDD, M0, S0); toc

tic; [y_mex, Zf_mex] = Kalman__linear(D, K0, K1, A, B, SZZ, SDD, M0, S0, 'mex'); toc




13.6 Matlab header

%% [y, Zf] = Kalman__linear(D, K0, K1, A, B, SZZ, SDD, M0, S0, mex)

% computes the Kalman density for the following system:

% state equation:

% Z_t+1 = K0 + K1*Z_t + eZ_t+1

% observation equation:

% D_t+1 = A + B*Z_t+1 + eD_t+1

% where:

% eZ ~IID N(0, SZZ)

% eD ~IID N(0, SDD)

%

%% INPUTS:

% D: [J x T] matrix of observed data

% K0: [N x 1] intercepts in the state equation

% K1: [N x N] feedback matrix

% A: [J x 1] intercepts in the observation equation

% B: [J x N] loadings in the observation equation

% SZZ: [N x N] covariance matrix of the state errors

% SDD: [J x J] covariance matrix of the observation errors

% M0: [N x 1] uninformative mean of Z_1: E[Z_1]

% S0: [N x N] uninformative covariance of Z_1 cov[Z_1]

% mex: text can be either:

% + 'mex': use the mex file for speed improvement

% + '': (default) do not use mex

%% OUTPUTS:

% y: [1 x T] density of the data f(D_t|D_t-1, D_t-2, ...)

% Zf: [N x T] filtered estimates of the states: E[Z_t|D_t, D_t-1, ...]


14 mult prod.m: efficiently multiplies arrays of matrices

%% d = mult__prod(a,b,c)

% efficiently computes matrix products for arrays a and b and c:

% d(:,:,i) = a(:,:,i)*b(:,:,i)*c(:,:,i).

% (efficiently = without using an inefficient loop)

14.1 Examples

Example 1 - Simple use, multiplication of 2 terms

a = randn(2,2,2);

b = randn(2,1,2);

p = mult__prod(a,b);

Check the results:

a(:,:,1)*b(:,:,1) - p(:,:,1)

a(:,:,2)*b(:,:,2) - p(:,:,2)

ans =

1.0e-16 *

0

0.5551

ans =

0

0

Example 2 - Simple use, multiplication of 3 terms

Note that b in this example is basically a scalar in the first two dimensions:

a = randn(2,2,2,3);

b = randn(1,1,2,3);

c = randn(2,3,2,3);

p = mult__prod(a,b,c);

>> size(p)

ans =


2 3 2 3

Example 3 - comparison vs loops

a = randn(2,2,200);

b = randn(1,1,200);

c = randn(2,3,200);

% using mult__prod:

tic; for i=1:100; p = mult__prod(a,b,c); end; toc

% using loop:

tic;

for i=1:100

for t=1:200

p(:,:,t) = a(:,:,t)*b(:,:,t)*c(:,:,t);

end

end

toc
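To give a sense of how a product over an array of matrices can avoid the inner loop, here is a minimal sketch for the two-term case. It is not necessarily how mult__prod is implemented: it relies on implicit expansion (available from Matlab R2016b), and the variable names are chosen only for this illustration.

```matlab
% Sketch: loop-free d(:,:,t) = a(:,:,t)*b(:,:,t) via reshape and implicit expansion.
rng(0);
M = 2; N = 3; K = 4; T = 200;
a = randn(M,N,T);
b = randn(N,K,T);

ap = reshape(a, M, N, 1, T);                % a(m,n,t) -> ap(m,n,1,t)
bp = reshape(b, 1, N, K, T);                % b(n,k,t) -> bp(1,n,k,t)
d  = reshape(sum(ap .* bp, 2), M, K, T);    % sum over the shared index n

% Reference result computed with an explicit loop.
d_loop = zeros(M,K,T);
for t = 1:T
    d_loop(:,:,t) = a(:,:,t)*b(:,:,t);
end
max_diff = max(abs(d(:) - d_loop(:)));
```

The intermediate array `ap .* bp` has M*N*K*T elements, so this approach trades memory for speed and suits many small matrices better than a few large ones.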




14.2 Matlab header

%% d = mult__prod(a,b,c)

% efficiently computes matrix product for arrays a and b and c:

% d(:,:,i) = a(:,:,i)*b(:,:,i)*c(:,:,i).

% (efficiently = without using an inefficient loop)

%

%% INPUTS:

% a: [M x N x T]

% b: [N x K x T]

% c: [K x P x T]

%

%% OUTPUTS:

% d: [M x P x T]

%

% NOTE: T can be multi-dimensional.


15 mult inv.m: efficiently inverts arrays of matrices

%% [inva, logabsdd] = mult__inv(a, whichoutput)

% inverts an array of matrices so that

% inva(:,:,i) = inv(a(:,:,i))

15.1 Examples

Check accuracy

We first check if the results are accurate.

a = randn(2,2,7);

ainv = mult__inv(a);

chainv = nan(size(ainv));

for k=1:7

chainv(:,:,k) = inv(a(:,:,k));

end

% check for accuracy:

dd = nan(1,7);

for k=1:7

d = chainv(:,:,k) - ainv(:,:,k);

dd(k) = max(abs(d(:)));

end

disp(dd)

1.0e-15 *

Columns 1 through 5

0.1110 0.0555 0.1110 0.1110 0.8882

Columns 6 through 7

0.4441 0.1110

Check speed

Now we check if mult inv.m results in a significant speed improvement.

for j=1:1000

    a = randn(2,2,1000);

    tic; ainv = mult__inv(a); tt(j,1) = toc;

    tic;

    % note: this loop inverts only the first 100 of the 1000 matrices in a

    for k=1:100

        chainv(:,:,k) = inv(a(:,:,k));

    end

    tt(j,2) = toc;

end

disp('median time taken in seconds x 10000');

disp('(first number = mult__inv; second number = loop)');

disp(median(tt,1)*10000);

0.9792 7.8451

In this run, mult inv.m is about 8 times faster than the loop.
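For the 2 x 2 case used in these examples, a loop-free inversion can be sketched with the closed-form adjugate formula. This is an illustration only and not necessarily what mult__inv does internally (which handles general N x N arrays); it assumes implicit expansion (Matlab R2016b or later), and the variable names are made up for this sketch.

```matlab
% Sketch: invert T matrices of size 2 x 2 at once with the adjugate formula.
rng(0);
T = 1000;
a = randn(2,2,T);

detA = a(1,1,:).*a(2,2,:) - a(1,2,:).*a(2,1,:);      % 1 x 1 x T determinants
adjA = [ a(2,2,:), -a(1,2,:);                         % 2 x 2 x T adjugates
        -a(2,1,:),  a(1,1,:)];
inva = adjA ./ detA;                                  % implicit expansion over pages
logabsdet = reshape(log(abs(detA)), 1, T);            % log(abs(det)) as a by-product

% Compare with inv() in relative terms, page by page.
rel_err = zeros(1,T);
for k = 1:T
    ref = inv(a(:,:,k));
    dlt = inva(:,:,k) - ref;
    rel_err(k) = max(abs(dlt(:)))/max(abs(ref(:)));
end
```

The comparison is made in relative terms because some randomly drawn matrices are close to singular, which makes the entries of their inverses large.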


15.2 Matlab header

%% [inva, logabsdd] = mult__inv(a, whichoutput)

% inverts an array of matrices so that

% inva(:,:,i) = inv(a(:,:,i))

%

%

%% INPUTS:

% a: [N x N x T] matrix array

% whichoutput: text if = 'lobabsdet' computes the

% log(abs(det(a))) only.

%

%% OUTPUTS

% inva: [N x N x T] matrix array: a^-1

% logabsdet [1 x T] vector of log(abs(det(a)))


16 kron vec.m: efficiently computes Kronecker tensor product of arrays of vectors

%% y = kron__vec(u, v)

% efficiently computes [kron(u(:,1), v(:,1)), ... kron(u(:,T), v(:,T))]

% (efficiently == without using an inefficient loop)

16.1 Examples

A = randn(5,500);

B = randn(4,500);

C = kron__vec(A,B);

% check speed

tic; for i=1:100; C = kron__vec(A,B); end; toc

tic;

for i=1:100

for k=1:500

C(:,k) = kron(A(:,k), B(:,k));

end

end

toc
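The column-by-column Kronecker product can also be written without a loop by using reshape and implicit expansion. The sketch below is one possible way to do this and is not necessarily the approach taken inside kron__vec; it assumes Matlab R2016b or later, and the variable names are illustrative.

```matlab
% Sketch: y(:,t) = kron(u(:,t), v(:,t)) for all t without a loop.
rng(0);
N = 5; M = 4; T = 500;
u = randn(N,T);
v = randn(M,T);

% Element (m,n,t) of the product is v(m,t)*u(n,t); stacking the first two
% dimensions gives row index m + (n-1)*M, which matches kron(u,v).
y = reshape(reshape(v, M, 1, T) .* reshape(u, 1, N, T), M*N, T);

% Reference result computed with an explicit loop.
y_loop = zeros(M*N, T);
for t = 1:T
    y_loop(:,t) = kron(u(:,t), v(:,t));
end
max_diff = max(abs(y(:) - y_loop(:)));
```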



kron vec.m is about 48 times faster than the loop.


16.2 Matlab header

%% y = kron__vec(u, v)

% efficiently computes [kron(u(:,1), v(:,1)), ... kron(u(:,T), v(:,T))]

% (efficiently == without using an inefficient loop)

%

%% INPUTS:

% u: [N x T]

% v: [M x T]

% dim: scalar optional.

% if dim=1, then u:[T x N], v:[T x M]

% and y = [kron(u(1,:), v(1,:)); ... kron(u(T,:), v(T,:))]

%

%% OUTPUTS:

% y: [MN x T] if dim=1 -> y:[T x MN]

%


17 kron sum.m: efficiently computes Kronecker tensor product of arrays of matrices and then takes the sum

% y = kron__sum(A,B)

% efficiently computes sum of kron(A(:,:,i), B(:,:,i)) over all i

% (where efficiently = without using an inefficient loop)

17.1 Examples

A = randn(5,4,500);

B = randn(4,7,500);

C = kron__sum(A,B);

% check speed

tic; for i=1:100; C = kron__sum(A,B); end; toc

tic;

for i=1:100

C(:)=0;

for k=1:500

C = C+kron(A(:,:,k), B(:,:,k));

end

end

toc
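One way to compute the sum of Kronecker products without a loop is to form all cross products with a single matrix multiplication and then rearrange the indices. The sketch below illustrates this idea under that assumption; it is not necessarily the method used inside kron__sum, and the variable names are made up for this example.

```matlab
% Sketch: y = sum over t of kron(A(:,:,t), B(:,:,t)) without a loop.
rng(0);
n = 3; h = 2; m = 4; k = 5; T = 50;
A = randn(n,h,T);
B = randn(m,k,T);

% G(i+(j-1)*n, p+(q-1)*m) = sum_t A(i,j,t)*B(p,q,t)
G  = reshape(A, n*h, T) * reshape(B, m*k, T).';
G4 = reshape(G, n, h, m, k);                      % indices (i,j,p,q)
% kron layout: row = p + (i-1)*m, column = q + (j-1)*k
y  = reshape(permute(G4, [3 1 4 2]), m*n, k*h);

% Reference result computed with an explicit loop.
y_loop = zeros(m*n, k*h);
for t = 1:T
    y_loop = y_loop + kron(A(:,:,t), B(:,:,t));
end
max_diff = max(abs(y(:) - y_loop(:)));
```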



kron sum.m is about 48 times faster than the loop.


17.2 Matlab header

% y = kron__sum(A,B)

% efficiently computes sum of kron(Ai, Bi) over all i

% (where efficiently = without using an inefficient loop)

%

%% INPUTS:

% A: [n x h x T]

% B: [m x k x T]

%

%% OUTPUTS:

% y: [mn x kh]


18 df dx.m: efficiently computes the (numerical) first order derivative of a given function

%% y = df__dx(f, x)

% computes numerical first order derivative of f(x).

18.1 Rough idea

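Consider the Taylor expansions:

$$f(x+\Delta) - f(x) = f'(x)\Delta + \frac{1}{2}f''(x)\Delta^2 + \dots,$$

$$f(x-\Delta) - f(x) = -f'(x)\Delta + \frac{1}{2}f''(x)\Delta^2 + \dots$$

If we use just one of the two expansions:

$$f'(x) \approx \frac{f(x+\Delta) - f(x)}{\Delta}$$

then the terms neglected in the numerator are of order $\Delta^2$ and above, so the error in $f'(x)$ is of order $\Delta$. Using both expansions to approximate:

$$f'(x) \approx \frac{f(x+\Delta) - f(x-\Delta)}{2\Delta}$$

the $\Delta^2$ terms cancel, and the terms neglected in the numerator are of order $\Delta^3$ and above, so the error in $f'(x)$ is of order $\Delta^2$. For each value of $\Delta$, this shows that one extra function evaluation ($f(x-\Delta)$) can improve the numerical approximation of $f'(x)$.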

Now consider also:

$$f(x+2\Delta) - f(x) = 2f'(x)\Delta + 2f''(x)\Delta^2 + \frac{4}{3}f'''(x)\Delta^3 + \frac{2}{3}f''''(x)\Delta^4 + \dots,$$

$$f(x-2\Delta) - f(x) = -2f'(x)\Delta + 2f''(x)\Delta^2 - \frac{4}{3}f'''(x)\Delta^3 + \frac{2}{3}f''''(x)\Delta^4 + \dots$$

The idea is that, by using the extra evaluations of $f(x+2\Delta)$ and $f(x-2\Delta)$, we can improve the numerical approximation of $f'(x)$ by eliminating the $\Delta^3$ and $\Delta^4$ order terms.

Why can’t we just make ∆ very small, which will make all ∆n terms very small?

The issue is that $\Delta$ can only be made so small before our computers see $x$ and $x+\Delta$ as being identical. For example, in double-precision arithmetic, 1 + 1e-20 evaluates to exactly 1.

The question is: given the smallest $\Delta$ that we want to work with (one that guarantees numerical stability), what else can we do to improve the numerical accuracy of our calculations? One answer is to cleverly combine different Taylor expansions, as explained above.
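The effect of combining expansions can be illustrated with a small experiment. The sketch below compares the one-sided difference, the two-sided difference, and the standard five-point formula, which combines the evaluations at x ± ∆ and x ± 2∆ with weights 8 and 1 so that the ∆³ terms cancel. Whether df__dx uses exactly this formula is not stated in the text, so the five-point stencil, the test function, and the step size here are assumptions made for illustration.

```matlab
% Sketch: compare three finite-difference approximations of f'(x).
f  = @(x) exp(x);        % test function with known derivative exp(x)
x0 = 1;
h  = 1e-2;               % step size Delta (chosen for illustration)
exact = exp(x0);

d_fwd  = (f(x0+h) - f(x0))/h;                                   % one expansion
d_ctr  = (f(x0+h) - f(x0-h))/(2*h);                             % two expansions
d_five = (-f(x0+2*h) + 8*f(x0+h) - 8*f(x0-h) + f(x0-2*h))/(12*h); % four expansions

err = abs([d_fwd, d_ctr, d_five] - exact);
disp(err)
```

For a fixed ∆, each additional pair of function evaluations reduces the approximation error by several orders of magnitude in this example.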


18.2 Examples

First, a simple use of the function:

x = rand(3,1);

f = @(x) x(1)^2+exp(x(2))+1/x(3)+1;

df__dx(f,x)

ans =

0.7820

1.4804

-26.7513

Second, check the output against the corresponding analytical derivative:

df = @(x) [2*x(1); exp(x(2)); -1/x(3).^2];

[df__dx(f,x) - df(x)]

ans =

1.0e-10 *

-0.5649

0.0253

0.9018


18.3 Matlab header

%% y = df__dx(f, x)

% computes numerical first order derivative of f(x).

%

%% INPUTS:

% f: function handle, [R x T] vector valued function

% x: [M x N] parameter value

%

%% OUTPUTS:

% y: matrix [MN x RT]

%
