2.5 Forecasting and Impulse Response Functions
Principles of forecasting
Forecast based on conditional expectations. Suppose we are interested in forecasting the value of $y_{t+1}$ based on a set of variables $X_t$ (an $m \times 1$ vector). Let $y_{t+1|t}$ denote such a forecast.
To evaluate the usefulness of the forecast we need to specify a loss function. Here we specify a quadratic loss function, meaning that $y_{t+1|t}$ is chosen to minimize $E(y_{t+1} - y_{t+1|t})^2$. This quantity is known as the mean squared error associated with the forecast $y_{t+1|t}$, denoted
$$MSE(y_{t+1|t}) = E(y_{t+1} - y_{t+1|t})^2$$
Fundamental result: the forecast with the smallest MSE is the expectation of $y_{t+1}$ conditional on $X_t$, that is,
$$y_{t+1|t} = E(y_{t+1}|X_t)$$
We now verify the claim. Let $g(X_t)$ be any other function and let $y_{t+1|t} = g(X_t)$. The associated MSE is
$$E[y_{t+1} - g(X_t)]^2 = E[y_{t+1} - E(y_{t+1}|X_t) + E(y_{t+1}|X_t) - g(X_t)]^2$$
$$= E[y_{t+1} - E(y_{t+1}|X_t)]^2 + 2E\{[y_{t+1} - E(y_{t+1}|X_t)][E(y_{t+1}|X_t) - g(X_t)]\} + E[E(y_{t+1}|X_t) - g(X_t)]^2$$
Define
$$\eta_{t+1} \equiv [y_{t+1} - E(y_{t+1}|X_t)][E(y_{t+1}|X_t) - g(X_t)] \tag{24}$$
The conditional expectation is
$$E(\eta_{t+1}|X_t) = E\{[y_{t+1} - E(y_{t+1}|X_t)][E(y_{t+1}|X_t) - g(X_t)] \mid X_t\}$$
$$= [E(y_{t+1}|X_t) - g(X_t)]\, E\{[y_{t+1} - E(y_{t+1}|X_t)] \mid X_t\}$$
$$= [E(y_{t+1}|X_t) - g(X_t)][E(y_{t+1}|X_t) - E(y_{t+1}|X_t)] = 0$$
Therefore, by the law of iterated expectations,
$$E(\eta_{t+1}) = E(E(\eta_{t+1}|X_t)) = 0$$
This means that
$$E[y_{t+1} - g(X_t)]^2 = E[y_{t+1} - E(y_{t+1}|X_t)]^2 + E[E(y_{t+1}|X_t) - g(X_t)]^2$$
Therefore the function that minimizes the MSE is
$$g(X_t) = E(y_{t+1}|X_t)$$
$E(y_{t+1}|X_t)$ is the optimal forecast of $y_{t+1}$ conditional on $X_t$ under a quadratic loss function. The MSE of this optimal forecast is
$$E[y_{t+1} - g(X_t)]^2 = E[y_{t+1} - E(y_{t+1}|X_t)]^2$$
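To make the result concrete, here is a minimal simulation sketch; the data-generating process $y_{t+1} = x_t^2 + u_{t+1}$ is an illustrative assumption, not from the notes. The conditional mean $x_t^2$ attains a visibly smaller MSE than a competing rule, here the unconditional mean.

```python
# Illustrative check: with y_{t+1} = x_t^2 + u_{t+1}, the conditional mean
# E(y_{t+1}|x_t) = x_t^2 should attain the smallest MSE of any forecast rule.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x**2 + rng.normal(size=x.size)        # true conditional mean is x**2

mse_cond = np.mean((y - x**2) ** 2)       # forecast E(y|x): MSE ~ Var(u) = 1
mse_const = np.mean((y - 1.0) ** 2)       # unconditional mean E(y) = 1: MSE ~ 3
print(mse_cond, mse_const)
```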
Forecast based on linear projections. We now restrict the class of forecasts we consider to linear functions of $X_t$:
$$y_{t+1|t} = \alpha' X_t$$
Suppose $\alpha'$ is such that the resulting forecast error is uncorrelated with $X_t$:
$$E[(y_{t+1} - \alpha' X_t)X_t'] = 0 \tag{25}$$
If (25) holds, then we call $\alpha' X_t$ the linear projection of $y_{t+1}$ on $X_t$. The linear projection produces the smallest mean squared forecast error among the class of linear forecasting rules. To verify this, let $g'X_t$ be any arbitrary linear forecasting rule.
$$E(y_{t+1} - g'X_t)^2 = E(y_{t+1} - \alpha'X_t + \alpha'X_t - g'X_t)^2$$
$$= E(y_{t+1} - \alpha'X_t)^2 + 2E[(y_{t+1} - \alpha'X_t)(\alpha'X_t - g'X_t)] + E(\alpha'X_t - g'X_t)^2$$
The middle term is
$$E[(y_{t+1} - \alpha'X_t)(\alpha'X_t - g'X_t)] = E[(y_{t+1} - \alpha'X_t)X_t'](\alpha - g) = 0$$
by the definition of linear projection. Thus
$$E(y_{t+1} - g'X_t)^2 = E(y_{t+1} - \alpha'X_t)^2 + E(\alpha'X_t - g'X_t)^2$$
The optimal linear forecast is therefore $g'X_t = \alpha'X_t$.
We use the notation
$$P(y_{t+1}|X_t) = \alpha'X_t$$
to indicate the linear projection of $y_{t+1}$ on $X_t$. Notice that
$$MSE[P(y_{t+1}|X_t)] \geq MSE[E(y_{t+1}|X_t)]$$
since the conditional expectation is optimal among all forecasts, linear or not.
The projection coefficient $\alpha$ can be calculated in terms of the moments of $y_{t+1}$ and $X_t$. From (25),
$$E(y_{t+1}X_t') = \alpha' E(X_t X_t')$$
$$\alpha' = E(y_{t+1}X_t')[E(X_t X_t')]^{-1}$$
It is easy to generalize to the multivariate case. Let $Y_{t+1}$ be an $n \times 1$ vector, $\alpha$ an $m \times n$ matrix, and $X_t$ an $m \times 1$ vector. The linear projection is defined to be the vector $\alpha'X_t$ satisfying
$$E[(Y_{t+1} - \alpha'X_t)X_t'] = 0$$
Linear projections and OLS regression. There is a close relationship between the OLS estimator and the linear projection coefficient. If $y_{t+1}$ and $X_t$ are stationary processes that are also ergodic for second moments, then
$$\frac{1}{T}\sum_{t=1}^{T} X_t X_t' \xrightarrow{p} E(X_t X_t')$$
$$\frac{1}{T}\sum_{t=1}^{T} X_t y_{t+1} \xrightarrow{p} E(X_t y_{t+1})$$
implying
$$\hat{\beta} \xrightarrow{p} \alpha$$
where $\hat{\beta}$ is the OLS coefficient from regressing $y_{t+1}$ on $X_t$.
The OLS regression yields a consistent estimate of the linear projection coefficient.
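As a sketch of this consistency result (all numerical values below are assumptions chosen for illustration), one can simulate data whose population projection coefficient is known and check that OLS recovers it:

```python
# OLS recovers the population projection coefficient
# alpha' = E(y_{t+1} X_t')[E(X_t X_t')]^{-1} in a large sample.
import numpy as np

rng = np.random.default_rng(1)
T = 50_000
X = rng.normal(size=(T, 2))                    # X_t: two regressors
alpha = np.array([0.6, -0.3])                  # population projection coefficient
y = X @ alpha + rng.normal(size=T)             # projection error uncorrelated with X_t

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: [sum X_t X_t']^{-1} sum X_t y_{t+1}
print(beta_ols)                                # -> approximately [0.6, -0.3]
```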
Forecasting with VAR models
Iterated forecast. Let us consider the VAR(p) in companion form,
$$Y_t = A Y_{t-1} + e_t \tag{26}$$
where $e_t$ is white noise. The $h$-step ahead linear predictor of $Y_{t+h}$ conditional on the information available at time $t$, $Y_{t+h|t}$, is given by
$$Y_{t+h|t} = A^h Y_t = A Y_{t+h-1|t} \tag{27}$$
where the first $n$ rows of $Y_{t+h|t}$ contain the optimal forecast of the original $n$ variables. From (27), the forecast of $Y_{t+h}$ at any horizon can be computed recursively.
This predictor is optimal in the sense that, as shown above, it delivers the minimum MSE among forecasts that are linear functions of $Y_t$.
Using
$$Y_{t+h} = A^h Y_t + \sum_{i=0}^{h-1} A^i e_{t+h-i} \tag{28}$$
we get the forecast error
$$Y_{t+h} - Y_{t+h|t} = \sum_{i=0}^{h-1} A^i e_{t+h-i} \tag{29}$$
From the forecast error it is easy to obtain the mean squared error, i.e. the covariance matrix of the forecast error:
$$MSE[Y_{t+h|t}] = E(Y_{t+h} - Y_{t+h|t})(Y_{t+h} - Y_{t+h|t})' = \Sigma(h) = \sum_{i=0}^{h-1} A^i \Omega A^{i\prime} = \Sigma(h-1) + A^{h-1}\Omega A^{(h-1)\prime}$$
The MSE for the original $n$ variables is the upper-left $n \times n$ block. Notice that the MSE is non-decreasing in $h$ and, as $h \to \infty$, approaches the unconditional variance of $Y_t$.
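A minimal sketch of recursions (27)-(29) for a VAR(1), where the companion matrix, innovation covariance, and current observation are illustrative assumptions:

```python
# Iterated forecast Y_{t+h|t} = A Y_{t+h-1|t} and the MSE recursion
# Sigma(h) = Sigma(h-1) + A^{h-1} Omega A^{h-1}'.
import numpy as np

A = np.array([[0.8, -0.1], [-0.2, 0.5]])     # companion matrix (a VAR(1) here)
Omega = np.eye(2)                            # innovation covariance (assumed)
Y_t = np.array([1.0, 0.5])                   # last observed state

H = 12
forecast = Y_t.copy()
Sigma = np.zeros((2, 2))                     # Sigma(0) = 0
A_pow = np.eye(2)                            # holds A^{h-1}
for h in range(1, H + 1):
    forecast = A @ forecast                  # Y_{t+h|t} = A Y_{t+h-1|t}
    Sigma = Sigma + A_pow @ Omega @ A_pow.T  # Sigma(h) = Sigma(h-1) + A^{h-1} Omega A^{h-1}'
    A_pow = A_pow @ A
print(forecast, np.diag(Sigma))              # point forecast and error variances at h = H
```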
Direct forecast. An alternative is to compute the direct forecast, i.e. the projection of $Y_{t+h}$ on $Y_t$. To see this, consider a bivariate VAR(p) with variables $x_t$ and $y_t$. We want to forecast $x_{t+h}$ given the information available at time $t$. The direct forecast works as follows (a code sketch follows the steps):
1. Estimate the projection equation
$$x_t = a + \sum_{i=0}^{p-1} \phi_i x_{t-h-i} + \sum_{i=0}^{p-1} \psi_i y_{t-h-i} + \varepsilon_t$$
2. Using the estimated coefficients, the predictor $x_{t+h|t}$ is obtained as
$$x_{t+h|t} = a + \sum_{i=0}^{p-1} \phi_i x_{t-i} + \sum_{i=0}^{p-1} \psi_i y_{t-i}$$
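A sketch of the two steps for the bivariate case; the function and variable names are hypothetical, and a VAR package would normally handle the bookkeeping:

```python
# Direct h-step forecast: regress x_t on information dated t-h and earlier,
# then apply the fitted coefficients to the most recent observations.
import numpy as np

def direct_forecast(x, y, h, p):
    """h-step direct forecast of x using p lags each of x and y (1-D arrays)."""
    T = len(x)
    rows, targets = [], []
    for t in range(h + p - 1, T):         # need x_{t-h-(p-1)} to exist
        lags = [x[t - h - i] for i in range(p)] + [y[t - h - i] for i in range(p)]
        rows.append([1.0] + lags)         # constant plus regressors dated t-h,...
        targets.append(x[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    # step 2: same coefficients applied to the latest p observations of x and y
    z = [1.0] + [x[T - 1 - i] for i in range(p)] + [y[T - 1 - i] for i in range(p)]
    return np.array(z) @ coef
```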
Forecast evaluation: pseudo out-of-sample exercises. A key issue in forecasting is evaluating the forecasting accuracy of a model of interest. In particular, we are often interested in comparing the performance of competing forecasting models. How can we perform such a forecast evaluation? Answer: we can compare mean squared errors using pseudo out-of-sample forecast exercises.
Suppose we have a sample of $T$ observations. Let $\tau = T_0 < T$. A pseudo out-of-sample exercise works as follows:
1. We use $\tau$ observations to estimate the parameters of the model.
2. We forecast $Y_{\tau+j}$ with $Y_{\tau+j|\tau}$, $j = 1, 2, \ldots, s$.
3. We compute the forecast errors $w_{\tau+j|\tau} = Y_{\tau+j} - Y_{\tau+j|\tau}$, $j = 1, 2, \ldots, s$.
4. We update $\tau = \tau + 1$ and repeat steps 1-3.
We repeat steps 1-4 up to the end of the sample and compute the mean squared error for each variable $i = 1, \ldots, n$,
$$MSE_i(j) = \frac{1}{T - T_0 - j} \sum_{\tau = T_0}^{T-j} w^2_{i,\tau+j|\tau}$$
or the root mean squared error,
$$RMSE_i(j) = \sqrt{\frac{1}{T - T_0 - j} \sum_{\tau = T_0}^{T-j} w^2_{i,\tau+j|\tau}}$$
Repeating the same exercise for various models, we can choose the one with the smallest MSE or RMSE.
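A compact sketch of the recursive-scheme exercise for a univariate AR(1) forecaster; the model and function name are assumptions for illustration, and any fit/forecast pair would slot in:

```python
# Pseudo out-of-sample RMSE: re-estimate on an expanding window, forecast
# h steps ahead, collect the errors, and average their squares.
import numpy as np

def pseudo_oos_rmse(y, T0, h=1):
    """Recursive-scheme RMSE of h-step forecasts from an AR(1) with constant."""
    errors = []
    for tau in range(T0, len(y) - h + 1):
        ys = y[:tau]                                   # estimation sample
        Z = np.column_stack([np.ones(tau - 1), ys[:-1]])
        c, phi = np.linalg.lstsq(Z, ys[1:], rcond=None)[0]
        f = ys[-1]
        for _ in range(h):                             # iterate the forecast h steps
            f = c + phi * f
        errors.append(y[tau - 1 + h] - f)              # forecast error at horizon h
    return np.sqrt(np.mean(np.square(errors)))
```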
Application: Forecasting inflation and unemployment
D'Agostino, Gambetti and Giannone (Journal of Applied Econometrics, forthcoming) consider a trivariate VAR(2) model including inflation, unemployment and the short-term interest rate. The sample spans 1948:I-2007:IV, and forecasts are computed up to 12 quarters ahead. Estimation is in real time, using both VAR and AR models for each of the series, and is done both recursively and with a rolling window.
Impulse Response Functions
Impulse response functions represent the mechanisms through which shocks spread over time. Let us consider the Wold representation of a covariance stationary VAR(p),
$$Y_t = C(L)\varepsilon_t = \sum_{i=0}^{\infty} C_i \varepsilon_{t-i} \tag{30}$$
The matrix $C_j$ has the interpretation
$$\frac{\partial Y_t}{\partial \varepsilon_{t-j}'} = C_j \tag{31}$$
or
$$\frac{\partial Y_{t+j}}{\partial \varepsilon_t'} = C_j \tag{32}$$
That is, the row $i$, column $k$ element of $C_j$ identifies the consequences of a unit increase in the $k$th variable's innovation at date $t$ for the value of the $i$th variable at time $t+j$, holding all other innovations at all dates constant.
Example 1. Let us assume that the estimated matrix of VAR coefficients is
$$A = \begin{pmatrix} 0.8 & -0.1 \\ -0.2 & 0.5 \end{pmatrix} \tag{33}$$
with eigenvalues 0.8562 and 0.4438. We generate the impulse response functions of the Wold representation,
$$C_j = A^j$$
Figure 3: Impulse response functions
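A short sketch reproducing Example 1: compute the eigenvalues and the Wold coefficients $C_j = A^j$ (the horizon $H$ is an arbitrary choice):

```python
# IRFs of a VAR(1): the Wold coefficients are the powers of A, and
# stationarity corresponds to eigenvalues inside the unit circle.
import numpy as np

A = np.array([[0.8, -0.1], [-0.2, 0.5]])
print(np.linalg.eigvals(A))                  # ~0.8562 and ~0.4438: stable VAR

H = 20
C = np.empty((H + 1, 2, 2))
C[0] = np.eye(2)
for j in range(1, H + 1):
    C[j] = A @ C[j - 1]                      # Wold coefficients C_j = A^j
# C[j, i, k]: response of variable i at t+j to a unit innovation in variable k at t
```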
Example 2. Let us now assume that the second variable does not Granger cause the first one, so that
$$A = \begin{pmatrix} 0.8 & 0 \\ -0.2 & 0.5 \end{pmatrix} \tag{34}$$
with eigenvalues 0.8 and 0.5. The impulse response functions are plotted in the next figure.
Figure 4: Impulse response functions when the second variable does not Granger cause the
first one.
Cumulated impulse response functions. Suppose $Y_t$ is a vector of trending variables (e.g. log prices and log output), so we take first differences to achieve stationarity. The model is
$$\Delta Y_t = (1 - L)Y_t = C(L)\varepsilon_t$$
We know how to estimate, interpret, and conduct inference on $C(L)$. But suppose we are interested in the response of the levels of $Y_t$ rather than their first differences (the level of output and prices rather than their growth rates). How can we find these responses? We transform the model:
$$Y_t = Y_{t-1} + C(L)\varepsilon_t$$
The effect of $\varepsilon_t$ on $Y_t$ is $C_0$. Now substituting forward we obtain
$$Y_{t+1} = Y_{t-1} + C(L)\varepsilon_t + C(L)\varepsilon_{t+1} = Y_{t-1} + C_0\varepsilon_{t+1} + (C_0 + C_1)\varepsilon_t + \ldots \tag{35}$$
and for two periods ahead
$$Y_{t+2} = Y_{t-1} + C(L)\varepsilon_t + C(L)\varepsilon_{t+1} + C(L)\varepsilon_{t+2} = Y_{t-1} + C_0\varepsilon_{t+2} + (C_0 + C_1)\varepsilon_{t+1} + (C_0 + C_1 + C_2)\varepsilon_t + \ldots \tag{36}$$
So the effect of $\varepsilon_t$ on $Y_{t+1}$ is $(C_0 + C_1)$ and on $Y_{t+2}$ is $(C_0 + C_1 + C_2)$. In general, the effect of $\varepsilon_t$ on $Y_{t+j}$ is
$$\bar{C}_j = C_0 + C_1 + \ldots + C_j$$
and the $\bar{C}_j$ are called the cumulated impulse response functions.
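A sketch of the computation: given the IRFs of the differenced series, the level responses are running sums, and their limit is the long-run effect (here illustrated with the VAR(1) matrix from Example 1; the horizon is arbitrary):

```python
# Cumulated IRFs: if C[j] is the response of Delta Y at horizon j, the
# response of the level of Y is the running sum C_0 + ... + C_j.
import numpy as np

A = np.array([[0.8, -0.1], [-0.2, 0.5]])
H = 50
C = np.array([np.linalg.matrix_power(A, j) for j in range(H + 1)])  # IRFs of Delta Y
C_bar = np.cumsum(C, axis=0)            # cumulated IRFs: C_bar[j] = C_0 + ... + C_j
print(C_bar[-1])                        # approaches the long-run effect (I - A)^{-1}
print(np.linalg.inv(np.eye(2) - A))
```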
Error Bands for Impulse Response Functions
Method I: Asymptotic (Hamilton, 1994).
Method II: Bootstrap. The idea behind bootstrapping (Runkle, 1987) is to obtain estimates of the small-sample distribution of the impulse response functions without assuming that the shocks are Gaussian. The steps are as follows (a code sketch follows the list):
1. Estimate the VAR and save the coefficients $\hat{\pi}$ and the fitted residuals $\{\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_T\}$.
2. Draw uniformly from $\{\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_T\}$, set $u_1^{(1)}$ equal to the selected realization, and use it to construct
$$Y_1^{(1)} = A_1 Y_0 + A_2 Y_{-1} + \ldots + A_p Y_{-p+1} + u_1^{(1)} \tag{37}$$
3. Taking a second draw (with replacement) $u_2^{(1)}$, generate
$$Y_2^{(1)} = A_1 Y_1^{(1)} + A_2 Y_0 + \ldots + A_p Y_{-p+2} + u_2^{(1)} \tag{38}$$
4. Proceeding in this fashion, generate a sample of length $T$, $\{Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_T^{(1)}\}$, and use it to compute $\hat{\pi}^{(1)}$ and the implied impulse response functions $C^{(1)}(L)$.
5. Repeat steps 2-4 $M$ times, collect the $M$ realizations $C^{(l)}(L)$, $l = 1, \ldots, M$, and take, for each element of the impulse response functions and each horizon, the $\alpha$th and $(1-\alpha)$th percentiles to construct confidence bands.
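Below is a compact sketch of these steps for a VAR(1) without a constant; the function signature, horizon, and band level are assumptions, and the procedure generalizes to VAR(p) with deterministic terms:

```python
# Residual bootstrap for IRF bands: resample residuals, rebuild the sample,
# re-estimate, recompute C_j = A^j, and take pointwise percentiles.
import numpy as np

def bootstrap_irf_bands(Y, H=20, M=500, alpha=0.05, seed=0):
    """Y: (T, n) data. Returns (lower, upper) percentile bands for C_j = A^j."""
    rng = np.random.default_rng(seed)
    X, Z = Y[1:], Y[:-1]
    A_hat = np.linalg.solve(Z.T @ Z, Z.T @ X).T         # OLS: Y_t = A Y_{t-1} + u_t
    U = X - Z @ A_hat.T                                  # fitted residuals
    n = Y.shape[1]
    irfs = np.empty((M, H + 1, n, n))
    for m in range(M):
        Yb = np.empty_like(Y)
        Yb[0] = Y[0]                                     # keep the initial condition
        draws = U[rng.integers(0, len(U), size=len(Y) - 1)]  # resample residuals
        for t in range(1, len(Y)):
            Yb[t] = A_hat @ Yb[t - 1] + draws[t - 1]     # rebuild the sample
        Xb, Zb = Yb[1:], Yb[:-1]
        A_b = np.linalg.solve(Zb.T @ Zb, Zb.T @ Xb).T    # re-estimate on bootstrap data
        irfs[m] = [np.linalg.matrix_power(A_b, j) for j in range(H + 1)]
    return (np.percentile(irfs, 100 * alpha, axis=0),
            np.percentile(irfs, 100 * (1 - alpha), axis=0))
```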
Method III: Monte Carlo. Same as the bootstrap, but the residuals are drawn from a normal distribution.
Example: A Monetary VAR with bootstrap bands
We estimate the standard monetary VAR, which includes real output growth, the inflation rate and the federal funds rate. These three variables are the core variables for monetary policy analysis in VAR models. Data are taken from the St. Louis Fed FRED II database.