4. Forecasting with ARMA models
MSE optimal forecasts
• Let $\hat{Y}_{t+1}$ denote a one-step-ahead forecast given information $F_t$. For ARMA models, $F_t = \{Y_t, Y_{t-1}, \ldots\}$.
• The optimal MSE forecast minimizes
$$E\left[(Y_{t+1} - \hat{Y}_{t+1})^2\right]$$
• MSE is a convenient forecast evaluation criterion with nice theoretical properties.
• MSE is not always a good way to evaluate forecasts from an economic perspective.
– It is symmetric. Do you care the same about underestimating the time to walk to the bus vs. overestimating?
– In general, non‐linear payoff functions are not optimized by MSE.
The optimal MSE forecast is the conditional expectation
• To see this, let $g(X_t)$ denote any other forecast, and expand the MSE around the conditional expectation $E[Y_{t+1}|X_t]$:
$$E\left[(Y_{t+1} - g(X_t))^2\right] = E\left[\left(Y_{t+1} - E[Y_{t+1}|X_t] + E[Y_{t+1}|X_t] - g(X_t)\right)^2\right]$$
$$= E\left[(Y_{t+1} - E[Y_{t+1}|X_t])^2\right] + 2E\left[(Y_{t+1} - E[Y_{t+1}|X_t])(E[Y_{t+1}|X_t] - g(X_t))\right] + E\left[(E[Y_{t+1}|X_t] - g(X_t))^2\right]$$
• The middle (cross-product) term has expectation zero by the Law of Iterated Expectations (LIE): conditioning on $X_t$, the only random thing given $X_t$ is $Y_{t+1}$, so
$$E\left[(Y_{t+1} - E[Y_{t+1}|X_t])(E[Y_{t+1}|X_t] - g(X_t))\right] = E\Big[\underbrace{E\big[Y_{t+1} - E[Y_{t+1}|X_t]\,\big|\,X_t\big]}_{=0}\,(E[Y_{t+1}|X_t] - g(X_t))\Big] = 0$$
• So the middle term vanishes and we are left with:
$$E\left[(Y_{t+1} - g(X_t))^2\right] = E\left[(Y_{t+1} - E[Y_{t+1}|X_t])^2\right] + E\left[(E[Y_{t+1}|X_t] - g(X_t))^2\right]$$
• The smallest the second term can be is zero, and it is zero only when
$$g(X_t) = E[Y_{t+1}|X_t]$$
• So the optimal MSE forecast is the conditional expectation.
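A minimal simulation sketch of this result (not from the slides; the quadratic data-generating process and the alternative forecast $g(X) = 1 + X$ are made-up examples): the conditional mean attains a smaller MSE than any other forecast rule.

```python
# Sketch: with Y = X^2 + eps, the conditional mean E[Y|X] = X^2 beats any
# other forecast rule g(X) in MSE (here g(X) = 1 + X, chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)                 # true conditional mean is X^2

mse_cond_mean = np.mean((y - x**2) ** 2)      # MSE of the optimal forecast
mse_other = np.mean((y - (1 + x)) ** 2)       # MSE of some other forecast
print(f"E[Y|X]: {mse_cond_mean:.3f}  g(X)=1+X: {mse_other:.3f}")  # former is smaller
```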
Restricting our attention to linear models, a similar result holds for Least Squares estimators
• Let $\beta$ solve $E[(Y_{t+1} - \beta'X_t)X_t] = 0$ so that $\beta'X_t$ is the linear projection of $Y_{t+1}$ on $X_t$, or the least squares solution.
• A nearly identical proof shows that the linear projection provides the MSE optimal forecast in the class of linear predictors.
• Let $\beta'X_t$ denote the least squares solution and let $g'X_t$ denote any other linear combination.
$$E\left[(Y_{t+1} - g'X_t)^2\right] = E\left[(Y_{t+1} - \beta'X_t)^2\right] + 2E\left[(Y_{t+1} - \beta'X_t)(\beta - g)'X_t\right] + E\left[\left((\beta - g)'X_t\right)^2\right]$$
• The middle term again has expectation zero:
$$E\left[(Y_{t+1} - \beta'X_t)(\beta - g)'X_t\right] = (\beta - g)'E\left[(Y_{t+1} - \beta'X_t)X_t\right] = 0$$
• The resulting expression
$$E\left[(Y_{t+1} - g'X_t)^2\right] = E\left[(Y_{t+1} - \beta'X_t)^2\right] + E\left[\left((\beta - g)'X_t\right)^2\right]$$
is minimized when $g'X_t = \beta'X_t$.
• So if the true conditional expectation is linear, then the linear model is the MSE optimal forecast.
• If the true model is non‐linear, then the linear model is the best linear predictor.
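The same point can be checked numerically. A sketch under assumed data ($Y = X^2 + \varepsilon$ again, so the truth is non-linear): least squares recovers the best linear predictor, and its forecast errors satisfy the orthogonality condition above.

```python
# Sketch: OLS on a non-linear truth. The fit is the best linear predictor,
# and the residuals are (numerically) orthogonal to the regressors.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)                  # non-linear conditional mean

X = np.column_stack([np.ones(n), x])           # regressors: constant and x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares solution
resid = y - X @ beta

print("beta:", beta)                     # near (1, 0): E[X^2] = 1, Cov(X, X^2) = 0
print("E[resid * X_t]:", X.T @ resid / n)      # near zero: orthogonality condition
```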
Forecasting ARMA models
• Consider the ARMA model:
$$\Phi(L)Y_t = \Theta(L)\varepsilon_t$$
where
$$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$$
and
$$\Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$
• If the model is weakly stationary we get:
$$\Phi(L)^{-1}\Phi(L)Y_t = \Phi(L)^{-1}\Theta(L)\varepsilon_t$$
or
$$Y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$$
where $\psi_0 = 1$.
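The $\psi$ weights can be computed from the recursion implied by $\Phi(L)(\psi_0 + \psi_1 L + \cdots) = \Theta(L)$. A minimal sketch (the helper name and the ARMA(1,1) numbers are made up for illustration):

```python
# Sketch of the psi-weight recursion: psi_0 = 1 and, for j >= 1,
# psi_j = theta_j + sum_{i=1}^{min(j,p)} phi_i * psi_{j-i}, with theta_j = 0 for j > q.
import numpy as np

def arma_psi_weights(phi, theta, n_weights):
    """psi_j weights of Y_t = sum_j psi_j eps_{t-j} for an ARMA(p, q) model."""
    phi, theta = np.asarray(phi, float), np.asarray(theta, float)
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        psi[j] = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            psi[j] += phi[i - 1] * psi[j - i]
    return psi

# ARMA(1,1) with phi = 0.5, theta = 0.3: psi_j = (0.5 + 0.3) * 0.5**(j-1) for j >= 1
print(arma_psi_weights([0.5], [0.3], 5))   # [1.0, 0.8, 0.4, 0.2, 0.1]
```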
Multi‐step ahead forecast
• Or,
$$Y_{t+k} = \underbrace{\sum_{j=0}^{k-1}\psi_j\varepsilon_{t+k-j}}_{\text{future, after time } t} + \underbrace{\sum_{j=k}^{\infty}\psi_j\varepsilon_{t+k-j}}_{\text{past, at or before time } t}$$
• If the $\varepsilon$'s are independent, then conditional on past $\varepsilon$'s, future $\varepsilon$'s have expectation zero:
$$E[Y_{t+k}|F_t] = \sum_{j=k}^{\infty}\psi_j\varepsilon_{t+k-j}$$
• The forecast error is
$$Y_{t+k} - E[Y_{t+k}|F_t] = \sum_{j=0}^{k-1}\psi_j\varepsilon_{t+k-j}$$
• Since the forecast errors are uncorrelated with the "X's" used to predict, this is the best linear predictor.
If the $\varepsilon$'s are white noise (not iid) then the forecast is the optimal linear predictor
• Denote
$$\hat{Y}_{t+k} \equiv E[Y_{t+k}|F_t] = \sum_{j=k}^{\infty}\psi_j\varepsilon_{t+k-j}$$
• The forecast error is
$$Y_{t+k} - \hat{Y}_{t+k} = \sum_{j=0}^{k-1}\psi_j\varepsilon_{t+k-j}$$
• Since the forecast errors are uncorrelated with the "X's" (past epsilons) this is the best linear predictor.
• The forecast error, defined as the difference between the outcome and its forecast, is denoted by:
$$e_{t+k} = Y_{t+k} - E[Y_{t+k}|F_t] = \sum_{j=0}^{k-1}\psi_j\varepsilon_{t+k-j}$$
• The k-step ahead forecast error variance is then given by:
$$Var(e_{t+k}) = Var\left(\sum_{j=0}^{k-1}\psi_j\varepsilon_{t+k-j}\right) = \sigma^2\sum_{j=0}^{k-1}\psi_j^2$$
• If the errors are Gaussian then the 95% prediction interval is given by:
$$\hat{Y}_{t+k} \pm 2\sqrt{\sigma^2\sum_{j=0}^{k-1}\psi_j^2}$$
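A short sketch of these two formulas (the AR(1) weights $\psi_j = \phi^j$, $\sigma$, $k$, and the point forecast value are made-up inputs):

```python
# Sketch: forecast error variance and 95% interval from the psi weights.
import numpy as np

phi, sigma, k = 0.8, 1.0, 5
psi = phi ** np.arange(k)                     # psi_0, ..., psi_{k-1} for an AR(1)

fe_var = sigma**2 * np.sum(psi**2)            # Var(e_{t+k}) = sigma^2 * sum psi_j^2
y_hat = 2.0                                   # hypothetical point forecast
half = 2 * np.sqrt(fe_var)
print(f"Var(e): {fe_var:.3f}  95% PI: ({y_hat - half:.2f}, {y_hat + half:.2f})")
```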
Example
• Consider the AR(1) model
$$(1 - \phi L)Y_t = \phi_0 + \varepsilon_t$$
• Pre-multiply both sides of $Y_{t+k}$ by:
$$\left(1 + \phi L + \phi^2 L^2 + \cdots + \phi^{k-1}L^{k-1}\right)$$
yielding
$$\left(1 - \phi^k L^k\right)Y_{t+k} = \sum_{j=0}^{k-1}\phi^j\phi_0 + \sum_{j=0}^{k-1}\phi^j\varepsilon_{t+k-j}$$
or
$$Y_{t+k} = \sum_{j=0}^{k-1}\phi^j\phi_0 + \phi^k Y_t + \sum_{j=0}^{k-1}\phi^j\varepsilon_{t+k-j}$$
• The k-step ahead forecast is:
$$E[Y_{t+k}|F_t] = \sum_{j=0}^{k-1}\phi^j\phi_0 + \phi^k Y_t$$
• The forecast error is
$$\sum_{j=0}^{k-1}\phi^j\varepsilon_{t+k-j}$$
• The forecast error variance is:
$$Var(e_{t+k}) = \sigma^2\sum_{j=0}^{k-1}\phi^{2j}$$
• Notice that we could pre-multiply by
$$\left(1 + \phi L + \phi^2 L^2 + \cdots + \phi^{k-1}L^{k-1}\right)$$
regardless of whether $|\phi|$ is less than one (we didn't actually use the inverse operator).
• If $|\phi| < 1$, the point forecast is
$$E[Y_{t+k}|F_t] = \phi_0\frac{1 - \phi^k}{1 - \phi} + \phi^k Y_t$$
and the forecast error variance is:
$$Var(e_{t+k}) = \sigma^2\sum_{j=0}^{k-1}\phi^{2j} = \sigma^2\frac{1 - \phi^{2k}}{1 - \phi^2}$$
• If $\phi = 1$, the point forecast is given by:
$$E[Y_{t+k}|F_t] = k\phi_0 + Y_t$$
• The forecast error variance is given by:
$$Var(e_{t+k}) = \sigma^2\sum_{j=0}^{k-1}1 = \sigma^2 k$$
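A sketch collecting the AR(1) closed forms from this slide and the previous one (the function name and numeric inputs are made up):

```python
# Sketch of the AR(1) closed-form forecasts, covering both |phi| < 1 and the
# unit-root case phi = 1 (random walk with drift).
def ar1_forecast(y_t, phi0, phi, sigma, k):
    """k-step point forecast and forecast error variance for an AR(1)."""
    if phi == 1.0:
        point = k * phi0 + y_t                       # E[Y_{t+k}|F_t] = k*phi0 + Y_t
        fe_var = sigma**2 * k                        # variance grows linearly in k
    else:
        point = phi0 * (1 - phi**k) / (1 - phi) + phi**k * y_t
        fe_var = sigma**2 * (1 - phi**(2 * k)) / (1 - phi**2)
    return point, fe_var

print(ar1_forecast(y_t=2.0, phi0=0.5, phi=0.8, sigma=1.0, k=5))
print(ar1_forecast(y_t=2.0, phi0=0.5, phi=1.0, sigma=1.0, k=5))  # unit-root case
```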
A simple algorithm for multi‐step ahead forecasts
• 1 step:
$$E[Y_{t+1}|F_t] = \sum_{j=1}^{p}\phi_j Y_{t+1-j} + \sum_{i=1}^{q}\theta_i\varepsilon_{t+1-i}$$
• 2 step:
$$E[Y_{t+2}|F_t] = \phi_1\underbrace{E[Y_{t+1}|F_t]}_{\text{one-step ahead forecast}} + \sum_{j=2}^{p}\phi_j Y_{t+2-j} + \theta_1\underbrace{E[\varepsilon_{t+1}|F_t]}_{=0} + \sum_{i=2}^{q}\theta_i\varepsilon_{t+2-i}$$
• 3 step:
$$E[Y_{t+3}|F_t] = \phi_1 E[Y_{t+2}|F_t] + \phi_2 E[Y_{t+1}|F_t] + \sum_{j=3}^{p}\phi_j Y_{t+3-j} + \sum_{i=3}^{q}\theta_i\varepsilon_{t+3-i}$$
• So once you have the one-step you can get the two-step, once you have the two-step you can get the three-step, and so on.
• Plug in conditional expected values for future values of $Y$ and zero for the expectation of future values of $\varepsilon$. Plug in known past values for $Y$ and $\varepsilon$ at or before time $t$. A minimal sketch of this recursion appears below.
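The sketch (the helper name and test values are made up; it assumes at least $p$ past $Y$'s and $q$ past $\varepsilon$'s are supplied):

```python
# Sketch of the plug-in recursion: future Y's are replaced by their own
# forecasts, future eps's by zero, and known past Y's and eps's used as-is.
def arma_forecasts(y_hist, eps_hist, phi, theta, k_max):
    """Return [E[Y_{t+1}|F_t], ..., E[Y_{t+k_max}|F_t]] for an ARMA(p, q)."""
    y, eps, out = list(y_hist), list(eps_hist), []
    for _ in range(k_max):
        f = sum(phi[j] * y[-(j + 1)] for j in range(len(phi)))         # AR part
        f += sum(theta[i] * eps[-(i + 1)] for i in range(len(theta)))  # MA part
        y.append(f)        # plug the forecast in for the unknown future Y
        eps.append(0.0)    # and zero for the expectation of the future eps
        out.append(f)
    return out

# AR(1) check: with phi = 0.8 and Y_t = 2, forecasts are 0.8**k * 2
print(arma_forecasts([2.0], [0.0], phi=[0.8], theta=[], k_max=3))  # [1.6, 1.28, 1.024]
```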
5. Maximum Likelihood Estimation
• Let’s start with a simple example.
• Suppose that we take an iid sample of size 10 of whether a voter will vote for candidate A.
• Suppose the data look like this:
1 0 1 1 1 0 0 1 0 1 where a 1 denotes “favor candidate A”.
• Let p denote the true fraction of voters that favor candidate A. p is the parameter we want to estimate.
• If we knew p, how likely would it be to observe this sequence of data?
• Well, they are all independent and identically distributed. The joint probability that $x_1 = 1$, $x_2 = 0$, $x_3 = 1$, ... for the sample 1 0 1 1 1 0 0 1 0 1 is given by:
$$p(1-p)ppp(1-p)(1-p)p(1-p)p = p^6(1-p)^4$$
• More generally, for a sample of size n we can write the probability of observing the sample as
$$p^{n_1}(1-p)^{n_0}$$
where $n_1$ is the number of 1's and $n_0 = n - n_1$ is the number of 0's.
• This is simply the joint probability of n outcomes.
• For a given value of p,
$$L(\text{data}|p) = p^{n_1}(1-p)^{n_0}$$
tells us how likely it is to observe this particular data set. We therefore call this the likelihood function.
• The goal of maximum likelihood is to find the parameter value p that maximizes the likelihood function.
• In doing so we are finding the parameter value that makes it most likely that we observe our given sample.
• The log function is a monotonically increasing function. This means that the value of p that maximizes the likelihood function L is the same as the value of p that maximizes the log of the likelihood function:
$$\ell(\text{data}|p) = \ln L = n_1\ln p + n_0\ln(1-p)$$
• We can maximize the likelihood by taking the derivative and setting it to zero.
• The p that maximizes the log likelihood function will be our estimate of p, so we denote it by $\hat{p}$.
• Taking the derivative and setting it to zero yields:
$$\frac{d\,\ln L(\text{data}|p)}{dp} = \frac{n_1}{p} - \frac{n_0}{1-p} = 0$$
• The solution is:
$$\hat{p} = \frac{n_1}{n_0 + n_1} = \frac{n_1}{n}$$
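A quick numeric check of this example (the grid search is used purely for illustration):

```python
# Sketch: maximize the Bernoulli log likelihood n1*log(p) + n0*log(1-p) on a
# grid and compare with the closed form n1/n, using the sample from the slide.
import numpy as np

data = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
n1, n0 = data.sum(), len(data) - data.sum()

p_grid = np.linspace(0.001, 0.999, 999)
log_lik = n1 * np.log(p_grid) + n0 * np.log(1 - p_grid)
print("grid maximizer:", p_grid[np.argmax(log_lik)])   # ~0.6
print("closed form n1/n:", n1 / len(data))             # 0.6
```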
• For continuous random variables the density function describes probabilities.
• Let $f(y|\theta)$ denote a density function that depends on a possible vector of parameters $\theta$ (i.e. for the normal it depends on the mean and variance).
• If we take an iid sample of a continuous random variable then the likelihood of observing the sample is given by the product of the marginal densities.
$$f(y_1, y_2, \ldots, y_n|\theta) = f(y_1|\theta)f(y_2|\theta)\cdots f(y_n|\theta)$$
• It is convenient to take the log of the joint density function, in which case we get:
$$\ln f(y_1, y_2, \ldots, y_n|\theta) = \sum_{i=1}^{n}\ln f(y_i|\theta)$$
• As before, the value of $\theta$ that maximizes the function f will also maximize ln(f) and is called the Maximum Likelihood Estimate (MLE).
• Example: iid Normal likelihood function.
(on white board)
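Since the example is worked on the white board, here is only a minimal sketch of what the iid Normal log likelihood looks like in code, with its known closed-form maximizers (the sample mean, and the variance with divisor n; simulation values are made up):

```python
# Sketch: iid Normal log likelihood and its closed-form MLEs.
import numpy as np

def normal_loglik(y, mu, sigma2):
    """log L = sum_i log f(y_i | mu, sigma2) for iid Normal data."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2))

rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=2.0, size=1000)

mu_hat, sigma2_hat = y.mean(), y.var()   # MLEs (np.var uses divisor n by default)
print(mu_hat, sigma2_hat, normal_loglik(y, mu_hat, sigma2_hat))
```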
How do we set up the likelihood for dependent time series data?
• The factorization of the likelihood into the product of marginal densities relied on an iid sample, an assumption that clearly fails for dependent time series data.
• Fact: without any loss of generality, the joint density function can always be expressed as:
$$f(y_T, y_{T-1}, \ldots, y_1|\theta) = f(y_T|y_{T-1}, y_{T-2}, \ldots, y_1;\theta)\, f(y_{T-1}|y_{T-2}, y_{T-3}, \ldots, y_1;\theta)\cdots f(y_2|y_1;\theta)\, f(y_1;\theta)$$
• Taking logs of the joint density then gives:
$$\ln f(y_1, y_2, \ldots, y_n|\theta) = \sum_{i=1}^{n}\ln f(y_i|y_{i-1}, y_{i-2}, \ldots, y_1;\theta)$$
• The value of $\theta$ that maximizes the likelihood is called the Maximum Likelihood Estimate (MLE) of $\theta$.
• All the models considered in class are different models for this conditional distribution.
• The result says that if we can write the conditional density $f(y_{t+1}|F_t)$ we can construct the likelihood.
• But this is exactly how we specified the models discussed so far. We defined the dynamics conditional on $F_t$.
• AR(1) model likelihood under Gaussian errors.
(on white board)
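Again the derivation is on the white board; below is a hedged sketch of the conditional Gaussian AR(1) log likelihood, conditioning on the first observation as the factorization above suggests (the function name and simulation values are made up):

```python
# Sketch: conditional log likelihood of a Gaussian AR(1), using
# y_t | y_{t-1} ~ N(phi0 + phi * y_{t-1}, sigma2) and summing over t = 2..T.
import numpy as np

def ar1_conditional_loglik(y, phi0, phi, sigma2):
    resid = y[1:] - phi0 - phi * y[:-1]          # y_t - E[y_t | y_{t-1}]
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2 * sigma2))

rng = np.random.default_rng(3)
y = np.zeros(500)
for t in range(1, 500):                          # simulate an AR(1): phi0=0.5, phi=0.8
    y[t] = 0.5 + 0.8 * y[t - 1] + rng.normal()

print(ar1_conditional_loglik(y, 0.5, 0.8, 1.0))  # log likelihood at the truth
```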
Variance of maximum likelihood parameter estimates
• The value of $\theta$ that maximizes the likelihood is our estimate, but how accurate is that estimate?
• If the assumed distribution is correct then the variance-covariance matrix of the parameter estimates is given by the inverse of the information matrix, denoted $I^{-1}$.
• Write the log likelihood function as:
$$L = \sum_{t=1}^{T}l_t \quad\text{where}\quad l_t = \ln f(y_t|y_{t-1}, y_{t-2}, \ldots, y_1;\theta)$$
• There are two ways to estimate the information matrix:
1)
$$\hat{I}_{2d} = -\frac{1}{T}\sum_{t=1}^{T}\left.\frac{d^2 l_t}{d\theta\, d\theta'}\right|_{\hat{\theta}} = -\frac{1}{T}\left.\frac{d^2 L}{d\theta\, d\theta'}\right|_{\hat{\theta}}$$
2)
$$\hat{I}_{op} = \frac{1}{T}\sum_{t=1}^{T}\left.\frac{dl_t}{d\theta}\frac{dl_t}{d\theta'}\right|_{\hat{\theta}}$$
Constructing the information matrix
• Then if $\theta_0$ is the true parameter value, the distribution of the parameter estimates is given by:
$$\hat{\theta} \sim N\left(\theta_0, \frac{1}{T}\hat{I}^{-1}\right)$$
• where $\hat{I}$ is one of the two estimates $\hat{I}_{2d}$ or $\hat{I}_{op}$.
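A concrete sketch using the Bernoulli example from section 5 (analytic score and second derivative of $l_t = x_t\ln p + (1-x_t)\ln(1-p)$; for that model the two estimates happen to coincide exactly at the MLE):

```python
# Sketch: the two information-matrix estimates, evaluated at p_hat.
import numpy as np

x = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
T = len(x)
p = x.mean()                                 # the MLE p_hat

score = x / p - (1 - x) / (1 - p)            # dl_t/dp at p_hat
hess = -x / p**2 - (1 - x) / (1 - p)**2      # d2 l_t / dp2 at p_hat

I_2d = -np.mean(hess)                        # second-derivative (Hessian) estimate
I_op = np.mean(score**2)                     # outer-product-of-gradients estimate
print(I_2d, I_op)                            # both equal 1/(p*(1-p)) here
print("Var(p_hat) approx:", 1 / (T * I_2d))  # (1/T) * I^{-1}
```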
• When the model is correctly specified the two estimates of the information matrix are the same in large samples. In small samples they will differ a little, although neither is preferred.
• There are several ways the model can be wrong.
– First, we could misspecify the dynamics (i.e. have the wrong AR model)
– Second, we could use the wrong shape for the conditional distribution (i.e. use a Normal when we should have used a t‐distribution)
Getting the standard errors right
• Interestingly, if we have the dynamics right but falsely assume Normality, the parameter estimates are still consistent under very general conditions.
• The standard errors will be wrong, however.
• The good news is that we know how to fix them.
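The slide does not spell the fix out here, but a standard recipe is the "sandwich" (quasi-maximum-likelihood) covariance, which combines the two information estimates above; a hedged sketch (the function name and array layout are assumptions):

```python
# Sketch of sandwich / QMLE standard errors: Var(theta_hat) is approximated by
# (1/T) * I_2d^{-1} I_op I_2d^{-1}, built from per-observation derivatives.
import numpy as np

def sandwich_cov(hessians, scores):
    """hessians: (T, k, k) per-obs d2 l_t; scores: (T, k) per-obs dl_t, at theta_hat."""
    T = scores.shape[0]
    I_2d = -hessians.mean(axis=0)            # Hessian-based information estimate
    I_op = scores.T @ scores / T             # OPG information estimate
    I_2d_inv = np.linalg.inv(I_2d)
    return I_2d_inv @ I_op @ I_2d_inv / T    # robust covariance of theta_hat

# When the model is correctly specified, I_2d = I_op in large samples and this
# collapses back to the usual (1/T) * I^{-1}.
```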