financial data analysis - beijing foreign studies university · linear models for financial time...

Financial Data Analysis

Instructor: Xi Chen

March 6, 2017

Xi Chen ([email protected]) Financial Data Analysis March 6, 2017 1 / 153

Linear Models for Financial Time Series

Outline

1 Linear Models for Financial Time Series



Chapter Overview

1 simple autoregressive (AR) models,

2 simple moving average (MA) models,

3 mixed autoregressive moving average (ARMA) models,

4 unit-root models including unit-root tests,

5 exponential smoothing,

6 seasonal models,

7 regression models with time series errors,

8 fractionally differenced models for long-range dependence.



Stationarity

Definition 1

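A time series $x_t$ is weakly stationary if both its mean and the covariance between $x_t$ and $x_{t-l}$ are time invariant, where $l$ is an arbitrary integer; in other words, its first two moments do not depend on $t$. Weak stationarity is important because it provides the basic framework for prediction.

For a given integer $k$, define the lag-$k$ autocovariance of $x_t$ as $\gamma_k = \mathrm{Cov}(x_t, x_{t-k})$. Using the Cauchy-Schwarz inequality, we can show that $\gamma_k$ exists, and by weak stationarity it is time invariant. $\gamma_k$ has two important properties:

1 $\gamma_0 = \mathrm{Var}(x_t)$

2 $\gamma_{-k} = \gamma_k$, because $\gamma_{-k} = \mathrm{Cov}(x_t, x_{t+k}) = \mathrm{Cov}(x_{t+k}, x_t) = \gamma_k$ by time invariance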



Correlation and Autocorrelation Function




Definition 2

The degree of linear dependence between two random variables is often measured by Pearson's correlation coefficient, or simply the correlation coefficient. In statistics, the correlation coefficient between two random variables $X$ and $Y$ is defined as

$$\rho_{x,y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{E[(X-\mu_x)(Y-\mu_y)]}{\sqrt{E(X-\mu_x)^2\,E(Y-\mu_y)^2}}$$

where $\mu_x$ and $\mu_y$ are the means of $X$ and $Y$. This coefficient measures the strength of linear dependence between $X$ and $Y$, and it can be shown that $-1 \le \rho_{x,y} \le 1$ and $\rho_{x,y} = \rho_{y,x}$. The two random variables are uncorrelated if $\rho_{x,y} = 0$. In addition, if both $X$ and $Y$ are normal random variables, then $\rho_{x,y} = 0$ if and only if $X$ and $Y$ are independent.




When the sample $\{(x_t, y_t) \mid t = 1, \ldots, T\}$ is available, the correlation can be consistently estimated by its sample counterpart,

$$\hat\rho_{x,y} = \frac{\sum_{t=1}^T (x_t-\bar x)(y_t-\bar y)}{\sqrt{\sum_{t=1}^T (x_t-\bar x)^2 \sum_{t=1}^T (y_t-\bar y)^2}}$$

where $\bar x$ and $\bar y$ are the sample means of $x_t$ and $y_t$.

In theory, the Pearson correlation coefficient is between -1 and 1. However, for some random variables, the actual range of the coefficient can be shorter. The two most popular alternatives are

Spearman's rho, the correlation coefficient based on the ranks of the marginal variables.

Kendall's tau, the difference between the probabilities of concordance and discordance. Suppose that $(X_1, Y_1)$ and $(X_2, Y_2)$ are two independent and identically distributed (iid) continuous bivariate random variables. Kendall's tau is defined as

$$\tau = P[(X_1-X_2)(Y_1-Y_2) > 0] - P[(X_1-X_2)(Y_1-Y_2) < 0].$$
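The three correlation measures above can be compared with a minimal R sketch. The data are simulated and purely hypothetical: `y` is a monotone but nonlinear function of `x`, so the rank-based measures should equal 1 while the Pearson coefficient stays below 1.

```r
# Hypothetical data: y is a strictly increasing, nonlinear function of x.
set.seed(1)
x <- rnorm(200)
y <- exp(x)

# cor() from base R supports all three measures through the method argument.
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

The example illustrates why the rank-based alternatives are useful when dependence is monotone but not linear.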



R Illustration



Autocorrelation Function (ACF)

Definition 3

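Consider a weakly stationary time series $x_t$. The correlation coefficient between $x_t$ and $x_{t-k}$ is called the lag-$k$ autocorrelation of $x_t$ and is commonly denoted by $\rho_k$. Specifically, we define

$$\rho_k = \frac{\mathrm{Cov}(x_t, x_{t-k})}{\sqrt{\mathrm{Var}(x_t)\,\mathrm{Var}(x_{t-k})}} = \frac{\mathrm{Cov}(x_t, x_{t-k})}{\mathrm{Var}(x_t)} = \frac{\gamma_k}{\gamma_0}, \tag{1}$$

where $\mathrm{Var}(x_{t-k}) = \mathrm{Var}(x_t)$ by weak stationarity. The collection of autocorrelations, $\{\rho_k\}$, is called the autocorrelation function (ACF) of $x_t$. A weakly stationary time series $x_t$ is not serially correlated if and only if $\rho_k = 0$ for all $k > 0$.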




Definition 4

For a given sample $\{x_t \mid t = 1, \ldots, T\}$, let $\bar x$ be the sample mean (i.e., $\bar x = \sum_{t=1}^T x_t / T$). In general, the lag-$k$ sample autocorrelation of $x_t$ is defined as

$$\hat\rho_k = \frac{\sum_{t=k+1}^T (x_t-\bar x)(x_{t-k}-\bar x)}{\sum_{t=1}^T (x_t-\bar x)^2}, \quad 0 \le k < T-1. \tag{2}$$

If $\{x_t\}$ is a sequence of iid random variables satisfying $E(x_t^2) < \infty$, then $\hat\rho_k$ is asymptotically normal with mean 0 and variance $1/T$ for any fixed positive integer $k$.

More generally, if $x_t$ is a weakly stationary time series satisfying $x_t = \mu + \sum_{i=0}^q \varphi_i a_{t-i}$, where $\varphi_0 = 1$ and $\{a_j\}$ is a sequence of iid random variables with mean 0, then $\hat\rho_k$ is asymptotically normal with mean 0 and variance $(1 + 2\sum_{i=1}^q \rho_i^2)/T$ for $k > q$.
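As a check on Equation (2), the sketch below computes the lag-$k$ sample autocorrelation by hand and compares it with the built-in `acf()` function. The series is simulated iid noise (a hypothetical stand-in for a return series), and `sample_acf` is a helper name introduced here for illustration.

```r
set.seed(42)
x <- rnorm(300)                      # hypothetical iid sample, T = 300

# Lag-k sample autocorrelation, written directly from Equation (2).
sample_acf <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  sum((x[(k + 1):n] - xbar) * (x[1:(n - k)] - xbar)) / sum((x - xbar)^2)
}

rho_hat <- sapply(1:10, function(k) sample_acf(x, k))

# acf() returns lag 0 first, so drop the first element.
rho_builtin <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]

# Approximate two-standard-error band under the iid assumption.
band <- 2 / sqrt(length(x))
print(round(cbind(rho_hat, rho_builtin), 4))
```

For an iid series, most of the sample autocorrelations are expected to fall within `band` of zero.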




Example 5 (Example 2.1)

Consider the monthly simple returns of the Decile 10 portfolio of CRSP from January 1967 to December 2009. There are 516 observations, that is, $T = 516$. The portfolio consists of the smallest 10% of the stocks, in market capitalization, on NYSE/AMEX/NASDAQ and is rebalanced annually.



R Illustration



Testing Individual ACF

For a given positive integer $k$, the previous result can be used to test $H_0: \rho_k = 0$ versus $H_a: \rho_k \neq 0$. The test statistic is

$$t\text{-ratio} = \frac{\hat\rho_k}{\sqrt{\left(1 + 2\sum_{i=1}^{k-1} \hat\rho_i^2\right)/T}}.$$

If $\{x_t\}$ is a stationary Gaussian series satisfying $\rho_j = 0$ for $j > k$, the t-ratio is asymptotically distributed as a standard normal random variable. Hence, the decision rule of the test is to reject $H_0$ if $|t\text{-ratio}| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.

Alternatively, one can use the p-value of the t-ratio to draw a conclusion. If the p-value is less than the chosen type I error (significance level), say 0.05, then the null hypothesis is rejected. Otherwise, one cannot reject $H_0$.
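The t-ratio above can be sketched in R as follows. The function name `acf_t_ratio` and the simulated series are assumptions made for illustration; they do not come from the course material.

```r
# t-ratio for H0: rho_k = 0, using the standard error from the formula above.
acf_t_ratio <- function(x, k) {
  n <- length(x)
  rho <- as.numeric(acf(x, lag.max = k, plot = FALSE)$acf)[-1]  # rho_hat_1..k
  prev <- if (k > 1) rho[1:(k - 1)] else numeric(0)             # rho_hat_1..k-1
  se <- sqrt((1 + 2 * sum(prev^2)) / n)
  t_ratio <- rho[k] / se
  p_value <- 2 * (1 - pnorm(abs(t_ratio)))                      # two-sided p-value
  list(rho = rho[k], t_ratio = t_ratio, p_value = p_value)
}

set.seed(7)
x <- rnorm(516)                 # hypothetical series with T = 516
res12 <- acf_t_ratio(x, 12)     # test of the lag-12 autocorrelation
res1  <- acf_t_ratio(x, 1)      # for k = 1 the t-ratio reduces to sqrt(T) * rho_hat_1
print(unlist(res12))
```

With real data, `x` would be replaced by the return series of interest, for example the Decile 10 returns of Example 2.1.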



Testing Individual ACF

Example 6 (Example 2.1 (continued))

For various reasons, for example, tax consideration or year-end portfolio adjustment, small stocks in the United States tend to show a positive return in January. This is referred to as the January effect of small stocks. A simple approach to verify the existence of the January effect in small stocks is to test the null hypothesis $H_0: \rho_{12} = 0$ versus the alternative hypothesis $H_a: \rho_{12} \neq 0$, using the monthly simple returns of the CRSP Decile 10 portfolio of Example 2.1.



R Illustration



Portmanteau Test

Definition 7

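Box and Pierce (1970) propose the Portmanteau statistic

$$Q^*(m) = T \sum_{l=1}^m \hat\rho_l^2$$

as a test statistic for the null hypothesis $H_0: \rho_1 = \cdots = \rho_m = 0$ against the alternative hypothesis $H_a: \rho_i \neq 0$ for some $i \in \{1, \ldots, m\}$. Ljung and Box (1978) modify the $Q^*(m)$ statistic as below to increase the power of the test in finite samples,

$$Q(m) = T(T+2) \sum_{l=1}^m \frac{\hat\rho_l^2}{T-l} \tag{3}$$

Under the null hypothesis with iid $x_t$, both statistics are asymptotically chi-squared with $m$ degrees of freedom, and $H_0$ is rejected when the statistic exceeds the corresponding upper percentile.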
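The two Portmanteau statistics can be computed directly from the sample ACF. The sketch below uses simulated noise as a hypothetical series and checks the hand computation of $Q(m)$ against `Box.test()` from base R.

```r
set.seed(123)
x <- rnorm(516)                       # hypothetical series
n <- length(x)
m <- 12

rho <- as.numeric(acf(x, lag.max = m, plot = FALSE)$acf)[-1]

Q_bp <- n * sum(rho^2)                              # Box-Pierce Q*(m)
Q_lb <- n * (n + 2) * sum(rho^2 / (n - 1:m))        # Ljung-Box Q(m), Equation (3)
p_lb <- 1 - pchisq(Q_lb, df = m)                    # chi-squared with m d.f.

bt <- Box.test(x, lag = m, type = "Ljung-Box")      # built-in version
print(c(Q_bp = Q_bp, Q_lb = Q_lb, p_lb = p_lb))
```

Because $(T+2)/(T-l) > 1$ for every lag, the Ljung-Box statistic is always larger than the Box-Pierce statistic on the same data.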



Portmanteau Test

Example 8

Consider the monthly simple and log returns of IBM stock from January 1967 to December 2009. The sample size is 516. To verify that the returns have no serial correlations, we test $H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0$ versus $H_a: \rho_i \neq 0$ for some $i \in \{1, \ldots, m\}$ with $m = 12$ and 24.



White Noise and Linear Time Series

Definition 9

A time series $x_t$ is called a white noise if $\{x_t\}$ is a sequence of iid random variables with finite mean and variance. In particular, if $x_t$ is normally distributed with mean 0 and variance $\sigma^2$, the series is called a Gaussian white noise. For a white noise series, all the ACFs at nonzero lags are 0. In practice, if all sample ACFs are close to 0, then the series is treated as a white noise series.




Definition 10

A time series $x_t$ is said to be linear if it can be written as

$$x_t = \mu + \sum_{i=0}^\infty \varphi_i a_{t-i}, \tag{4}$$

where $\mu$ is the mean of $x_t$, $\varphi_0 = 1$, and $\{a_t\}$ is a sequence of iid random variables with mean 0 and a well-defined distribution (i.e., $\{a_t\}$ is a white noise series).

$a_t$ denotes the new information at time $t$ of the time series and is often referred to as the innovation or shock at time $t$.

The dynamic structure of $x_t$ is governed by the coefficients $\varphi_i$, which are called the $\varphi$-weights of $x_t$.




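If $x_t$ is weakly stationary, we can obtain its mean and variance easily by using properties of $\{a_t\}$ as

$$E(x_t) = \mu, \qquad \mathrm{Var}(x_t) = \sigma_a^2 \sum_{i=0}^\infty \varphi_i^2, \tag{5}$$

where $\sigma_a^2$ is the variance of $a_t$. The lag-$l$ autocovariance of $x_t$ is

$$\begin{aligned}
\gamma_l = \mathrm{Cov}(x_t, x_{t-l}) &= E\Big[\Big(\sum_{i=0}^\infty \varphi_i a_{t-i}\Big)\Big(\sum_{j=0}^\infty \varphi_j a_{t-l-j}\Big)\Big] \\
&= E\Big(\sum_{i,j=0}^\infty \varphi_i \varphi_j a_{t-i} a_{t-l-j}\Big) = \sum_{j=0}^\infty \varphi_{j+l} \varphi_j E(a_{t-l-j}^2) \\
&= \sigma_a^2 \sum_{j=0}^\infty \varphi_j \varphi_{j+l}.
\end{aligned} \tag{6}$$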




Consequently, the $\varphi$-weights are related to the autocorrelations of $x_t$ as follows:

$$\rho_l = \frac{\gamma_l}{\gamma_0} = \frac{\sum_{i=0}^\infty \varphi_i \varphi_{i+l}}{1 + \sum_{i=1}^\infty \varphi_i^2}, \quad l \ge 0,$$

For a weakly stationary time series, $\varphi_i \to 0$ as $i \to \infty$ and, hence, $\rho_l$ converges to 0 as $l$ increases. For asset returns, this means that, as expected, the linear dependence of the current return $x_t$ on the remote past return $x_{t-l}$ diminishes for large $l$.
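The relation between the weights and the ACF can be verified numerically. The sketch below picks a hypothetical stationary ARMA(2,1) model, obtains its weights with `ARMAtoMA()`, truncates the infinite sums at 500 terms (an assumption that is harmless here because the weights decay geometrically), and compares the result with the exact ACF from `ARMAacf()`.

```r
# Hypothetical model, written in R's sign convention for the MA part.
ar <- c(0.5, 0.3)
ma <- 0.4

# Weights of the linear representation: element 1 is the lag-0 weight (= 1).
w <- c(1, ARMAtoMA(ar = ar, ma = ma, lag.max = 500))

# rho_l = sum_i w_i w_{i+l} / sum_i w_i^2, with the sums truncated.
rho_from_weights <- function(l) {
  n <- length(w)
  sum(w[1:(n - l)] * w[(1 + l):n]) / sum(w^2)
}

rho_w <- sapply(0:5, rho_from_weights)
rho_exact <- as.numeric(ARMAacf(ar = ar, ma = ma, lag.max = 5))
print(round(rbind(rho_w, rho_exact), 6))
```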



Simple Autoregressive Models

Definition 11

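A simple autoregressive model of order 1, AR(1), has the following form:

$$x_t = \phi_0 + \phi_1 x_{t-1} + a_t, \tag{7}$$

where $\{a_t\}$ is a white noise series with mean 0 and variance $\sigma_a^2$. Conditional on the past return $x_{t-1}$, we have

$$E(x_t \mid x_{t-1}) = \phi_0 + \phi_1 x_{t-1}, \qquad \mathrm{Var}(x_t \mid x_{t-1}) = \mathrm{Var}(a_t) = \sigma_a^2.$$

A straightforward generalization of the AR(1) model is the AR($p$) model

$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + a_t, \tag{8}$$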



Properties of AR Models

The sufficient and necessary condition for weak stationarity of the AR(1) model

Assuming that the series is weakly stationary, we have $E(x_t) = \mu$, $\mathrm{Var}(x_t) = \gamma_0$, and $\mathrm{Cov}(x_t, x_{t-j}) = \gamma_j$, where $\mu$ and $\gamma_0$ are constants and $\gamma_j$ is a function of $j$, not $t$. Taking the expectation of Equation (7) and using $E(a_t) = 0$, we obtain

$$E(x_t) = \phi_0 + \phi_1 E(x_{t-1}).$$

Under the stationarity condition, $E(x_t) = E(x_{t-1}) = \mu$ and hence

$$E(x_t) = \mu = \frac{\phi_0}{1-\phi_1}$$



Properties of AR Models

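Next, using $\phi_0 = (1-\phi_1)\mu$, the AR(1) model can be rewritten as

$$x_t - \mu = \phi_1(x_{t-1} - \mu) + a_t. \tag{9}$$

By repeated substitutions, the prior equation implies that

$$x_t - \mu = a_t + \phi_1 a_{t-1} + \phi_1^2 a_{t-2} + \cdots = \sum_{i=0}^\infty \phi_1^i a_{t-i} \tag{10}$$

Then, we have $E[(x_t - \mu)a_{t+1}] = 0$. By the stationarity assumption, we have $\mathrm{Cov}(x_{t-1}, a_t) = E[(x_{t-1} - \mu)a_t] = 0$. (Why?) Taking the square and the expectation of Equation (9), we obtain

$$\mathrm{Var}(x_t) = \phi_1^2 \mathrm{Var}(x_{t-1}) + \sigma_a^2,$$

and, because $\mathrm{Var}(x_t) = \mathrm{Var}(x_{t-1})$ under stationarity,

$$\mathrm{Var}(x_t) = \frac{\sigma_a^2}{1-\phi_1^2},$$

provided that $\phi_1^2 < 1$, a requirement that follows because a variance is nonnegative and finite. The necessary and sufficient condition for weak stationarity of the AR(1) model is therefore $|\phi_1| < 1$.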


Autocorrelation Function of an AR(1) Model

Multiplying Equation (9) by $a_t$, using the independence between $a_t$ and $x_{t-1}$, and taking expectation, we obtain

$$E[a_t(x_t - \mu)] = \phi_1 E[a_t(x_{t-1} - \mu)] + E(a_t^2) = E(a_t^2) = \sigma_a^2,$$

Multiplying Equation (9) by $(x_{t-l} - \mu)$, taking expectation, and using the prior result, we have

$$\gamma_l = \begin{cases} \phi_1 \gamma_1 + \sigma_a^2 & \text{if } l = 0 \\ \phi_1 \gamma_{l-1} & \text{if } l > 0, \end{cases}$$

Therefore, the ACF of $x_t$ satisfies

$$\rho_l = \phi_1 \rho_{l-1}, \quad \text{for } l > 0.$$

Because $\rho_0 = 1$, we have $\rho_l = \phi_1^l$. This result says that the ACF of a weakly stationary AR(1) series decays exponentially with rate $\phi_1$ and starting value $\rho_0 = 1$.
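A short numerical check of the AR(1) results, assuming the hypothetical values $\phi_1 = 0.8$ and $\sigma_a^2 = 1$: the theoretical ACF from `ARMAacf()` should equal $\phi_1^l$, and the variance formula should agree with the sum of squared weights from Equation (10), truncated here at 200 terms.

```r
phi1 <- 0.8          # hypothetical AR(1) coefficient, |phi1| < 1
sigma2_a <- 1        # hypothetical innovation variance

rho_theory <- phi1^(0:6)                                   # rho_l = phi1^l
rho_R <- as.numeric(ARMAacf(ar = phi1, lag.max = 6))       # exact ACF from R

var_x <- sigma2_a / (1 - phi1^2)                           # closed-form variance
var_x_ma <- sigma2_a * sum(phi1^(2 * (0:200)))             # truncated version of Equation (10)

print(round(rbind(rho_theory, rho_R), 4))
print(c(var_x = var_x, var_x_ma = var_x_ma))
```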


AR(2) Model

An AR(2) model assumes the form

$$x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + a_t \tag{11}$$

Using the same technique as that of the AR(1) case, we obtain

$$E(x_t) = \mu = \frac{\phi_0}{1 - \phi_1 - \phi_2} \tag{12}$$

provided that $\phi_1 + \phi_2 \neq 1$.

Using $\phi_0 = (1 - \phi_1 - \phi_2)\mu$, we can rewrite the AR(2) model as

$$(x_t - \mu) = \phi_1(x_{t-1} - \mu) + \phi_2(x_{t-2} - \mu) + a_t. \tag{13}$$



AR(2) Model

Multiplying the prior equation by $(x_{t-l} - \mu)$, we have

$$(x_{t-l} - \mu)(x_t - \mu) = \phi_1 (x_{t-l} - \mu)(x_{t-1} - \mu) + \phi_2 (x_{t-l} - \mu)(x_{t-2} - \mu) + (x_{t-l} - \mu)a_t.$$

Taking expectation and using $E[(x_{t-l} - \mu)a_t] = 0$ for $l > 0$, we obtain the moment equation

$$\gamma_l = \phi_1 \gamma_{l-1} + \phi_2 \gamma_{l-2}, \quad l > 0.$$

Dividing the above equation by $\gamma_0$, we have the property

$$\rho_l = \phi_1 \rho_{l-1} + \phi_2 \rho_{l-2}, \quad l > 0. \tag{14}$$

In particular, the lag-1 ACF satisfies

$$\rho_1 = \phi_1 \rho_0 + \phi_2 \rho_{-1} = \phi_1 + \phi_2 \rho_1.$$



AR(2) Model

Therefore, for a stationary AR(2) series $x_t$, we have $\rho_0 = 1$,

$$\rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \rho_l = \phi_1 \rho_{l-1} + \phi_2 \rho_{l-2}, \quad l \ge 2.$$

The result of Equation (14) says that the ACF of a stationary AR(2) series satisfies the second-order difference equation

$$(1 - \phi_1 B - \phi_2 B^2)\rho_l = 0,$$

where $B$ is called the backshift operator such that $B\rho_l = \rho_{l-1}$.



AR(2) Model

Corresponding to the prior difference equation, there is a second-order polynomial equation

$$1 - \phi_1 z - \phi_2 z^2 = 0, \tag{15}$$

Solutions of this equation are

$$z = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2}$$

The inverses of the two solutions ($\omega_1$, $\omega_2$) are referred to as the characteristic roots of the AR(2) model.

If both $\omega_i$ are real valued, then the second-order difference equation of the model can be factored as $(1 - \omega_1 B)(1 - \omega_2 B)$.



AR(2) Model

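For an AR(2) model in Equation (11) with a pair of complex characteristic roots, the average length of the stochastic cycles is

$$k = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]},$$

where the inverse cosine is stated in radians. If one writes the complex characteristic roots as $a \pm bi$, where $i = \sqrt{-1}$, then we have $\phi_1 = 2a$, $\phi_2 = -(a^2 + b^2)$, and

$$k = \frac{2\pi}{\cos^{-1}(a/\sqrt{a^2 + b^2})},$$

Applying the last formula to the solutions of Equation (15) instead of their inverses gives the same $k$, because inverting a complex number only changes the sign of its argument.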
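The cycle-length formulas can be checked with `polyroot()`. The coefficients below are hypothetical and chosen so that $\phi_1^2 + 4\phi_2 < 0$, which gives a complex pair of roots.

```r
phi1 <- 0.8
phi2 <- -0.5                          # hypothetical AR(2): phi1^2 + 4*phi2 = -1.36 < 0

z <- polyroot(c(1, -phi1, -phi2))     # solutions of 1 - phi1*z - phi2*z^2 = 0
omega <- 1 / z                        # characteristic roots (inverses of the solutions)

k_coef <- 2 * pi / acos(phi1 / (2 * sqrt(-phi2)))       # from the coefficients
k_root <- 2 * pi / acos(Re(omega[1]) / Mod(omega[1]))   # from a characteristic root a + bi
k_sol  <- 2 * pi / acos(Re(z[1]) / Mod(z[1]))           # from a solution of the polynomial

stationary <- all(Mod(omega) < 1)     # characteristic roots inside the unit circle
print(c(k_coef = k_coef, k_root = k_root, k_sol = k_sol))
```

All three values of $k$ agree, and the relations $\phi_1 = 2a$ and $\phi_2 = -(a^2 + b^2)$ hold for the characteristic roots.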



AR(2) Model



AR(2) Model


Example 12 (Example 2.3)

Consider the quarterly growth rate of US gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 2010 for 252 observations. See Figure ??. The fitted model is

$$(1 - 0.438B - 0.206B^2 + 0.156B^3)(x_t - 0.016) = a_t, \quad \hat\sigma_a^2 = 9.55 \times 10^{-5}. \tag{16}$$

Model (16) gives rise to a third-order polynomial equation

$$1 - 0.438z - 0.206z^2 + 0.156z^3 = 0,$$

which has three solutions, namely, $1.616 + 0.864i$, $1.616 - 0.864i$, and $-1.909$.



AR(2) Model

Example 13 (Example 2.3 Continued)

The real solution corresponds to a factor $1 - (1/(-1.909))z = 1 + 0.524z$ that shows an exponentially decaying feature of the GNP growth rate. Focusing on the complex conjugate pair $1.616 \pm 0.864i$, we obtain the absolute value 1.833 and

$$k = \frac{2\pi}{\cos^{-1}(1.616/1.833)} \approx 12.80,$$

that is, an average cycle length of about 12.8 quarters.

The stationarity condition of an AR(2) time series is that the absolute values of its two characteristic roots are less than 1, that is, its two characteristic roots are less than 1 in modulus. Equivalently, the two solutions of Equation (15) are greater than 1 in modulus.

Under such a condition, the recursive equation in Equation (14) ensures that the ACF of the model converges to 0 as the lag $l$ increases.


AR(2) Model

R Illustration of Example 2.3

> da=read.table("q-gnp4710.txt",header=T)

> head(da)

> G=da$VALUE

> LG=log(G)

> gnp=diff(LG)

> dim(da)

> tdx=c(1:253)/4+1947

> par(mfcol=c(2,1))

> plot(tdx,G,xlab='year',ylab='GNP',type='l')

> plot(tdx[2:253],gnp,type='l',xlab='year',ylab='growth')



AR(2) Model

R Illustration of Example 2.3

> acf(gnp,lag=12)

> pacf(gnp,lag=12)

> m1=arima(gnp,order=c(3,0,0))

> m1

> tsdiag(m1,gof=12)

> p1=c(1,-m1$coef[1:3])

> r1=polyroot(p1)

> r1

> Mod(r1)

> k=2*pi/acos(1.616116/1.832674)

> k



AR(p) Model

The mean of a stationary series is

$$E(x_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}$$

provided that the denominator is not zero. The associated characteristic equation of the model is

$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0,$$

For a stationary AR($p$) series, the ACF satisfies the difference equation

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\rho_l = 0, \quad l > 0$$

The plot of ACF of a stationary AR($p$) model would then show a mixture of damping sine and cosine patterns and exponential decays depending on the nature of its characteristic roots.



Identifying AR Models in Practice

Definition 14

The Partial Autocorrelation Function (PACF) of a stationary time series is a function of its ACF.

Consider the following AR models in consecutive orders:

$$\begin{aligned}
x_t &= \phi_{0,1} + \phi_{1,1} x_{t-1} + e_{1t}, \\
x_t &= \phi_{0,2} + \phi_{1,2} x_{t-1} + \phi_{2,2} x_{t-2} + e_{2t}, \\
x_t &= \phi_{0,3} + \phi_{1,3} x_{t-1} + \phi_{2,3} x_{t-2} + \phi_{3,3} x_{t-3} + e_{3t}, \\
&\;\;\vdots
\end{aligned}$$

These models are in the form of a multiple linear regression and can be estimated by the least squares (LS) method. The estimate $\hat\phi_{1,1}$ of the first equation is called the lag-1 sample PACF of $x_t$. The estimate $\hat\phi_{2,2}$ of the second equation is the lag-2 sample PACF of $x_t$, and so on.



For a stationary Gaussian AR($p$) model, it can be shown that the sample PACF has the following properties:

$\hat\phi_{p,p}$ converges to $\phi_p$ as the sample size $T$ goes to infinity.

$\hat\phi_{l,l}$ converges to 0 for all $l > p$.

The asymptotic variance of $\hat\phi_{l,l}$ is $1/T$ for $l > p$.

These results say that, for an AR($p$) series, the sample PACF cuts off at lag $p$.
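The regression definition of the sample PACF can be sketched as below. The data are simulated from a hypothetical AR(2) model, `ls_pacf` is a helper name introduced for illustration, and the theoretical PACF from `ARMAacf(..., pacf = TRUE)` is shown for comparison. Note that R's `pacf()` function uses a different (Yule-Walker type) estimator, so its values would be close to, but not identical with, these LS estimates.

```r
set.seed(2017)
ar_true <- c(0.5, 0.3)                                          # hypothetical AR(2)
x <- as.numeric(arima.sim(model = list(ar = ar_true), n = 2000))

# Lag-j sample PACF: last coefficient of an AR(j) regression fitted by LS.
ls_pacf <- function(x, max_lag) {
  sapply(1:max_lag, function(j) {
    E <- embed(x, j + 1)                    # columns: x_t, x_{t-1}, ..., x_{t-j}
    fit <- lm.fit(cbind(1, E[, -1]), E[, 1])
    unname(fit$coefficients[j + 1])         # coefficient of x_{t-j}
  })
}

pacf_ls <- ls_pacf(x, 6)
pacf_theory <- as.numeric(ARMAacf(ar = ar_true, lag.max = 6, pacf = TRUE))
band <- 2 / sqrt(length(x))                 # approximate band for lags l > p
print(round(rbind(pacf_ls, pacf_theory), 3))
```

The theoretical PACF is zero beyond lag 2, and the sample values beyond lag 2 should be small relative to `band`.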



Information Criteria

Definition 15

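Akaike Information Criterion (AIC) is defined as

$$\mathrm{AIC} = \frac{-2}{T}\ln(\text{likelihood}) + \frac{2}{T} \times (\text{number of parameters}), \tag{17}$$

where the likelihood function is evaluated at the maximum-likelihood estimates and $T$ is the sample size. For a Gaussian AR($l$) model, AIC reduces to

$$\mathrm{AIC}(l) = \ln(\tilde\sigma_l^2) + \frac{2l}{T},$$

where $\tilde\sigma_l^2$ is the maximum-likelihood estimate of $\sigma_a^2$.

Definition 16

For a Gaussian AR($l$) model, the Schwarz-Bayesian criterion (BIC, Bayesian information criterion) is

$$\mathrm{BIC}(l) = \ln(\tilde\sigma_l^2) + \frac{l \ln(T)}{T},$$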



Selection Rule

To use AIC to select an AR model in practice, one computes AIC($l$) for $l = 0, \ldots, P$, where $P$ is a prespecified positive integer, and selects the order $k$ that has the minimum AIC value. The same rule applies to BIC.
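The selection rule can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated from a hypothetical AR(2) model, and the residual variance of a conditional least squares fit is used as a stand-in for the maximum-likelihood estimate $\tilde\sigma_l^2$. The helper name `sigma2_tilde` is introduced here.

```r
set.seed(99)
x <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 500))  # hypothetical data
n <- length(x)
P <- 6                                   # prespecified maximum order

# Residual variance of an AR(l) fit by conditional least squares
# (an approximation to the ML estimate, assumed adequate for this sketch).
sigma2_tilde <- function(x, l) {
  if (l == 0) return(mean((x - mean(x))^2))
  E <- embed(x, l + 1)                   # columns: x_t, x_{t-1}, ..., x_{t-l}
  fit <- lm.fit(cbind(1, E[, -1]), E[, 1])
  mean(fit$residuals^2)
}

s2 <- sapply(0:P, function(l) sigma2_tilde(x, l))
aic <- log(s2) + 2 * (0:P) / n
bic <- log(s2) + (0:P) * log(n) / n

order_aic <- which.min(aic) - 1          # orders start at 0
order_bic <- which.min(bic) - 1
print(c(order_aic = order_aic, order_bic = order_bic))
```

Because the BIC penalty per parameter, $\ln(T)/T$, exceeds the AIC penalty $2/T$ whenever $T > e^2$, BIC never selects a higher order than AIC on the same set of fits.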



Selection Rule



Selection Rule

R Illustration:

> mm1=ar(gnp,method='mle')

> mm1$order

> names(mm1)

> print(mm1$aic,digits=3)

> aic=mm1$aic

> length(aic)

> plot(c(0:12),aic,type='h',xlab='order',ylab='aic')

> lines(0:12,aic,lty=2)



Parameter Estimation

Definition 17

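The conditional LS method starts with the $(p+1)$th observation. Specifically, conditioning on the first $p$ observations, we have

$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + a_t, \quad t = p+1, \ldots, T.$$

Denoting the LS estimate of $\phi_i$ by $\hat\phi_i$, the fitted model is

$$\hat x_t = \hat\phi_0 + \hat\phi_1 x_{t-1} + \cdots + \hat\phi_p x_{t-p}$$

and the associated residual is

$$\hat a_t = x_t - \hat x_t.$$

The series $\{\hat a_t\}$ is called the residual series, from which we obtain

$$\hat\sigma_a^2 = \frac{\sum_{t=p+1}^T \hat a_t^2}{T - 2p - 1}$$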
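A minimal sketch of the conditional LS method using `lm()`. The series is simulated from a hypothetical AR(2) model with a nonzero mean; with real data, `x` would be the observed series. The divisor $T - 2p - 1$ is the number of regression observations, $T - p$, minus the $p + 1$ estimated coefficients, so it matches the residual variance reported by `lm()`.

```r
set.seed(11)
p <- 2
x <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 2000)) + 1  # hypothetical data
n <- length(x)

E <- embed(x, p + 1)                  # rows t = p+1..T; columns x_t, x_{t-1}, ..., x_{t-p}
fit <- lm(E[, 1] ~ E[, -1])           # regression conditional on the first p observations

phi_hat <- unname(coef(fit))          # phi0_hat, phi1_hat, ..., phip_hat
a_hat <- residuals(fit)               # residual series
sigma2_hat <- sum(a_hat^2) / (n - 2 * p - 1)

print(round(c(phi_hat, sigma2 = sigma2_hat), 4))
```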



Model Checking

If the model is adequate, then the residual series should behave as a white noise.

For an AR($p$) model, the Ljung-Box statistic $Q(m)$ of the residuals follows asymptotically a chi-squared distribution with $m - g$ degrees of freedom, where $g$ denotes the number of AR coefficients used in the model.

If some of the estimated AR coefficients are not significantly different from 0, then the model should be simplified by removing those insignificant parameters. If the residual ACF shows additional serial correlations, then the model should be extended to take care of those correlations.



Model Checking

R Illustration:

> vw=read.table('m-ibm3dx2608.txt',header=T)[,3]

> m3=arima(vw,order=c(3,0,0))

> m3

> (1-.1158+.0187+.1042)*mean(vw) # Compute the intercept phi(0).

> sqrt(m3$sigma2) # Compute standard error of residuals

> Box.test(m3$residuals,lag=12,type='Ljung')

> pv=1-pchisq(16.35,9) # Compute p value using 9 degrees of freedom

> pv

> m3=arima(vw,order=c(3,0,0),fixed=c(NA,0,NA,NA))

> m3

> (1-.1136+.1063)*.0089 # compute phi(0)

> sqrt(m3$sigma2) # compute residual standard error

> Box.test(m3$residuals,lag=12,type='Ljung')

> pv=1-pchisq(16.83,10)

> pv


Goodness of Fit

Definition 18 (R-square)

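A commonly used statistic to measure goodness of fit of a stationary model is the R-square ($R^2$) defined as

$$R^2 = 1 - \frac{\text{Residual sum of squares}}{\text{Total sum of squares}}.$$

For a stationary AR($p$) time series model with $T$ observations $\{x_t \mid t = 1, \ldots, T\}$, the measure becomes

$$R^2 = 1 - \frac{\sum_{t=p+1}^T \hat a_t^2}{\sum_{t=p+1}^T (x_t - \bar x)^2},$$

where $\hat a_t$ are the residuals of the fitted model and $\bar x$ is the mean of the observations used.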



Goodness of Fit

Definition 19 (Adjusted R-square)

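For a given data set, it is well known that $R^2$ is a nondecreasing function of the number of parameters used. To overcome this weakness, an adjusted $R^2$ is proposed, which is defined as

$$\mathrm{Adj}(R^2) = 1 - \frac{\text{Variance of residuals}}{\text{Variance of } x_t} = 1 - \frac{\hat\sigma_a^2}{\hat\sigma_x^2}$$

where $\hat\sigma_x^2$ is the sample variance of $x_t$.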



Forecasting I

Definition 20 (forecast)

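Suppose that we are at the time index $h$ (forecast origin) and are interested in forecasting $x_{h+l}$, where $l \ge 1$ (forecast horizon). Let $F_h$ denote the information available at time $h$. The forecast $\hat x_h(l)$ is chosen such that

$$E\{[x_{h+l} - \hat x_h(l)]^2 \mid F_h\} \le \min_g E[(x_{h+l} - g)^2 \mid F_h],$$

where $g$ is a function of the information available at time $h$.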



Forecasting II

Definition 21 (1-Step Ahead Forecast)

From the AR($p$) model, we have

$$x_{h+1} = \phi_0 + \phi_1 x_h + \cdots + \phi_p x_{h+1-p} + a_{h+1}.$$

Under the minimum squared error loss function, the point forecast of $x_{h+1}$ given $F_h$ is the conditional expectation

$$\hat x_h(1) = E(x_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^p \phi_i x_{h+1-i}$$

and the associated forecast error is

$$e_h(1) = x_{h+1} - \hat x_h(1) = a_{h+1}.$$



Forecasting III

Definition 22 (Multistep Ahead Forecast)

In general, we have

$$x_{h+l} = \phi_0 + \phi_1 x_{h+l-1} + \cdots + \phi_p x_{h+l-p} + a_{h+l}.$$

Under the minimum squared error loss function, the point forecast of $x_{h+l}$ given $F_h$ is the conditional expectation

$$\hat x_h(l) = \phi_0 + \sum_{i=1}^p \phi_i \hat x_h(l-i),$$

where it is understood that $\hat x_h(i) = x_{h+i}$ if $i \le 0$.

Mean reversion: For a stationary AR($p$) model, $\hat x_h(l)$ converges to $E(x_t)$ as $l \to \infty$, meaning that for such a series the long-term point forecast approaches its unconditional mean.
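The forecast recursion and the mean-reversion property can be sketched in a few lines. The function name `ar_forecast`, the coefficients, and the three observations are all hypothetical; the coefficients are treated as known rather than estimated.

```r
# Recursive l-step ahead forecasts of an AR(p) model with known coefficients.
ar_forecast <- function(x, phi0, phi, horizon) {
  p <- length(phi)
  hist <- tail(x, p)                     # x_{h-p+1}, ..., x_h
  out <- numeric(horizon)
  for (l in 1:horizon) {
    lagged <- rev(tail(hist, p))         # most recent value first
    out[l] <- phi0 + sum(phi * lagged)   # uses forecasts in place of future values
    hist <- c(hist, out[l])
  }
  out
}

phi0 <- 0.2
phi <- c(0.5, 0.3)                       # hypothetical stationary AR(2)
x_obs <- c(0.4, 1.5, 2.0)                # hypothetical data; the last value is x_h

fc <- ar_forecast(x_obs, phi0, phi, 200)
mu <- phi0 / (1 - sum(phi))              # unconditional mean
print(round(c(fc[1:5], mu = mu), 4))
```

The forecasts start from values determined by the most recent observations and settle at the unconditional mean as the horizon grows.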



Forecasting IV



Forecasting V



Simple Moving Average Models I

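Consider an AR model with infinite order,

$$x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + a_t.$$

Such a model has infinitely many parameters. A more practical model restricts the coefficients to depend on a single parameter $\theta_1$ through $\phi_i = -\theta_1^i$ for $i \ge 1$:

$$x_t = \phi_0 - \theta_1 x_{t-1} - \theta_1^2 x_{t-2} - \theta_1^3 x_{t-3} - \cdots + a_t. \tag{18}$$

The model in Equation (18) can be rewritten as

$$x_t + \theta_1 x_{t-1} + \theta_1^2 x_{t-2} + \cdots = \phi_0 + a_t. \tag{19}$$

The model for $x_{t-1}$ is then

$$x_{t-1} + \theta_1 x_{t-2} + \theta_1^2 x_{t-3} + \cdots = \phi_0 + a_{t-1}. \tag{20}$$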



Simple Moving Average Models II

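Multiplying Equation (20) by $\theta_1$ and subtracting the result from Equation (19), we have

$$x_t = \phi_0(1 - \theta_1) + a_t - \theta_1 a_{t-1}.$$

Definition 23

The general form of an MA(1) model is

$$x_t = c_0 + a_t - \theta_1 a_{t-1} \quad\text{or}\quad x_t = c_0 + (1 - \theta_1 B)a_t. \tag{21}$$

An MA($q$) model is

$$x_t = c_0 + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}, \tag{22}$$

or $x_t = c_0 + (1 - \theta_1 B - \cdots - \theta_q B^q)a_t$, where $q > 0$.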



Properties of MA Models: Stationarity

MA models are always weakly stationary because they are finite linear combinations of a white noise sequence for which the first two moments are time invariant.

Example 24 (MA(1) Model)

$$E(x_t) = c_0$$

$$\mathrm{Var}(x_t) = \sigma_a^2 + \theta_1^2 \sigma_a^2 = (1 + \theta_1^2)\sigma_a^2$$

The prior discussion applies to general MA($q$) models, and we obtain two general properties:

1 The constant term of an MA model is the mean of the series

2 The variance of an MA($q$) model is

$$\mathrm{Var}(x_t) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma_a^2$$



Properties of MA Models: Autocorrelation Function

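Assume for simplicity that $c_0 = 0$ for an MA(1) model. Multiplying the model by $x_{t-l}$, we have

$$x_{t-l} x_t = x_{t-l} a_t - \theta_1 x_{t-l} a_{t-1}$$

Taking expectation, we obtain

$$\gamma_1 = -\theta_1 \sigma_a^2 \quad\text{and}\quad \gamma_l = 0, \quad l > 1$$

Then, using $\gamma_0 = \mathrm{Var}(x_t) = (1 + \theta_1^2)\sigma_a^2$,

$$\rho_0 = 1, \quad \rho_1 = \frac{-\theta_1}{1 + \theta_1^2} \quad\text{and}\quad \rho_l = 0, \quad l > 1$$

Thus, the ACF of an MA(1) model cuts off at lag 1. For the MA(2) model, the autocorrelation coefficients are

$$\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \quad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2} \quad\text{and}\quad \rho_l = 0, \quad l > 2 \tag{23}$$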
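Equation (23) can be checked against `ARMAacf()`. One point needs care: R writes an MA model as $x_t = a_t + b_1 a_{t-1} + b_2 a_{t-2}$, so the coefficients in the notation of Equation (22) enter with the opposite sign, $b_i = -\theta_i$. The parameter values below are hypothetical.

```r
theta1 <- 0.6
theta2 <- -0.3                  # hypothetical MA(2), notation of Equation (22)

denom <- 1 + theta1^2 + theta2^2
rho_formula <- c(1,
                 (-theta1 + theta1 * theta2) / denom,   # rho_1 from Equation (23)
                 -theta2 / denom,                       # rho_2 from Equation (23)
                 0, 0)                                  # rho_l = 0 for l > 2

# Sign flip: R's ma coefficients are b_i = -theta_i.
rho_R <- as.numeric(ARMAacf(ma = c(-theta1, -theta2), lag.max = 4))
print(round(rbind(rho_formula, rho_R), 4))
```

The same sign convention applies to the MA estimates reported by `arima()`.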



Properties of MA Models: Invertibility

Rewriting a zero-mean MA(1) model as $a_t = x_t + \theta_1 a_{t-1}$, one can use repeated substitutions to obtain

$$a_t = x_t + \theta_1 x_{t-1} + \theta_1^2 x_{t-2} + \theta_1^3 x_{t-3} + \cdots.$$

Intuitively, $\theta_1^j$ should go to 0 as $j$ increases because the remote return $x_{t-j}$ should have very little impact on the current shock, if any. If $|\theta_1| < 1$, the MA(1) model is said to be invertible. If $|\theta_1| = 1$, then the MA(1) model is noninvertible.



Identifying MA Order

For a time series $x_t$ with ACF $\rho_l$, if $\rho_q \neq 0$, but $\rho_l = 0$ for $l > q$, then $x_t$ follows an MA($q$) model.



Identifying MA Order

On the basis of the sample ACF, the following MA(9) model is identified for the series:

$$x_t = c_0 + a_t - \theta_1 a_{t-1} - \theta_3 a_{t-3} - \theta_9 a_{t-9}$$

Note that, unlike the sample PACF, the sample ACF provides information on the nonzero MA lags of the model.

Example 25

Consider a simple MA(2) model with $\theta_1 = 0$. The model is $x_t = c_0 + a_t - \theta_2 a_{t-2}$. Then, the ACF of the model is

$$\rho_0 = 1, \quad \rho_1 = 0, \quad \rho_2 = \frac{-\theta_2}{1 + \theta_2^2} \quad\text{and}\quad \rho_j = 0, \quad j > 2$$

Therefore, for this particular case, the ACF provides the exact information on the structure of the model.



Estimation

Conditional likelihood method: it assumes that the initial shocks (i.e., $a_t$ for $t \le 0$) are 0. The shocks needed in the likelihood function calculation are then obtained recursively from the model, starting with $a_1 = x_1 - c_0$ and $a_2 = x_2 - c_0 + \theta_1 a_1$.

Exact likelihood method: it treats the initial shocks $a_t$, $t \le 0$, as additional parameters of the model and estimates them jointly with the other parameters.



Forecasting Using MA Models

Assume that the forecast origin is $h$, and let $F_h$ denote the information available at time $h$. For the 1-step ahead forecast of an MA(1) process, the model says

$$x_{h+1} = c_0 + a_{h+1} - \theta_1 a_h$$

Taking the conditional expectation, we have

$$\hat x_h(1) = E(x_{h+1} \mid F_h) = c_0 - \theta_1 a_h$$

$$e_h(1) = x_{h+1} - \hat x_h(1) = a_{h+1}$$

For the 2-step ahead forecast, from the equation

$$x_{h+2} = c_0 + a_{h+2} - \theta_1 a_{h+1}$$

we have

$$\hat x_h(2) = E(x_{h+2} \mid F_h) = c_0$$

$$e_h(2) = x_{h+2} - \hat x_h(2) = a_{h+2} - \theta_1 a_{h+1}$$




Summary:

The variances of the 1-step and 2-step ahead forecast errors are $\mathrm{Var}[e_h(1)] = \sigma_a^2$ and $\mathrm{Var}[e_h(2)] = (1 + \theta_1^2)\sigma_a^2$, respectively.

For an MA(1) model, the 1-step ahead point forecast at the forecast origin $h$ is $c_0 - \theta_1 a_h$ and the multistep ahead forecasts are $c_0$, which is the unconditional mean of the model.

For MA(1) models, mean reversion takes only one time period.




Similarly, for an MA(2) model, we have

$$x_{h+l} = c_0 + a_{h+l} - \theta_1 a_{h+l-1} - \theta_2 a_{h+l-2}$$

from which we obtain

$$\hat x_h(1) = c_0 - \theta_1 a_h - \theta_2 a_{h-1}$$

$$\hat x_h(2) = c_0 - \theta_2 a_h$$

$$\hat x_h(l) = c_0, \quad \text{for } l > 2.$$

Thus, the multistep ahead forecasts of an MA(2) model go to the mean of the series after two steps. The variances of forecast errors go to the variance of the series after two steps.

In general, for an MA($q$) model, multistep ahead forecasts go to the mean after the first $q$ steps.




R Illustration:

> da=read.table("m-ibm3dx2608.txt",header=T)

> head(da)

> ew=da$ewrtn

> m1=arima(ew,order=c(0,0,9)) # unrestricted model

> m1

> m1=arima(ew,order=c(0,0,9),fixed=c(NA,0,NA,0,0,0,0,0,NA,NA))

> m1

> sqrt(0.005097)

> Box.test(m1$residuals,lag=12,type='Ljung') # model checking

> pv=1-pchisq(17.6,9) # compute p-value after adjusting the d.f.

> pv

> m1=arima(ew[1:986],order=c(0,0,9),fixed=c(NA,0,NA,0,0,0,0,0,NA,NA))

> m1

> predict(m1,10) # prediction



Summary

for MA models, ACF is useful in specifying the order because ACF cuts off at lag $q$ for an MA($q$) series;

for AR models, PACF is useful in order determination because PACF cuts off at lag $p$ for an AR($p$) process;

an MA series is always stationary, but for an AR series to be stationary, all of its characteristic roots must be less than 1 in modulus;

for a stationary series, the multistep ahead forecasts converge to the mean of the series and the variances of forecast errors converge to the variance of the series as the forecast horizon increases.



Simple ARMA Models

Definition 26

A time series $x_t$ follows an ARMA(1,1) model if it satisfies

$$x_t - \phi_1 x_{t-1} = \phi_0 + a_t - \theta_1 a_{t-1}, \tag{24}$$

where $\{a_t\}$ is a white noise series.



Properties of ARMA(1,1) Models

Taking expectation of Equation (24), we have

$$E(x_t) - \phi_1 E(x_{t-1}) = \phi_0 + E(a_t) - \theta_1 E(a_{t-1}).$$

Because $E(a_i) = 0$ for all $i$, the mean of $x_t$ is

$$E(x_t) = \mu = \frac{\phi_0}{1 - \phi_1}$$

provided that the series is weakly stationary. Next, assuming for simplicity that $\phi_0 = 0$, we consider the autocovariance function of $x_t$. First, multiplying the model by $a_t$ and taking expectation, we have

$$E(x_t a_t) = E(a_t^2) - \theta_1 E(a_t a_{t-1}) = E(a_t^2) = \sigma_a^2. \tag{25}$$




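Rewriting the model as

$$x_t = \phi_1 x_{t-1} + a_t - \theta_1 a_{t-1}$$

and taking the variance of the prior equation, we have

$$\mathrm{Var}(x_t) = \phi_1^2 \mathrm{Var}(x_{t-1}) + \sigma_a^2 + \theta_1^2 \sigma_a^2 - 2\phi_1\theta_1 E(x_{t-1} a_{t-1}).$$

Here we use the fact that $x_{t-1}$ and $a_t$ are uncorrelated. Using Equation (25), we obtain

$$\mathrm{Var}(x_t) - \phi_1^2 \mathrm{Var}(x_{t-1}) = (1 - 2\phi_1\theta_1 + \theta_1^2)\sigma_a^2.$$

Therefore, if the series $x_t$ is weakly stationary, then $\mathrm{Var}(x_t) = \mathrm{Var}(x_{t-1})$, and we have

$$\mathrm{Var}(x_t) = \frac{(1 - 2\phi_1\theta_1 + \theta_1^2)\sigma_a^2}{1 - \phi_1^2}$$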




To obtain the autocovariance function of $x_t$, we assume that $\phi_0 = 0$ and multiply the model in Equation (24) by $x_{t-l}$ to obtain

$$x_t x_{t-l} - \phi_1 x_{t-1} x_{t-l} = a_t x_{t-l} - \theta_1 a_{t-1} x_{t-l}.$$

For $l = 1$, taking expectation and using Equation (25) for $t - 1$, we have

$$\gamma_1 - \phi_1 \gamma_0 = -\theta_1 \sigma_a^2,$$

For $l = 2$ and taking expectation, we have

$$\gamma_2 - \phi_1 \gamma_1 = 0,$$

In fact, the same technique yields

$$\gamma_l - \phi_1 \gamma_{l-1} = 0, \quad \text{for } l > 1. \tag{26}$$




In terms of ACF, the previous results show that for a stationary ARMA(1,1) model

$$\rho_1 = \phi_1 - \frac{\theta_1 \sigma_a^2}{\gamma_0}, \qquad \rho_l = \phi_1 \rho_{l-1}, \quad \text{for } l > 1.$$

Thus, the ACF of an ARMA(1,1) model behaves very much like that of an AR(1) model except that the exponential decay starts with lag 2. Consequently, the ACF of an ARMA(1,1) model does not cut off at any finite lag.

Turning to PACF, one can show that the PACF of an ARMA(1,1) model does not cut off at any finite lag either. It behaves very much like that of an MA(1) model except that the exponential decay starts with lag 2 instead of lag 1.
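The ARMA(1,1) formulas above can be verified numerically. The parameter values are hypothetical, and the MA coefficient is passed to `ARMAacf()` as $-\theta_1$ because R uses a plus sign in front of the MA terms.

```r
phi1 <- 0.7
theta1 <- 0.4
sigma2_a <- 1                    # hypothetical ARMA(1,1), notation of Equation (24)

# Variance from the stationary solution derived above.
gamma0 <- (1 - 2 * phi1 * theta1 + theta1^2) * sigma2_a / (1 - phi1^2)

rho <- numeric(6)
rho[1] <- phi1 - theta1 * sigma2_a / gamma0      # lag 1
for (l in 2:6) rho[l] <- phi1 * rho[l - 1]       # lags 2..6: exponential decay

# Exact ACF from R (drop lag 0); note ma = -theta1.
rho_R <- as.numeric(ARMAacf(ar = phi1, ma = -theta1, lag.max = 6))[-1]
print(round(rbind(rho, rho_R), 4))
```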



General ARMA Models

Definition 27

A general ARMA($p$,$q$) model is in the form

$$x_t = \phi_0 + \sum_{i=1}^p \phi_i x_{t-i} + a_t - \sum_{i=1}^q \theta_i a_{t-i}$$

Using the backshift operator, the model can be written as

$$(1 - \phi_1 B - \cdots - \phi_p B^p)x_t = \phi_0 + (1 - \theta_1 B - \cdots - \theta_q B^q)a_t, \tag{27}$$

If all of the characteristic roots of the AR polynomial (the inverses of the solutions of $1 - \phi_1 z - \cdots - \phi_p z^p = 0$) are less than 1 in absolute value, then the ARMA model is weakly stationary. In this case, the unconditional mean of the model is $E(x_t) = \phi_0/(1 - \phi_1 - \cdots - \phi_p)$.



Identifying ARMA Models

Extended Autocorrelation Function (EACF): If we can obtain a consistent estimate of the AR component of an ARMA model, then we can derive the MA component. From the derived MA series, we can use ACF to identify the order of the MA component.




R Illustration:

# EACF table

> da=read.table("m-3m4608.txt",header=T)

> head(da)

> mmm=log(da$rtn+1)

> library(TSA) # Load the package

> m1=eacf(mmm,6,12) # Simplified table

> print(m1$eacf,digits=2)



Forecasting Using an ARMA Model

Definition 28

The 1-step ahead forecast of $x_{h+1}$ can be easily obtained from the model as

$$\hat x_h(1) = E(x_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^p \phi_i x_{h+1-i} - \sum_{i=1}^q \theta_i a_{h+1-i},$$

and the associated forecast error is $e_h(1) = x_{h+1} - \hat x_h(1) = a_{h+1}$. The variance of the 1-step ahead forecast error is $\mathrm{Var}[e_h(1)] = \sigma_a^2$.

Definition 29

For the $l$-step ahead forecast, we have

$$\hat x_h(l) = E(x_{h+l} \mid F_h) = \phi_0 + \sum_{i=1}^p \phi_i \hat x_h(l-i) - \sum_{i=1}^q \theta_i a_h(l-i),$$

where it is understood that $\hat x_h(l-i) = x_{h+l-i}$ if $l - i \le 0$, and that $a_h(l-i) = 0$ if $l - i > 0$ and $a_h(l-i) = a_{h+l-i}$ if $l - i \le 0$. The associated forecast error is $e_h(l) = x_{h+l} - \hat x_h(l)$.


Three Model Representations for an ARMA Model

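Given two polynomials $\phi(B) = 1 - \sum_{i=1}^p \phi_i B^i$ and $\theta(B) = 1 - \sum_{i=1}^q \theta_i B^i$, we can obtain, by long division, that

$$\frac{\theta(B)}{\phi(B)} = 1 + \varphi_1 B + \varphi_2 B^2 + \cdots \equiv \varphi(B) \tag{28}$$

$$\frac{\phi(B)}{\theta(B)} = 1 - \pi_1 B - \pi_2 B^2 - \cdots \equiv \pi(B). \tag{29}$$

For instance, if $\phi(B) = 1 - \phi_1 B$ and $\theta(B) = 1 - \theta_1 B$, then

$$\varphi(B) = \frac{1 - \theta_1 B}{1 - \phi_1 B} = 1 + (\phi_1 - \theta_1)B + \phi_1(\phi_1 - \theta_1)B^2 + \phi_1^2(\phi_1 - \theta_1)B^3 + \cdots$$

$$\pi(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} = 1 - (\phi_1 - \theta_1)B - \theta_1(\phi_1 - \theta_1)B^2 - \theta_1^2(\phi_1 - \theta_1)B^3 - \cdots.$$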
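The two long divisions for the ARMA(1,1) case can be reproduced with `ARMAtoMA()`, which expands a ratio of polynomials of this kind. The parameter values are hypothetical. For the $\pi$-weights the roles of the two polynomials are swapped and the result is negated, because $\pi(B)$ is written with minus signs in Equation (29).

```r
phi1 <- 0.7
theta1 <- 0.4                     # hypothetical ARMA(1,1)
j <- 1:6

# Closed forms read off from the expansions above.
w_formula  <- (phi1 - theta1) * phi1^(j - 1)     # coefficients of B^j in theta(B)/phi(B)
pi_formula <- (phi1 - theta1) * theta1^(j - 1)   # pi_j in phi(B)/theta(B)

# ARMAtoMA expands (1 + ma_1 B + ...)/(1 - ar_1 B - ...), so ma = -theta1 here.
w_R <- ARMAtoMA(ar = phi1, ma = -theta1, lag.max = 6)

# Swap the polynomials for phi(B)/theta(B), then flip the sign to get pi_j.
pi_R <- -ARMAtoMA(ar = theta1, ma = -phi1, lag.max = 6)

print(round(rbind(w_formula, w_R, pi_formula, pi_R), 5))
```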




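From the definition, $\varphi(B)\pi(B) = 1$. Making use of the fact that $Bc = c$ for any constant $c$, we have

$$\frac{\phi_0}{\theta(1)} = \frac{\phi_0}{1 - \theta_1 - \cdots - \theta_q} \quad\text{and}\quad \frac{\phi_0}{\phi(1)} = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}.$$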

Definition 30 (AR Representation)

Using the result of long division in Equation (29), the ARMA($p$, $q$) model can be written as

$$x_t = \frac{\phi_0}{1 - \theta_1 - \cdots - \theta_q} + \pi_1 x_{t-1} + \pi_2 x_{t-2} + \pi_3 x_{t-3} + \cdots + a_t. \tag{30}$$




Definition 31 (Invertibility)

An ARMA($p$, $q$) model is invertible if the $\pi$-weights (coefficients $\{\pi_i\}$) decay to 0 as $i$ increases.

A sufficient condition for invertibility is that all the zeros of the polynomial $\theta(B)$ are greater than unity in modulus.

Example 32

For a pure AR model, $\theta(B) = 1$ so that $\pi(B) = \phi(B)$, which is a finite-degree polynomial. Thus, $\pi_i = 0$ for $i > p$, and the model is invertible.

Consider the MA(1) model $x_t = (1 - \theta_1 B)a_t$. The zero of the first-order polynomial $1 - \theta_1 B$ is $B = 1/\theta_1$. Therefore, an MA(1) model is invertible if $|1/\theta_1| > 1$. This is equivalent to $|\theta_1| < 1$.




Definition 33 (MA Representation)

Using the result of long division in Equation (28), the ARMA($p$, $q$) model can be written as

$$x_t = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p} + a_t + \varphi_1 a_{t-1} + \varphi_2 a_{t-2} + \cdots = \mu + \varphi(B)a_t. \tag{31}$$

At the forecast origin $h$, we have the shocks $a_h, a_{h-1}, \ldots$. Therefore, the $l$-step ahead point forecast is

$$\hat x_h(l) = \mu + \varphi_l a_h + \varphi_{l+1} a_{h-1} + \cdots, \tag{32}$$

and the associated forecast error is

$$e_h(l) = a_{h+l} + \varphi_1 a_{h+l-1} + \cdots + \varphi_{l-1} a_{h+1},$$

Consequently, the variance of the $l$-step ahead forecast error is

$$\mathrm{Var}[e_h(l)] = (1 + \varphi_1^2 + \cdots + \varphi_{l-1}^2)\sigma_a^2, \tag{33}$$
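Equation (33) can be evaluated with a cumulative sum of squared weights. The sketch below assumes a hypothetical AR(1) model with $\phi_1 = 0.8$, for which the weights are $\phi_1^i$, and shows that the forecast error variance is nondecreasing in the horizon and approaches the variance of the series.

```r
phi1 <- 0.8
sigma2_a <- 1                                     # hypothetical AR(1)

# Weights of the MA representation for lags 0..199 (lag-0 weight is 1).
w <- c(1, ARMAtoMA(ar = phi1, lag.max = 199))

# var_e[l] = Var[e_h(l)] = sigma2_a * (1 + w_1^2 + ... + w_{l-1}^2), l = 1..200.
var_e <- sigma2_a * cumsum(w^2)

var_x <- sigma2_a / (1 - phi1^2)                  # unconditional variance of the series
print(round(c(var_e[1:5], limit = var_x), 4))
```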




Proof of the mean reversion of a stationary time series.

The stationarity implies that $\varphi_i$ approaches 0 as $i \to \infty$. Therefore, by Equation (32), we have $\hat x_h(l) \to \mu$ as $l \to \infty$. Because $\hat x_h(l)$ is the conditional expectation of $x_{h+l}$ at the forecast origin $h$, the result says that in the long term, the return series is expected to approach its mean, that is, the series is mean reverting.

Using the MA representation in Equation (31), we have $\mathrm{Var}(x_t) = (1 + \sum_{i=1}^\infty \varphi_i^2)\sigma_a^2$. Consequently, by Equation (33), we have $\mathrm{Var}[e_h(l)] \to \mathrm{Var}(x_t)$ as $l \to \infty$. The speed by which $\hat x_h(l)$ approaches $\mu$ determines the speed of mean reversion.



Unit-Root Nonstationarity

Definition 34 (Random Walk)

A time series $\{p_t\}$ is a random walk if it satisfies

$$p_t = p_{t-1} + a_t \tag{34}$$

where $\{a_t\}$ is a white noise series. If we treat the random walk model as a special AR(1) model, then the coefficient of $p_{t-1}$ is unity, which does not satisfy the weak stationarity condition of an AR(1) model. A random walk series is, therefore, not weakly stationary, and we call it a unit-root nonstationary time series.

For any forecast horizon $l > 0$, we have $\hat p_h(l) = p_h$ (why?). Thus, for all forecast horizons, point forecasts of a random walk model are simply the value of the series at the forecast origin. Therefore, the process is not mean reverting.




The MA representation of the random walk model in Equation (34) is

$$p_t = a_t + a_{t-1} + a_{t-2} + \cdots.$$

1 The $l$-step ahead forecast error is $e_h(l) = a_{h+l} + \cdots + a_{h+1}$, so that $\mathrm{Var}[e_h(l)] = l\sigma_a^2$, which diverges to infinity as $l \to \infty$. Thus, the usefulness of the point forecast $\hat p_h(l)$ diminishes as $l$ increases.

2 The adequacy of a random walk model for market indexes is questionable.

3 From the representation, $\varphi_i = 1$ for all $i$. Thus, the impact of any past shock $a_{t-i}$ on $p_t$ does not decay over time. Consequently, the series has a strong memory as it remembers all of the past shocks.




Definition 35 (Random Walk with Drift)

The log return series of a market index tends to have a small and positive mean. This implies that the model for the log price is

$$p_t = \mu + p_{t-1} + a_t \tag{35}$$

The constant term $\mu$ represents the time trend of the log price $p_t$ and is often referred to as the drift of the model. Starting from an initial log price $p_0$, repeated substitution gives

$$p_t = t\mu + p_0 + \sum_{i=1}^t a_i$$

Because $\mathrm{Var}(\sum_{i=1}^t a_i) = t\sigma_a^2$, the conditional standard deviation of $p_t$ is $\sqrt{t}\,\sigma_a$, which grows at a slower rate than the conditional expectation of $p_t$.




Definition 36 (Trend-Stationary Time Series)

A closely related model that exhibits linear trend is the trend-stationary time series model,

pt = β0 + β1t + xt

where xt is a stationary time series, for example, a stationary AR(p) series. Here, pt grows linearly in time with rate β1 and hence can exhibit behavior similar to that of a random walk model with drift.

The trend-stationary model assumes the mean E(pt) = β0 + β1t, which depends on time, and variance Var(pt) = Var(xt), which is finite and time invariant.




Definition 37 (ARIMA Model)

A time series yt is said to be an ARIMA(p, 1, q) (autoregressive integrated moving average) process if the change series ct = yt − yt−1 = (1 − B)yt follows a stationary and invertible ARMA(p, q) model.

Definition 38 (unit-root testing problem)

To test whether the log price pt of an asset follows a random walk or a random walk with drift, we employ the models

pt = φ1pt−1 + et (36)

pt = φ0 + φ1pt−1 + et (37)

Consider the null hypothesis H0 : φ1 = 1 versus the alternative hypothesis Ha : φ1 < 1.




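A convenient test statistic is the t ratio of the LS estimate of φ1 under the null hypothesis. For Equation (36), the LS method gives

$$\hat\varphi_1 = \frac{\sum_{t=1}^{T} p_{t-1} p_t}{\sum_{t=1}^{T} p_{t-1}^2}, \qquad \hat\sigma_e^2 = \frac{\sum_{t=1}^{T} (p_t - \hat\varphi_1 p_{t-1})^2}{T-1},$$

where p0 = 0 and T is the sample size. The t ratio is

$$\mathrm{DF} \equiv t\text{-ratio} = \frac{\hat\varphi_1 - 1}{\mathrm{std}(\hat\varphi_1)} = \frac{\sum_{t=1}^{T} p_{t-1} e_t}{\hat\sigma_e \sqrt{\sum_{t=1}^{T} p_{t-1}^2}},$$

which is commonly referred to as the Dickey-Fuller test.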
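The Dickey-Fuller ratio can be computed directly from the LS formulas. The sketch below uses a simulated random walk rather than real data, and the names `T_len`, `p_lag`, and `df_stat` are chosen here for illustration; it also cross-checks the hand computation against a no-intercept `lm` fit.

```r
# Dickey-Fuller t ratio for model (36), computed by hand on simulated data.
set.seed(42)
T_len <- 500
a <- rnorm(T_len)
p <- cumsum(a)                    # random walk with p_0 = 0
p_lag <- c(0, p[-T_len])          # p_{t-1}, using p_0 = 0

phi_hat <- sum(p_lag * p) / sum(p_lag^2)               # LS estimate of phi_1
resid <- p - phi_hat * p_lag
sigma_e <- sqrt(sum(resid^2) / (T_len - 1))            # residual standard error
df_stat <- (phi_hat - 1) / (sigma_e / sqrt(sum(p_lag^2)))

# Cross-check: the same ratio from a regression without intercept.
fit <- lm(p ~ 0 + p_lag)
se_lm <- summary(fit)$coefficients["p_lag", "Std. Error"]
df_lm <- (coef(fit)[["p_lag"]] - 1) / se_lm

c(phi_hat = phi_hat, DF_by_hand = df_stat, DF_from_lm = df_lm)
```

Under the null hypothesis this ratio does not follow the usual t distribution, so it is compared with Dickey-Fuller critical values rather than normal ones.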




To verify the existence of a unit root in an AR(p) process, one may perform the test H0 : β = 1 versus Ha : β < 1 using the regression

$$x_t = c_t + \beta x_{t-1} + \sum_{i=1}^{p-1} \varphi_i \Delta x_{t-i} + e_t, \qquad (38)$$

where ct is a deterministic function of the time index t and ∆xj = xj − xj−1 is the differenced series of xt. In practice, ct can be 0, a constant, or ct = ω0 + ω1t. The t ratio of β̂ − 1,

$$\text{ADF-test} = \frac{\hat\beta - 1}{\mathrm{std}(\hat\beta)},$$

is the well-known augmented Dickey-Fuller unit-root test.




R Illustration:

# Unit root
> library(fUnitRoots)
> da=read.table("d-sp55008.txt",header=T)
> sp5=log(da[,7])
> m2=ar(diff(sp5),method='mle') # Based on AIC
> m2$order
> adfTest(sp5,lags=2,type=("ct"))
> adfTest(sp5,lags=15,type=("ct")) # Based on PACF
> dsp5=diff(sp5)
> tdx=c(1:length(dsp5)) # time index used as regressor
> m3=arima(dsp5,order=c(2,0,0),xreg=tdx)
> m3
> m3$coef
> sqrt(diag(m3$var.coef))
> tratio=m3$coef/sqrt(diag(m3$var.coef)) # compute t-ratio
> tratio



R Illustration:

# Unit-root test
> library(fUnitRoots)
> da=read.table("q-gdp4708.txt",header=T)
> gdp=log(da[,4])
> m1=ar(diff(gdp),method='mle')
> m1$order
> adfTest(gdp,lags=10,type=c("c"))



Exponential Smoothing

Idea: Under the general belief that the serial dependence of xt decays exponentially, one can use a weighted average of the past data to predict xh+1 with weights decaying exponentially.

Definition 39

Specifically, for a discounting rate w ∈ (0, 1), one forms the quantity

$$\hat x_{h+1} \propto w x_h + w^2 x_{h-1} + w^3 x_{h-2} + \cdots = \sum_{j=1}^{\infty} w^j x_{h+1-j}.$$

Using properties of a geometric series, it is easy to see that $\sum_{j=1}^{\infty} w^j = w/(1-w)$. Therefore, a proper weighted average, with weights summing to 1, is

$$\hat x_h(1) = \frac{1-w}{w}\bigl[w x_h + w^2 x_{h-1} + w^3 x_{h-2} + \cdots\bigr] = (1-w)\bigl[x_h + w x_{h-1} + w^2 x_{h-2} + \cdots\bigr]. \qquad (39)$$

This technique to produce forecasts is called exponential smoothing.




Exponential smoothing is a special case of the ARIMA models. Specifically, consider the ARIMA(0,1,1) model

(1 − B)xt = (1 − θB)at

where θ ∈ (0, 1). Using the AR representation in Section 2.6.5, this model implies that

$$x_{h+1} = (1-\theta)\bigl[x_h + \theta x_{h-1} + \theta^2 x_{h-2} + \cdots\bigr] + a_{h+1}.$$

Therefore, the 1-step ahead forecast is

$$\hat x_h(1) = (1-\theta)\bigl[x_h + \theta x_{h-1} + \theta^2 x_{h-2} + \cdots\bigr],$$

which is exponential smoothing with discounting rate w = θ.
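The exponentially weighted forecast can be computed either as a direct weighted sum or by a one-line recursion. The sketch below, on simulated data with an arbitrary discounting rate w = 0.3, puts weight (1 − w)w^(j−1) on x_{h+1−j} so that the weights sum to 1, and checks that both forms agree. The recursion started at s_0 = 0 is an implementation choice made here, not something stated in the slides.

```r
# Exponential smoothing forecast: direct weighted sum versus recursion.
set.seed(1)
w <- 0.3                         # discounting rate in (0, 1), illustrative
x <- cumsum(rnorm(200))          # stand-in for an observed series x_1, ..., x_h
h <- length(x)

# Direct form: weight (1 - w) * w^(j - 1) on x_{h+1-j}, j = 1, ..., h.
wts <- (1 - w) * w^(0:(h - 1))
forecast_direct <- sum(wts * rev(x))

# Recursive form: s_t = (1 - w) * x_t + w * s_{t-1}, started at s_0 = 0.
s <- 0
for (t in seq_len(h)) s <- (1 - w) * x[t] + w * s
forecast_recursive <- s

c(sum_of_weights = sum(wts), direct = forecast_direct, recursive = forecast_recursive)
```

The recursive form is what makes the method cheap in practice: each new observation updates the forecast without revisiting the whole history.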





Since only the lag-1 ACF is significantly different from 0 at the 5% level, an MA(1) model is identified for the differenced series. Let xt = ln(VIXt). The fitted model is

$$(1 - B)x_t = (1 - 0.163B)a_t, \qquad \sigma_a^2 = 0.0044.$$

The Ljung-Box statistics of the residuals show that the fitted ARIMA(0,1,1) model is adequate. For instance, we have Q(10) = 14.25 with p-value 0.11, based on a chi-squared distribution with 9 degrees of freedom. Consequently, in this particular instance, one can employ exponential smoothing to predict the log series of the daily VIX index.




R Illustration:

> da=read.table("d-vix0810.txt",header=T)
> vix=log(da$Close)
> length(vix)
> m1=arima(vix,order=c(0,1,1))
> m1
> Box.test(m1$residuals,lag=10,type='Ljung')
> pp=1-pchisq(14.25,9) # p-value with 9 degrees of freedom
> pp



Seasonal Models




The log earnings are denoted by xt. The series of quarterly log earnings has strong serial correlations. A conventional method to handle such strong serial correlations is to consider the first differenced series of xt (i.e., ∆xt = xt − xt−1 = (1 − B)xt).

We then take a seasonal difference of ∆xt to handle the strong seasonal pattern. Specifically, we consider

$$\Delta_4(\Delta x_t) = (1 - B^4)\Delta x_t = \Delta x_t - \Delta x_{t-4} = x_t - x_{t-1} - x_{t-4} + x_{t-5}.$$

The operation $\Delta_4 = (1 - B^4)$ is called seasonal differencing.

Definition 41

In general, for a seasonal time series yt with periodicity s, seasonal differencing means

$$\Delta_s y_t = y_t - y_{t-s} = (1 - B^s) y_t .$$
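In R, both kinds of differencing are available through `diff` with its `lag` argument. The sketch below, on a simulated series standing in for quarterly data, checks that a regular difference followed by a lag-4 seasonal difference equals x_t − x_{t−1} − x_{t−4} + x_{t−5}, and that the order of the two operations does not matter.

```r
# Regular plus seasonal differencing for quarterly data (s = 4).
set.seed(7)
x <- cumsum(rnorm(40))                  # illustrative series, not real earnings

dd <- diff(diff(x), lag = 4)            # (1 - B^4)(1 - B) x_t
dd_swapped <- diff(diff(x, lag = 4))    # (1 - B)(1 - B^4) x_t

# Written out term by term; t = 6 is the first index with x_{t-5} available.
t_idx <- 6:length(x)
manual <- x[t_idx] - x[t_idx - 1] - x[t_idx - 4] + x[t_idx - 5]

all.equal(dd, manual)
all.equal(dd, dd_swapped)
```

Five observations are lost in total: one to the regular difference and four to the seasonal one.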




Definition 42

A simple multiplicative seasonal model assumes the form

$$(1 - B)(1 - B^s)x_t = (1 - \theta B)(1 - \Theta B^s)a_t, \qquad (40)$$

where s is the periodicity of the series, at is a white noise series, |θ| < 1, and |Θ| < 1. This model is referred to as the airline model.

Focusing on the MA part,

$$w_t = (1 - \theta B)(1 - \Theta B^s)a_t = a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta\Theta a_{t-s-1},$$

where $w_t = (1 - B^s)(1 - B)x_t$ and s > 1.




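It is easy to obtain that E(wt) = 0 and

$$\begin{aligned}
\mathrm{Var}(w_t) &= (1+\theta^2)(1+\Theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-1}) &= -\theta(1+\Theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s+1}) &= \theta\Theta\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s}) &= -\Theta(1+\theta^2)\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-s-1}) &= \theta\Theta\sigma_a^2, \\
\mathrm{Cov}(w_t, w_{t-l}) &= 0 \quad \text{for } l \neq 0, 1, s-1, s, s+1.
\end{aligned}$$

Consequently, the ACF of the wt series is given by

$$\rho_1 = \frac{-\theta}{1+\theta^2}, \qquad \rho_s = \frac{-\Theta}{1+\Theta^2}, \qquad \rho_{s-1} = \rho_{s+1} = \rho_1\rho_s = \frac{\theta\Theta}{(1+\theta^2)(1+\Theta^2)},$$

and ρl = 0 for l > 0 and l ≠ 1, s − 1, s, s + 1.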
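These ACF formulas can be verified numerically with `ARMAacf` from base R's stats package. The sketch below uses illustrative values θ = 0.4, Θ = 0.6, and s = 4; note that R writes MA polynomials with plus signs, so the coefficients are passed with flipped signs.

```r
# Theoretical ACF of the MA part of the airline model.
theta <- 0.4; Theta <- 0.6; s <- 4       # illustrative values

# (1 - theta B)(1 - Theta B^s) = 1 - theta B - Theta B^s + theta*Theta B^(s+1).
# R's convention is 1 + b_1 B + ..., hence the sign changes below.
ma_coef <- c(-theta, rep(0, s - 2), -Theta, theta * Theta)
rho <- ARMAacf(ma = ma_coef, lag.max = s + 2)   # named by lag: "0", "1", ...

rho1_formula <- -theta / (1 + theta^2)
rhos_formula <- -Theta / (1 + Theta^2)

round(rho, 4)
c(rho1 = rho1_formula, rho_s = rhos_formula, rho_s_pm1 = rho1_formula * rhos_formula)
```

Only lags 1, s − 1, s, and s + 1 are nonzero, which is the signature one looks for in the sample ACF of the doubly differenced series.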




The airline model can be rewritten as

$$\frac{1-B}{1-\theta B}\left(\frac{1-B^s}{1-\Theta B^s}\,x_t\right) = a_t .$$

Let $y_t = \frac{1-B^s}{1-\Theta B^s}\,x_t$. Then, we have

$$(1-B)y_t = (1-\theta B)a_t, \qquad (1-B^s)x_t = (1-\Theta B^s)y_t .$$

Thus, the airline model can be regarded as an exponential smoothing model on top of another exponential smoothing model. One exponential smoothing is for the usual serial dependence, whereas the other one is for the seasonal dependence.




R Illustration:

### seasonal models
> da=read.table("q-ko-earns8309.txt",header=T)
> head(da)
> eps=log(da$value)
> koeps=ts(eps,frequency=4,start=c(1983,1))
> c1=c("1","2","3","4") # quarter labels; the series starts in Q1
> plot(koeps,type='l')
> points(koeps,pch=c1,cex=0.6)
> par(mfcol=c(2,2))
> koeps=log(da$value)
> deps=diff(koeps) # regular difference
> sdeps=diff(koeps,4) # seasonal difference
> ddeps=diff(sdeps) # regular and seasonal difference
> acf(koeps,lag=20); acf(deps,lag=20)
> acf(sdeps,lag=20); acf(ddeps,lag=20)



R Illustration:

# Obtain time plots
> c1=c("2","3","4","1")
> c2=c("1","2","3","4")
> par(mfcol=c(3,1))
> plot(deps,xlab='year',ylab='diff',type='l')
> points(deps,pch=c1,cex=0.7)
> plot(sdeps,xlab='year',ylab='sea-diff',type='l')
> points(sdeps,pch=c2,cex=0.7)
> plot(ddeps,xlab='year',ylab='dd',type='l')
> points(ddeps,pch=c1,cex=0.7)



R Illustration:

# Estimation
> m1=arima(koeps,order=c(0,1,1),
    seasonal=list(order=c(0,1,1),period=4))
> m1
> tsdiag(m1,gof=20) # model checking
> Box.test(m1$residuals,lag=12,type='Ljung')
> pp=1-pchisq(13.30,10)
> pp
> koeps=log(da$value)
> length(koeps)
> y=koeps[1:100] # hold out the last observations for forecasting
> m1=arima(y,order=c(0,1,1),
    seasonal=list(order=c(0,1,1),period=4))
> m1



R Illustration:

#Prediction
> pm1=predict(m1,7)
> names(pm1)
> pred=pm1$pred
> se=pm1$se
#Anti-log transformation
> ko=da$value
> fore=exp(pred+se^2/2)
> v1=exp(2*pred+se^2)*(exp(se^2)-1)
> s1=sqrt(v1)
> eps=ko[80:107]
> length(eps)
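The anti-log step in the listing above appears to rely on the moments of the lognormal distribution. As a sketch, assume the forecast of the log series is normal with mean m (playing the role of `pred`) and standard deviation s (playing the role of `se`); the symbols m and s are introduced here only for illustration. Then the mean and variance on the original scale are:

```latex
\ln X \sim N(m, s^2) \;\Longrightarrow\;
E(X) = \exp\!\Bigl(m + \tfrac{s^2}{2}\Bigr), \qquad
\mathrm{Var}(X) = \exp(2m + s^2)\,\bigl[\exp(s^2) - 1\bigr].
```

These match the expressions assigned to `fore` and `v1`, which is why `exp(pred)` alone would understate the mean forecast.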



R Illustration:

> tdx=(c(1:28)+3)/4+2002
> upp=c(ko[100],fore+2*s1)
> low=c(ko[100],fore-2*s1)
> min(low,eps)
> max(upp,eps)
> plot(tdx,eps,xlab='year',ylab='earnings',type='l',
    ylim=c(0.35,1.3))
> points(tdx[22:28],fore,pch='*')
> lines(tdx[21:28],upp,lty=2)
> lines(tdx[21:28],low,lty=2)
> points(tdx[22:28],ko[101:107],pch='o',cex=0.7)




Definition 43 (dummy variable)

When the seasonal pattern of a time series is stable over time (e.g., close to a deterministic function), dummy variables may be used to handle the seasonality. By seasonal dummy variables, we mean the indicator variables for the seasons within a year. For quarterly data, the dummy variables represent spring, summer, autumn and winter, respectively, and three of them are used in an analysis (the fourth is absorbed by the intercept).




R Illustration:

> da=read.table("m-deciles08.txt",header=T)
> d1=da[,2]
> jan=rep(c(1,rep(0,11)),39) # Create January dummy.
> m1=lm(d1~jan)
> summary(m1)
> m2=arima(d1,order=c(1,0,0),
    seasonal=list(order=c(1,0,1),period=12))
> m2
> tsdiag(m2,gof=36) # plot not shown.
> m2=arima(d1,order=c(1,0,0),
    seasonal=list(order=c(1,0,1),period=12),include.mean=F)
> m2



Regression Models with Time Series Errors

In many applications, the relationship between two time series is of major interest.

Market Model: relates the excess return of an individual stock to that of a market index.

Term structure: the time evolution of the relationship between interest rates with different maturities is investigated.

yt = α + βxt + et (41)

If {et} is a white noise series, then the LS method produces consistent estimates. In practice, however, it is common to see that the error term et is serially correlated. In this case, we have a regression model with time series errors, and the LS estimates of α and β may not be consistent.




A naive way to describe the relationship between the two interest rates is to use the simple model x3t = α + βx1t + et. This results in the fitted model

x3t = 0.832 + 0.930x1t + et , σe = 0.523 (42)

The model confirms the high correlation between the two interest rates. However, the model is seriously inadequate, as a plot of its residuals shows. The behavior of the residuals suggests that marked differences exist between the two interest rates.




The unit-root behavior of both interest rate series and the residuals of Equation (42) leads to the consideration of the change series of interest rates. Let

1 c1t = x1t − x1,t−1 = (1 − B)x1t for t ≥ 2: changes in the 1-year interest rate;

2 c3t = x3t − x3,t−1 = (1 − B)x3t for t ≥ 2: changes in the 3-year interest rate,

and consider the linear regression c3t = βc1t + et. The change series remain highly correlated with a fitted linear regression model given by

c3t = 0.792c1t + et , σe = 0.0690. (43)




Idea: We employ a simple time series model discussed in this chapter for the residual series and estimate the whole model jointly.

Example 45

For the simple linear regression in Equation (43), we specify an MA(1) model for the residuals and modify the linear regression model to

c3t = βc1t + et , et = at − θ1at−1. (44)

If the time series model used is stationary and invertible, then one can estimate the model jointly via the maximum likelihood method. For the US weekly interest rate data, the fitted version of Equation (44) is

c3t = 0.794c1t + et , et = at − 0.18231at−1, σa = 0.0678. (45)




Comparing the models in Equations (42), (43), and (45), we make the following observations.

1 The high R² of 96.5% and the coefficient 0.930 of Equation (42) are misleading because the residuals of the model show strong serial correlations.

2 For the change series, the R² and the coefficient of c1t in Equations (43) and (45) are close. In this particular instance, adding the MA(1) model to the change series provides only a marginal improvement, because the estimated MA coefficient is numerically small even though it is statistically highly significant.

3 The analysis demonstrates that it is important to check residual serial dependence in linear regression analysis.




The general procedure for analyzing linear regression models with time series errors is as follows:

1 Fit the linear regression model and check serial correlations of the residuals.

2 If the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables. Go to step 1. If the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly.

3 Perform a joint estimation via the maximum likelihood method and check the fitted model for further improvement.
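The procedure above can be sketched on simulated data where the truth is known. In the block below, the values β = 0.8 and θ1 = 0.2, the sample size, and all variable names are illustrative assumptions; the errors are generated as e_t = a_t − θ1 a_{t−1}, so R's `ma1` coefficient (which uses a plus sign in the MA polynomial) should come out near −0.2.

```r
# Regression with MA(1) errors: LS fit, residual check, then joint ML fit.
set.seed(2017)
n <- 2000
beta <- 0.8; theta1 <- 0.2             # illustrative true values
x <- rnorm(n)
a <- rnorm(n + 1, sd = 0.5)
e <- a[-1] - theta1 * a[-(n + 1)]      # e_t = a_t - theta1 * a_{t-1}
y <- beta * x + e

# Step 1: LS fit and a Ljung-Box check of the residuals.
ls_fit <- lm(y ~ 0 + x)
lb <- Box.test(residuals(ls_fit), lag = 10, type = "Ljung-Box")

# Steps 2-3: residuals are stationary with lag-1 dependence, so fit jointly.
joint_fit <- arima(y, order = c(0, 0, 1), xreg = cbind(x = x), include.mean = FALSE)

lb$p.value
coef(joint_fit)    # "ma1" estimates -theta1, "x" estimates beta
```

With real data the residual ACF and PACF would guide the choice of the ARMA order; here the MA(1) form is known by construction.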




R Illustration:

# regression models with time series errors
> r1=read.table("w-gs1yr.txt",header=T)[,4]
> r3=read.table("w-gs3yr.txt",header=T)[,4]
> m1=lm(r3~r1)
> summary(m1)
> plot(m1$residuals,type='l')
> acf(m1$residuals,lag=36)
> c1=diff(r1)
> c3=diff(r3)
> m2=lm(c3~-1+c1) # regression without intercept
> summary(m2)
> acf(m2$residuals,lag=36)
> m3=arima(c3,order=c(0,0,1),xreg=c1,include.mean=F)
> m3
> rsq=(sum(c3^2)-sum(m3$residuals^2))/sum(c3^2)
> rsq


Long-Memory Models

Definition 46

If the ACF decays slowly to 0 at a polynomial rate as the lag increases, then the process is referred to as a long-memory time series.

One such example is the fractionally differenced process defined by

$$(1 - B)^d x_t = a_t, \qquad -0.5 < d < 0.5. \qquad (46)$$
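The operator (1 − B)^d with fractional d can be expanded as a power series in B, with coefficients π_0 = 1 and π_k = π_{k−1}(k − 1 − d)/k. The sketch below computes these weights for an illustrative d = 0.4; the helper name `frac_diff_weights` is hypothetical and written only for this example.

```r
# Weights pi_k in (1 - B)^d = sum_k pi_k B^k, from the recursion
# pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
frac_diff_weights <- function(d, n_lags) {
  k <- seq_len(n_lags)
  c(1, cumprod((k - 1 - d) / k))
}

d <- 0.4                                  # illustrative value in (-0.5, 0.5)
w <- frac_diff_weights(d, 1000)
round(w[1:5], 4)                          # 1, -0.4, -0.12, -0.064, -0.0416

# Polynomial (not exponential) decay: |pi_k| * k^(1 + d) settles to a constant.
scaled <- abs(w[c(101, 501, 1001)]) * c(100, 500, 1000)^(1 + d)
scaled
```

The slow polynomial decay of these weights is the mechanism behind the slowly decaying ACF in Definition 46.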


Consider the daily absolute returns of the value-weighted index of CRSP from January 2, 1970 to December 31, 2008. If we employ an ARFIMA(1,d,1) model and use the maximum likelihood method, then we obtain the fitted model

$$(1 - 0.113B)(1 - B)^{0.491} y_t = (1 - 0.576B)a_t .$$




R Illustration:

# Long memory
> library(fracdiff)
> da=read.table("d-ibm3dx7008.txt",header=T)
> head(da)
> ew=abs(da$vwretd)
# obtain Geweke-Porter-Hudak estimate using command fdGPH
> m3=fdGPH(ew)
> m3
> m2=fracdiff(ew,nar=1,nma=1)
> summary(m2)



Model Comparison and Averaging

In-sample Comparison

The objective of data analysis is to gain insight into the dynamic structure of a time series.

All data are used in model estimation and comparison.

Information criteria, such as AIC and BIC, and the estimate of residual variance can be used for model comparison. For a selected criterion, the model with a smaller value is preferred.

Out-of-sample Comparison

The objective of time series modeling is forecasting.

A common measure of the forecasting performance of statistical models is the mean square of forecast errors (MSFE).




Backtesting

1 Divide the data set into estimation and forecasting subsamples. There is no specific rule to guide the division, but each subsample should contain sufficient data points so that the estimation and MSFE can be as accurate as possible.

2 Perform model estimation using data in the estimation subsample and use the fitted model to obtain the 1-step ahead forecast and its forecast error. Specifically, suppose the estimation subsample is {xt | t = 1, . . . , h}. We estimate the model using the first h data points to compute the 1-step ahead prediction xh(1) and its forecast error eh(1) = xh+1 − xh(1). The data point xh+1 is not used in model estimation.

3 Advance the estimation subsample by one data point, that is, {xt | t = 1, . . . , h + 1}. Reestimate the model using h + 1 data points and compute the 1-step ahead forecast and its forecast error. That is, compute eh+1(1) = xh+2 − xh+1(1), where xh+1(1) is the 1-step ahead prediction of the newly fitted model at the forecast origin h + 1.

4 Repeat step 3 until we have the 1-step ahead forecast error eT−1(1) = xT − xT−1(1), where T is the sample size.

The MSFE of the model is then given by

$$\mathrm{MSFE}(m) = \frac{\sum_{j=h}^{T-1} [e_j(1)]^2}{T-h},$$

where m denotes the model used. One selects the model with the smallest MSFE as the best model for the data.
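The backtesting steps above can be sketched as a short loop. The helper `rolling_msfe` below is hypothetical, written only for this illustration; it is not the `backtest` function from `backtest.R` used in these slides. It refits an ARIMA model at each forecast origin and averages the squared 1-step ahead errors, here on simulated AR(1) data.

```r
# Expanding-window backtest for 1-step ahead forecasts (illustrative helper).
rolling_msfe <- function(x, order, h) {
  n <- length(x)
  errors <- numeric(n - h)
  for (j in h:(n - 1)) {
    fit <- arima(x[1:j], order = order, method = "ML")  # re-estimate on x_1..x_j
    pred <- predict(fit, n.ahead = 1)$pred[1]           # forecast x_j(1)
    errors[j - h + 1] <- x[j + 1] - pred                # e_j(1) = x_{j+1} - x_j(1)
  }
  mean(errors^2)                                        # sum of squares / (T - h)
}

set.seed(99)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 200))     # simulated AR(1) series
msfe_ar1 <- rolling_msfe(x, order = c(1, 0, 0), h = 150)
msfe_mean <- rolling_msfe(x, order = c(0, 0, 0), h = 150)

c(AR1 = msfe_ar1, mean_only = msfe_mean)
```

Because the data are generated by an AR(1) process, the AR(1) model should show the smaller MSFE; with real data the ranking is what the backtest is meant to discover.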



Example 2.9


Consider the US quarterly real GDP from the first quarter of 1947 to the second quarter of 2010. The GDP data are obtained from the Federal Reserve Bank of St. Louis, in billions of chained 2005 dollars, and seasonally adjusted. The PACF suggests an AR(3) model for the growth rate xt (the first difference of log GDP). The fitted model is

$$(1 - 0.346B - 0.13B^2 + 0.123B^3)(x_t - 0.0079) = a_t, \qquad \sigma_a^2 = 8.32\times 10^{-5}. \qquad (47)$$

The AIC of the fitted AR(3) model is -1648.45.




Example 49 (Example 2.9 Cont'd)

Since the data are seasonally adjusted, we also entertain a seasonal model,

$$(1 - 0.331B - 0.152B^2 + 0.11B^3)(1 - 0.497B^4)(x_t - 0.0079) = (1 - 0.587B^4)a_t. \qquad (48)$$

The AIC of Equation (48) is -1646.93.

For in-sample comparison, AIC selects the AR(3) model. For out-of-sample comparison, we apply the backtesting procedure with the initial forecast origin being the fourth quarter of 2000, so that there are 38 quarters in the forecasting subperiod. For 1-step ahead prediction, the root mean squares of forecast errors for the AR(3) and seasonal models are 0.00615 and 0.00632, respectively. Again, the AR(3) model is preferred.



R Illustration:

# model comparison
> da=read.table("q-gdpc96.txt",header=T)
> head(da)
> gdp=log(da$gdp)
> dgdp=diff(gdp)
> m1=ar(dgdp,method='mle')
> m1$order
> m2=arima(dgdp,order=c(3,0,0))
> m2
> m3=arima(dgdp,order=c(3,0,0),
    seasonal=list(order=c(1,0,1),period=4))
> m3
> source("backtest.R") # Perform backtest
> mm2=backtest(m2,dgdp,215,1)
> mm3=backtest(m3,dgdp,215,1)


Model Averaging

Definition 50

When several models fit a given time series well, instead of selecting a single model, one can use all the models to produce a combined forecast. This technique is referred to as model averaging.

Let x̂i,h+1 be the 1-step ahead forecast of model i at the forecast origin h. Then, a combined forecast is

$$\hat x_{h+1} = \sum_{i=1}^{m} w_i \hat x_{i,h+1},$$

where wi is a nonnegative real number denoting the weight for model i and the weights satisfy $\sum_{i=1}^{m} w_i = 1$.
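A combined forecast is simply a weighted sum once the weights are chosen. The sketch below uses hypothetical forecasts and hypothetical backtest MSFEs for three models (none of these numbers come from the lecture) and shows two possible weighting schemes: equal weights and weights proportional to inverse MSFE. The inverse-MSFE scheme is one common choice assumed here for illustration, not a rule stated in the slides.

```r
# Model averaging with equal weights and with inverse-MSFE weights.
forecasts <- c(AR3 = 0.0081, seasonal = 0.0075, ARMA11 = 0.0079)  # hypothetical
msfe <- c(AR3 = 3.8e-5, seasonal = 4.0e-5, ARMA11 = 4.4e-5)       # hypothetical

w_equal <- rep(1 / length(forecasts), length(forecasts))
w_inv <- (1 / msfe) / sum(1 / msfe)     # nonnegative and summing to 1

combined_equal <- sum(w_equal * forecasts)
combined_inv <- sum(w_inv * forecasts)

round(w_inv, 3)
c(equal = combined_equal, inverse_msfe = combined_inv)
```

Because the weights are nonnegative and sum to 1, the combined forecast always lies between the smallest and largest individual forecasts.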

