Laplace Likelihood and LAD Estimation for Non-invertible MA(1)

Richard A. Davis, F. Jay Breidt, Nan-Jung Hsu, Murray Rosenblatt
Colorado State University · National Tsing-Hua University · U. of California, San Diego
(http://www.stat.colostate.edu/~rdavis/lectures)

Heidelberg 9/05



Program

• Introduction
• The MA(1) unit root problem
  – Why study non-invertible MA(1)'s?
    · over-differencing
    · random walk + noise
• Gaussian likelihood estimation
  – Identifiability
  – Limit results
  – Extensions: non-zero mean, heavy tails
• Laplace likelihood/LAD estimation
  – Joint and exact likelihood
  – Limit results
  – Limit distribution/simulation comparisons
  – Pile-up probabilities: joint likelihood, exact likelihood

MA(1) unit root problem

MA(1): Y_t = Z_t − θ Z_{t−1}, {Z_t} ~ IID (0, σ²)

Properties:

• |θ| < 1 ⇒ Z_t = Σ_{j=0}^∞ θ^j Y_{t−j} (invertible)

• |θ| > 1 ⇒ Z_t = −Σ_{j=1}^∞ θ^{−j} Y_{t+j} (non-invertible)

• |θ| = 1 ⇒ Z_t ∈ sp{Y_t, Y_{t−1}, …} and Z_t ∈ sp{Y_{t+1}, Y_{t+2}, …}
  ⇒ P_{sp{Y_s, s≠0}} Y_0 = Y_0 (perfect interpolation)

• |θ| < 1 ⇒ θ̂_mle is AN(θ, (1 − θ²)/n)

MLE = maximum (Gaussian) likelihood, n = sample size

What if θ = 1?

Why study non-invertible MA(1)?

a) over-differencing

• linear trend model: X_t = a + bt + Z_t. Then Y_t = X_t − X_{t−1} = b + Z_t − Z_{t−1} ~ MA(1) with θ = 1.

• seasonal model: X_t = s_t + Z_t, with s_t a seasonal component of period 12. Then Y_t = X_t − X_{t−12} = Z_t − Z_{t−12} ~ MA(12) with θ = 1.

b) random walk + noise

X_t = X_{t−1} + U_t (random walk signal)
Y_t = X_t + V_t (random walk signal + noise)

Then Y_t − Y_{t−1} = U_t + V_t − V_{t−1} ~ MA(1), with θ = 1 if and only if Var(U_t) = 0.
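As a quick numerical illustration (my own sketch, not part of the slides): differencing a linear-trend-plus-noise series produces an MA(1) with θ = 1, whose lag-one autocorrelation is −θ/(1 + θ²) = −1/2.

```python
import numpy as np

# Sketch (not from the slides): over-differencing the linear trend model
# X_t = a + b t + Z_t gives Y_t = b + Z_t - Z_{t-1}, an MA(1) with theta = 1.
# Its lag-one autocorrelation is -theta/(1 + theta^2) = -0.5.
rng = np.random.default_rng(0)
n = 200_000
a, b = 3.0, 0.5
Z = rng.normal(size=n)
X = a + b * np.arange(n) + Z      # linear trend plus IID noise
Y = np.diff(X)                    # over-differenced series: b + Z_t - Z_{t-1}

Yc = Y - Y.mean()
rho1 = Yc[1:] @ Yc[:-1] / (Yc @ Yc)
print(round(rho1, 2))             # close to -0.5
```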

Identifiability and Gaussian likelihood

Identifiability

• |θ| > 1 ⇒ Y_t = ε_t − θ^{−1} ε_{t−1}, where {ε_t} ~ WN(0, θ²σ²).

• {ε_t} is IID if and only if {Z_t} is Gaussian (Breidt and Davis `91).

Gaussian likelihood

L_G(θ, σ²) = L_G(1/θ, θ²σ²) ⇒ θ is only identifiable for |θ| ≤ 1.

Notes:
i) this implies L_G(θ) = L_G(1/θ) for the profile likelihood, and θ = 1 is a critical point: L_G′(1) = 0;
ii) a pile-up effect ensues, i.e., P(θ̂ = 1) > 0 even if θ < 1.
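The identity L_G(θ, σ²) = L_G(1/θ, θ²σ²) can be checked numerically. The sketch below is my own (a covariance-matrix evaluation of the Gaussian likelihood, not the authors' code); it verifies that both parameter pairs give the same log-likelihood on simulated data.

```python
import numpy as np

# Sketch (mine, not the authors' code): the Gaussian likelihood of an MA(1)
# depends on (theta, sigma2) only through the autocovariances
# gamma0 = sigma2 (1 + theta^2) and gamma1 = -sigma2 theta, which are the
# same for (theta, sigma2) and (1/theta, theta^2 sigma2).
def ma1_gauss_loglik(y, theta, sigma2):
    n = len(y)
    Gamma = sigma2 * (1 + theta**2) * np.eye(n)                  # lag 0
    idx = np.arange(n - 1)
    Gamma[idx, idx + 1] = Gamma[idx + 1, idx] = -sigma2 * theta  # lag 1
    _, logdet = np.linalg.slogdet(Gamma)
    quad = y @ np.linalg.solve(Gamma, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
Z = rng.normal(size=101)
theta0, sigma2 = 0.8, 1.0
y = Z[1:] - theta0 * Z[:-1]
L1 = ma1_gauss_loglik(y, theta0, sigma2)
L2 = ma1_gauss_loglik(y, 1 / theta0, theta0**2 * sigma2)
print(np.isclose(L1, L2))   # True: theta identifiable only for |theta| <= 1
```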

Gaussian likelihood examples

[Figure: Gaussian likelihood as a function of θ (0.5 to 2.0) for 100 observations from Y_t = Z_t − θ₀ Z_{t−1}, {Z_t} ~ IID (0, σ²), Laplace pdf; curves for θ₀ = .8, 1.0, 1.25.]

MLE (Gaussian likelihood)

Idea: build the parameter normalization into the likelihood function.

Model: Y_t = Z_t − (1 − β/n) Z_{t−1}, t = 1,…,n, with β = n(1 − θ), θ = 1 − β/n, θ₀ = 1 − γ/n.

Gaussian likelihood: L_n(β) = l_n(1 − β/n) − l_n(1), where l_n(·) is the profile log-likelihood.

Theorem (Davis and Dunsmuir `96): Under θ₀ = 1 − γ/n, L_n(β) →d Z_γ(β) on C[0, ∞).

Results:

• n(1 − θ̂_mle) →d β̂_mle = argmax Z_γ(β)

• n(1 − θ̂_lm) →d β̂_lm = arglocalmax Z_γ(β)

• P(θ̂_lm = 1) → P(β̂_lm = 0) = .6518 if γ = 0.

Extensions of MLE (Gaussian likelihood)

i) non-zero mean (Chen and Davis `00): same type of limit, except the pile-up is more excessive: P(θ̂_mle = 1) → .955.

This makes hypothesis testing easy! Reject H₀: θ = 1 if θ̂_mle < 1 (size of test is .045).

ii) heavy tails (Davis and Mikosch `98): {Z_t} symmetric alpha-stable (SαS). Then the max Gaussian likelihood estimator has the same normalizing rate, i.e.,

n(1 − θ̂_lm) →d β̂_lm and P(θ̂_lm = 1) → P(β̂_lm = 0).

The pile-up decreases with increasing tail heaviness.

Comparison of limit cdf's for different α's

[Figure: limit cdfs for α = 2.0, 1.5, 1.0, .75.]

Laplace likelihood/LAD estimation

If the noise distribution is non-Gaussian, the MA(1) parameter θ is identifiable for all real values.

Q1. For the (non-Gaussian) MLE, does one have n or n^{1/2} asymptotics?

Q2. Is there a pile-up effect?

Laplace likelihood – joint and exact

Model: Y_t = Z_t − θ Z_{t−1}, {Z_t} ~ IID (0, σ²) with median 0 and EZ⁴ < ∞.

Initial variable:

Z_init = Z_0 if |θ| ≤ 1, and Z_init = Z_n − Σ_{t=1}^n Y_t otherwise.

Joint density: let y = (Y_1, …, Y_n); then

f_n(y, z_init) = f(z_0, z_1, …, z_n) ( 1_{|θ|≤1} + |θ|^{−n} 1_{|θ|>1} ),

where f(z_0, …, z_n) is the product of the Z-densities and the z_t are solved

forward by: z_t = Y_t + θ z_{t−1}, t = 1,…,n, for |θ| ≤ 1, with z_0 = z_init;

backward by: z_{t−1} = θ^{−1}(z_t − Y_t), t = n,…,1, for |θ| > 1, with z_n = z_init + Y_1 + … + Y_n.

Note: integrate out z_init to get the exact likelihood,

f_n(y) = ∫_{−∞}^{∞} f_n(y, z_init) dz_init.
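The forward/backward recursions can be sketched as follows (function and variable names are mine, not from the slides):

```python
import numpy as np

# Sketch of the z-recursions behind the joint likelihood (names are mine).
def z_path(y, theta, z_init):
    n = len(y)
    z = np.empty(n + 1)
    if abs(theta) <= 1:            # forward: z_t = Y_t + theta z_{t-1}
        z[0] = z_init
        for t in range(1, n + 1):
            z[t] = y[t - 1] + theta * z[t - 1]
    else:                          # backward: z_{t-1} = (z_t - Y_t)/theta
        z[n] = z_init + y.sum()
        for t in range(n, 0, -1):
            z[t - 1] = (z[t] - y[t - 1]) / theta
    return z

# With the true theta and the correct initial variable, both recursions
# recover the noise sequence {Z_t} from the observations.
rng = np.random.default_rng(2)
Z = rng.laplace(size=101)
y1 = Z[1:] - 0.8 * Z[:-1]
y2 = Z[1:] - 1.25 * Z[:-1]
ok_fwd = np.allclose(z_path(y1, 0.8, Z[0]), Z)
ok_bwd = np.allclose(z_path(y2, 1.25, Z[-1] - y2.sum()), Z)
print(ok_fwd, ok_bwd)       # True True
```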

Laplace likelihood examples

[Figure: two surface plots of the joint Laplace likelihood over (θ, z.init), θ from 0.5 to 2.0 and z.init from −2 to 2, for 100 observations from Y_t = Z_t − θ₀ Z_{t−1}, {Z_t} ~ IID (0, σ²), Laplace pdf; θ₀ = .8 (left) and θ₀ = 1.0 (right).]

Laplace likelihood examples (cont)

[Figure: exact likelihood (left) and joint likelihood at z_max(θ) (right) as functions of θ (0.5 to 2.0), for 100 observations from Y_t = Z_t − θ₀ Z_{t−1}, {Z_t} ~ IID (0, σ²), Laplace pdf; curves for θ₀ = .8, 1.0, 1.25.]

Laplace likelihood-LAD estimation

(Joint) Laplace log-likelihood:

L(θ, σ, z_init) = −(n+1) log(2σ) − σ^{−1} Σ_{t=0}^n |z_t| − n log|θ| 1_{|θ|>1}.

Maximizing with respect to σ, we obtain

σ̂ = Σ_{t=0}^n |z_t| / (n+1),

so that maximizing L is equivalent to minimizing

l_n(θ, z_init) = Σ_{t=0}^n |z_t| if |θ| ≤ 1, and |θ|^{n/(n+1)} Σ_{t=0}^n |z_t| otherwise.
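A hypothetical implementation (mine) of the resulting LAD criterion for |θ| ≤ 1, conditional on z_init = 0 so that only the forward recursion is needed: minimizing the sum of absolute z's over a grid of θ values recovers the true parameter on simulated data.

```python
import numpy as np

# Hypothetical sketch (mine) of the conditional LAD criterion for
# |theta| <= 1 with z_init = 0: after profiling out sigma, maximizing the
# Laplace likelihood amounts to minimizing the sum of absolute residuals
# from the forward recursion z_t = Y_t + theta z_{t-1}.
def l_n(y, theta, z_init=0.0):
    z, s = z_init, abs(z_init)
    for yt in y:
        z = yt + theta * z
        s += abs(z)
    return s

rng = np.random.default_rng(3)
Z = rng.laplace(size=2001)
theta0 = 0.8
y = Z[1:] - theta0 * Z[:-1]
grid = np.linspace(0.0, 1.0, 201)
theta_hat = grid[np.argmin([l_n(y, th) for th in grid])]
print(round(theta_hat, 1))   # near 0.8
```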

Joint Laplace likelihood — limit results

Theorem 1. Under the parameterizations θ = 1 + β/n and z_init = Z_0 + ασ/n^{1/2}, we have

U_n(β, α) := σ^{−1}( l_n(θ, z_init) − l_n(1, Z_0) ) →d U(β, α)

on C(R²), where

U(β, α) = ∫_0^1 ( β ∫_0^s e^{β(s−t)} dS(t) + α e^{βs} ) dW(s) + f(0) ∫_0^1 ( β ∫_0^s e^{β(s−t)} dS(t) + α e^{βs} )² ds

for β ≤ 0, and

U(β, α) = −∫_0^1 ( β ∫_s^1 e^{β(s−t)} dS(t) + α e^{β(s−1)} ) dW(s) + f(0) ∫_0^1 ( β ∫_s^1 e^{β(s−t)} dS(t) + α e^{β(s−1)} )² ds

for β > 0, in which S(t) and W(t) are the limits of the partial sum processes defined on the next slide.

Joint Laplace likelihood — limit results (cont)

From the limits of the partial sum processes,

S_n(t) = (σ n^{1/2})^{−1} Σ_{i=0}^{[nt]} Z_i →d S(t),  W_n(t) = n^{−1/2} Σ_{i=0}^{[nt]} sign(Z_i) →d W(t),

it follows that U_n(β, α) →d U(β, α), and hence

( n(θ̂_lm − 1), n^{1/2} σ^{−1} (ẑ_init − Z_0) ) →d (β̂_lm, α̂_lm),

where (β̂_lm, α̂_lm) = (local) argmin U(β, α).

Exact Laplace likelihood — limit results

Exact Laplace likelihood: L_n(θ, σ) = ∫_{−∞}^{∞} f_n(y, z_init) dz_init.

Theorem 2. For the MLE (θ̃_n, σ̃_n), we have

( n(θ̃_mle − 1), n^{1/2}(σ̃_mle − E|Z_0|) ) →d (β̃_mle, N),

where β̃_mle = argmin U*(β), N ~ N(0, var(|Z_0|)), and U*(β) is a stochastic process defined in terms of S(t) and W(t).

In addition, n(θ̃_lm − 1) →d β̃_lm, where β̃_lm = (local) argmin U*(β).

Simulating from the limit process

Step 1. Simulate two independent sequences (W_1, …, W_m) and (V_1, …, V_m) of iid N(0,1) random variables with m = 100000.

Step 2. Form W(t) and V(t) by the partial sum processes,

W(t) = Σ_{j=1}^{[100000 t]} W_j / √100000 and V(t) = Σ_{j=1}^{[100000 t]} V_j / √100000.

Step 3. Set S(t) = c1 W(t) + c2 V(t), where c1 = E|Z_t|/σ and c2² = Var(Z_t)/σ² − c1².

Step 4. Compute U(β, α) and U*(β) from the definitions.

Step 5. Determine the respective local and global minimizers of U(β, α) and U*(β) numerically.

Note: the limit process depends only on c1, c2, and f(0).
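Steps 1–3 can be sketched as follows (a toy version of mine, using unit-variance Laplace noise, for which c1 = E|Z_t|/σ = 1/√2 and c2 = √(1 − c1²) = 1/√2):

```python
import numpy as np

# Sketch of Steps 1-3 (mine): build W(t), V(t), and S(t) = c1 W(t) + c2 V(t).
# For unit-variance Laplace noise, c1 = E|Z_t|/sigma = 1/sqrt(2) and
# c2 = sqrt(Var(Z_t)/sigma^2 - c1^2) = sqrt(1 - c1^2).
rng = np.random.default_rng(4)
m = 100_000
W = np.cumsum(rng.normal(size=m)) / np.sqrt(m)   # W(t) at t = j/m
V = np.cumsum(rng.normal(size=m)) / np.sqrt(m)   # V(t) at t = j/m

Z = rng.laplace(scale=1 / np.sqrt(2), size=m)    # Var(Z) = 2 scale^2 = 1
c1 = np.mean(np.abs(Z))                          # estimates E|Z|/sigma
c2 = np.sqrt(1 - c1**2)
S = c1 * W + c2 * V                              # driver of the limit process
print(round(c1, 2))                              # about 0.71 = 1/sqrt(2)
```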

Limit process

[Figure: two realizations of the limit processes U(β, α(β)) and U*(β) plotted against β from −20 to 20; Laplace pdf (left) and t(5) pdf (right).]

Limit distribution

n(θ̂_lm − 1) →d β̂_lm (joint Laplace likelihood) and n(θ̃_lm − 1) →d β̃_lm (exact Laplace likelihood).

[Figure: limit pdfs of β̂_lm (joint, left) and β̃_lm (exact, right); red graph = Laplace pdf for Z_t, blue graph = Gaussian pdf for Z_t.]

Limit cdf

[Figure: limit cdfs of β̂_lm (joint Laplace likelihood, left) and β̃_lm (exact Laplace likelihood, right); red graph = Laplace pdf for Z_t, blue graph = Gaussian pdf for Z_t.]

Simulation results: Laplace noise, θ = 1, σ = 1, 1000 reps.

           θ̃_lm (exact)   θ̂_lm (joint)     σ̂
n = 20
  bias        -.0057         -.0033       -.0208
  s.d.         .1438          .0656        .2430
  rmse         .1439          .0657        .2438
  asymp        .1207          .0526        .2236
n = 50
  bias         .0000          .0004        .0293
  s.d.         .0574          .0208        .1511
  rmse         .0574          .0208        .1539
  asymp        .0483          .0211        .1414
n = 100
  bias         .0005         -.0003       -.0025
  s.d.         .0303          .0107        .1000
  rmse         .0303          .0107        .1000
  asymp        .0241          .0105        .1000
n = 200
  bias         .0005          .0000       -.0016
  s.d.         .0140          .0058        .0718
  rmse         .0141          .0058        .0718
  asymp        .0121          .0053        .0707

Simulation results (cont): Laplace noise, θ = 1, σ = 1, 1000 reps.

Exact = MLE; Joint = maximize over θ and z_init; Cond = maximize over θ conditional on z_init = 0.

            Exact    Joint    Cond
n = 20
  bias      -.047    -.050    -.057
  rmse       .224     .213     .297
n = 50
  bias      -.013     .002    -.026
  rmse       .096     .078     .171
n = 100
  bias       .003    -.003    -.009
  rmse       .051     .034     .105
n = 200
  bias       .000     .000     .007
  rmse       .028     .014     .070

Note:

• LM dominates ML

• Joint dominates Exact (rmse is about half the size)

Pile-up probabilities

Theorem 3 (joint Laplace likelihood).

P(θ̂_lm = 1) → P( 0 < Y < ∫_0^1 dS(s) dW(s) ),

where

Y = ∫_0^1 S(s) dW(s) − W(1) ∫_0^1 S(s) ds + 2 f(0) W(1) ∫_0^1 ( W(s) − W(1)/2 ) ds.

Idea:

P(θ̂_lm = 1) = P( lim_{β↑0} (∂/∂β) U_n(β, α̂(β)) < 0 and lim_{β↓0} (∂/∂β) U_n(β, α̂(β)) > 0 )
→ P( lim_{β↑0} (∂/∂β) U(β, α̂(β)) < 0 and lim_{β↓0} (∂/∂β) U(β, α̂(β)) > 0 ).

Now,

lim_{β↑0} (∂/∂β) U(β, α̂(β)) = Y − ∫_0^1 dS(s) dW(s) and lim_{β↓0} (∂/∂β) U(β, α̂(β)) = Y,

and the result follows.

Pile-up probabilities (cont)

Theorem 4 (exact Laplace likelihood).

P(θ̃_lm = 1) → P( 1/2 < Y < ∫_0^1 dS(s) dW(s) − 1/2 ).

The pile-up probability is always smaller for the exact MLE than for the joint MLE (see Theorem 3).

Remark 1. If Z_t has a Laplace density, f(z) = (1/(2σ)) e^{−|z|/σ}, then

Y = ∫_0^1 [ W(s) − sW(1) ] dV(s) + 1/2,

where W(s) and V(s) are independent standard Brownian motions.

Pile-up probabilities (cont)

It follows that, for Laplace noise (where ∫_0^1 dS(s)dW(s) = c1 = E|Z_t|/σ = 1),

P(θ̂_lm = 1) → P( 0 < Y < ∫_0^1 dS(s) dW(s) )
            = P( 0 < ∫_0^1 [W(s) − sW(1)] dV(s) + .5 < 1 )
            = E[ P( −.5 < ∫_0^1 [W(s) − sW(1)] dV(s) < .5 | W(t), t ∈ [0,1] ) ]
            = E[ 2Φ( .5 { ∫_0^1 (W(s) − sW(1))² ds }^{−1/2} ) − 1 ]
            ≈ 0.820.

For the exact likelihood,

P(θ̃_lm = 1) → P( 1/2 < Y < ∫_0^1 dS(s) dW(s) − 1/2 ) = P( 1/2 < Y < 1 − 1/2 ) = 0

⇒ no pile-up.
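The ≈ 0.820 value can be reproduced by Monte Carlo (my own sketch, not the authors' code): conditionally on the Brownian path W, the stochastic integral is normal with variance ∫_0^1 (W(s) − sW(1))² ds, so one averages 2Φ(.5 var^{−1/2}) − 1 over simulated paths.

```python
import numpy as np
from math import erf, sqrt

# Monte Carlo sketch (mine) of the pile-up limit for Laplace noise:
# given W, the integral int_0^1 [W(s) - s W(1)] dV(s) is N(0, I) with
# I = int_0^1 (W(s) - s W(1))^2 ds, so the pile-up probability is
# E[ 2 Phi(.5 / sqrt(I)) - 1 ] = E[ erf(.5 / sqrt(2 I)) ].
rng = np.random.default_rng(5)
reps, steps = 4000, 1000
s = np.arange(1, steps + 1) / steps
W = rng.normal(size=(reps, steps)).cumsum(axis=1) / sqrt(steps)  # paths W(j/m)
bridge = W - np.outer(W[:, -1], s)               # W(s) - s W(1)
I = (bridge**2).mean(axis=1)                     # Riemann sum of the integral
vals = np.array([erf(0.5 / sqrt(2 * i)) for i in I])
print(round(vals.mean(), 2))                     # about 0.82
```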

Pile-up probabilities (cont)

Remark 2. For the joint likelihood,

P(θ̂_lm = 1) → P( 0 < Y < ∫_0^1 dS(s) dW(s) ) = P( 0 < Y < c1 ) > 0,

where c1 = E|Z_t|/σ > 0. On the other hand,

P(θ̃_lm = 1) → P( 1/2 < Y < c1 − 1/2 ) > 0 if and only if c1 > 1.

That is, for the exact likelihood there is a pile-up if and only if c1 > 1.

Remark 3. The pile-up probability tends to be larger if the density is more concentrated around 0.

Simulation results – pile-up probabilities

Pile-up probabilities for the local maximizer, P(θ̂_lm = 1):

        n = 20   n = 50   n = 100   n = 200   n = 500   limit
Gau      .827     .859      .873      .844      .855     .858
Lap      .796     .806      .819      .819      .809     .820
Unif     .831     .864      .864      .843      .841     .836
t(5)     .796     .823      .817      .831      .846     .827