Heidelberg 9/05

Laplace Likelihood and LAD Estimation for Non-invertible MA(1)

Richard A. Davis, Colorado State University
F. Jay Breidt, Colorado State University
Nan-Jung Hsu, National Tsing-Hua University
Murray Rosenblatt, U. of California, San Diego

(http://www.stat.colostate.edu/~rdavis/lectures)
Program

Introduction
  • The MA(1) unit root problem
  • Why study non-invertible MA(1)'s?
      - over-differencing
      - random walk + noise

Gaussian likelihood estimation
  • Identifiability
  • Limit results
  • Extensions: non-zero mean, heavy tails

Laplace likelihood/LAD estimation
  • Joint and exact likelihood
  • Limit results
  • Limit distribution/simulation comparisons
  • Pile-up probabilities (joint likelihood, exact likelihood)
MA(1) unit root problem

MA(1):
  Yt = Zt − θ Zt−1 ,  {Zt} ~ IID(0, σ²)

Properties:
• |θ| < 1 (invertible)  ⇒  Zt = Σ_{j=0}^∞ θ^j Yt−j ,  so Zt ∈ sp{Yt, Yt−1, ...}
• |θ| > 1 (non-invertible)  ⇒  Zt = −Σ_{j=1}^∞ θ^{−j} Yt+j ,  so Zt ∈ sp{Yt+1, Yt+2, ...}
• |θ| = 1  ⇒  P_{sp{Ys, s≠0}} Y0 = Y0  (perfect interpolation)
• |θ| < 1  ⇒  θ̂mle is AN(θ, (1 − θ²)/n),
  MLE = maximum (Gaussian) likelihood, n = sample size.

What if θ = 1?
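The two expansions can be checked numerically. The sketch below (my own illustration, not from the slides) builds an MA(1) from Laplace noise and reconstructs Zt from past Y's in the invertible case and from future Y's in the non-invertible case:

```python
import numpy as np

# Numerically check the two expansions of Z_t:
# |theta| < 1:  Z_t =  sum_{j>=0} theta^j   * Y_{t-j}   (past Y's)
# |theta| > 1:  Z_t = -sum_{j>=1} theta^-j  * Y_{t+j}   (future Y's)
rng = np.random.default_rng(0)
n = 2000
Z = rng.laplace(size=n)

def ma1(Z, theta):
    Y = Z.copy()
    Y[1:] -= theta * Z[:-1]   # Y_t = Z_t - theta * Z_{t-1}
    return Y

# invertible case: reconstruct Z_t from past Y's
theta = 0.5
Y = ma1(Z, theta)
t = 1500
Z_hat = sum(theta**j * Y[t - j] for j in range(200))
assert abs(Z_hat - Z[t]) < 1e-6

# non-invertible case: reconstruct Z_t from future Y's
theta = 2.0
Y = ma1(Z, theta)
t = 500
Z_hat = -sum(theta**(-j) * Y[t + j] for j in range(1, 200))
assert abs(Z_hat - Z[t]) < 1e-6
```

Truncating each series at 200 terms leaves a geometrically small error, which is why the tolerance can be so tight.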
Why study non-invertible MA(1)?

a) over-differencing
• linear trend model: Xt = a + bt + Zt.
  Yt = Xt − Xt−1 = b + Zt − Zt−1 ~ MA(1) with θ = 1.
• seasonal model: Xt = st + Zt, st seasonal component with period 12.
  Yt = Xt − Xt−12 = Zt − Zt−12 ~ MA(12) with θ = 1.

b) random walk + noise
  Xt = Xt−1 + Ut   (random walk signal)
  Yt = Xt + Vt     (random walk signal + noise)
Then
  Yt − Yt−1 = Ut + Vt − Vt−1 ~ MA(1)
with θ = 1 if and only if Var(Ut) = 0.
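The over-differencing effect is easy to see in simulation. A short sketch (the trend values a = 1.0, b = 0.1 are arbitrary, not from the slides): differencing a linear trend plus noise yields an MA(1) with θ = 1, whose lag-1 autocorrelation is −θ/(1 + θ²) = −0.5:

```python
import numpy as np

# Differencing X_t = a + b*t + Z_t gives Y_t = b + Z_t - Z_{t-1},
# an MA(1) with theta = 1 and lag-1 autocorrelation -1/2.
rng = np.random.default_rng(1)
n = 200000
Z = rng.normal(size=n)
t = np.arange(n)
X = 1.0 + 0.1 * t + Z        # linear trend + noise (illustrative a, b)
Y = np.diff(X)               # over-differenced series, MA(1) with theta = 1
Yc = Y - Y.mean()
rho1 = np.dot(Yc[1:], Yc[:-1]) / np.dot(Yc, Yc)  # sample lag-1 autocorrelation
assert abs(rho1 - (-0.5)) < 0.01
```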
Identifiability and Gaussian likelihood

Identifiability
• |θ| > 1 ⇒ Yt = εt − θ^{−1} εt−1, where {εt} ~ WN(0, θ²σ²).
• {εt} is IID if and only if {Zt} is Gaussian (Breidt and Davis `91).

Gaussian likelihood
  LG(θ, σ²) = LG(1/θ, θ²σ²)  ⇒  θ is only identifiable for |θ| ≤ 1.
Notes:
i) this implies LG(θ) = LG(1/θ) for the profile likelihood, so θ = 1 is a
   critical point: LG′(1) = 0;
ii) a pile-up effect ensues, i.e., P(θ̂ = 1) > 0 even if θ < 1.
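The identity LG(θ, σ²) = LG(1/θ, θ²σ²) can be verified directly, since the Gaussian likelihood depends only on the autocovariances γ(0) = σ²(1 + θ²) and γ(1) = −σ²θ, which are unchanged under (θ, σ²) → (1/θ, θ²σ²). A small numerical sketch (a brute-force multivariate-normal evaluation, not the slides' code):

```python
import numpy as np

def gauss_loglik(y, theta, sigma2):
    """Exact Gaussian log-likelihood of an MA(1) via its covariance matrix."""
    n = len(y)
    G = np.zeros((n, n))
    idx = np.arange(n)
    G[idx, idx] = sigma2 * (1 + theta**2)            # gamma(0)
    G[idx[:-1], idx[:-1] + 1] = -sigma2 * theta      # gamma(1), superdiagonal
    G[idx[:-1] + 1, idx[:-1]] = -sigma2 * theta      # gamma(1), subdiagonal
    sign, logdet = np.linalg.slogdet(G)
    return -0.5 * (logdet + y @ np.linalg.solve(G, y))

rng = np.random.default_rng(2)
Z = rng.laplace(size=101)
y = Z[1:] - 0.8 * Z[:-1]
theta, sigma2 = 0.8, 1.0
l1 = gauss_loglik(y, theta, sigma2)
l2 = gauss_loglik(y, 1 / theta, theta**2 * sigma2)   # same covariance matrix
assert abs(l1 - l2) < 1e-6
```

The two parameter pairs produce the identical covariance matrix, so the log-likelihoods agree to floating-point precision.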
Gaussian likelihood examples

[Figure: Gaussian likelihood as a function of θ (0.5 to 2.0) for three samples.]

100 observations from Yt = Zt − θ0 Zt−1, {Zt} ~ IID(0, σ²), Laplace pdf;
θ0 = .8, θ0 = 1.0, θ0 = 1.25.
MLE (Gaussian likelihood)

Idea: build parameter normalization into the likelihood function.

Model: Yt = Zt − (1 − β/n) Zt−1,  t = 1, ..., n.
  β = n(1 − θ),  θ = 1 − β/n,  θ0 = 1 − γ/n.

Gaussian likelihood: Ln(β) = ℓn(1 − β/n) − ℓn(1),  ℓn(·) = profile log-likelihood.

Theorem (Davis and Dunsmuir `96): Under θ0 = 1 − γ/n,
  Ln(β) →d Zγ(β) on C[0, ∞).

Results:
• n(1 − θ̂mle) →d β̂mle = argmax Zγ(β)
• n(1 − θ̂lm) →d β̂lm = arg local max Zγ(β)
• P(θ̂lm = 1) → P(β̂lm = 0) = .6518 if γ = 0.
Extensions of MLE (Gaussian likelihood)

i) non-zero mean (Chen and Davis `00): same type of limit, except the pile-up
   is more excessive: P(θ̂mle = 1) → .955.
   This makes hypothesis testing easy!
   Reject H0: θ = 1 if θ̂mle < 1 (size of test is .045).

ii) heavy tails (Davis and Mikosch `98): {Zt} symmetric alpha-stable (SαS).
   Then the max Gaussian likelihood estimator has the same normalizing rate, i.e.,
     n(1 − θ̂lm) →d β̂lm,   P(θ̂lm = 1) → P(β̂lm = 0).
   The pile-up decreases with increasing tail heaviness.
Comparison of limit cdf's for different α's

[Figure: limit cdf's for α = 2.0, 1.5, 1.0, .75.]
Laplace likelihood/LAD estimation

If the noise distribution is non-Gaussian, the MA(1) parameter θ is identifiable for all real values.

Q1. For the (non-Gaussian) MLE, does one have n or n^{1/2} asymptotics?
Q2. Is there a pile-up effect?
Laplace likelihood – joint and exact

Model: Yt = Zt − θ Zt−1, {Zt} ~ IID(0, σ²) with median 0 and EZ⁴ < ∞.

Initial variable:
  zinit = Z0 if |θ| ≤ 1,  and  zinit = Zn − Σ_{t=1}^n Yt otherwise.

Joint density: let Yn = (Y1, ..., Yn); then
  f(zinit, Yn) = f0(zinit, z1, ..., zn) (1_{|θ|≤1} + |θ|^{−n} 1_{|θ|>1}),
where f0 is the joint density of the n + 1 iid noise terms and the zt are solved
  forward by:  zt = Yt + θ zt−1,  t = 1, ..., n, for |θ| ≤ 1, with z0 = zinit;
  backward by: zt−1 = θ^{−1}(zt − Yt),  t = n, ..., 1, for |θ| > 1, with zn = zinit + Y1 + ... + Yn.

Note: integrate out zinit to get the exact likelihood,
  fn(Yn) = ∫_{−∞}^{∞} f(zinit, Yn) dzinit.
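The recursions translate directly into code. A minimal sketch (my own reading of the slide; in particular the |θ|^{−n} Jacobian factor for |θ| > 1 follows the density displayed above):

```python
import numpy as np

def residuals(Y, theta, z_init):
    """Solve for z_0, ..., z_n by the forward or backward recursion."""
    n = len(Y)
    z = np.empty(n + 1)
    if abs(theta) <= 1:
        z[0] = z_init                        # forward: z_t = Y_t + theta*z_{t-1}
        for t in range(1, n + 1):
            z[t] = Y[t - 1] + theta * z[t - 1]
    else:
        z[n] = z_init + Y.sum()              # backward: z_{t-1} = (z_t - Y_t)/theta
        for t in range(n, 0, -1):
            z[t - 1] = (z[t] - Y[t - 1]) / theta
    return z

def joint_laplace_loglik(Y, theta, sigma, z_init):
    """Joint Laplace log-likelihood, with the Jacobian term for |theta| > 1."""
    z = residuals(Y, theta, z_init)
    n = len(Y)
    ll = -(n + 1) * np.log(2 * sigma) - np.abs(z).sum() / sigma
    if abs(theta) > 1:
        ll -= n * np.log(abs(theta))         # Jacobian of the backward recursion
    return ll

rng = np.random.default_rng(3)
Z = rng.laplace(size=101)
Y = Z[1:] - 0.8 * Z[:-1]
# with the true theta and z_init = Z_0, the forward recursion recovers the noise
z = residuals(Y, 0.8, Z[0])
assert np.allclose(z, Z)
```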
Laplace likelihood examples

[Figure: two surface plots of the joint Laplace likelihood as a function of θ
(0.5 to 2.0) and z.init (−2 to 2), for θ0 = .8 and θ0 = 1.0.]

100 observations from Yt = Zt − θ0 Zt−1, {Zt} ~ IID(0, σ²), Laplace pdf.
Laplace likelihood examples (cont)

[Figure: exact likelihood and joint likelihood at zmax(θ), each as a function
of θ (0.5 to 2.0), for θ0 = .8, 1.0, 1.25.]

100 observations from Yt = Zt − θ0 Zt−1, {Zt} ~ IID(0, σ²), Laplace pdf.
Laplace likelihood – LAD estimation

(Joint) Laplace log-likelihood:
  L(θ, σ, zinit) = −(n + 1) log(2σ) − σ^{−1} Σ_{t=0}^n |zt| − n (log|θ|) 1_{|θ|>1}.

Maximizing with respect to σ, we obtain
  σ̂ = Σ_{t=0}^n |zt| / (n + 1),
so that maximizing L is equivalent to minimizing
  ln(θ, zinit) = Σ_{t=0}^n |zt|  if |θ| ≤ 1,
  ln(θ, zinit) = |θ|^{n/(n+1)} Σ_{t=0}^n |zt|  otherwise.
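Putting the pieces together, a minimal LAD fit can be sketched as follows (not the authors' implementation; the Nelder-Mead optimizer and the starting values are my own choices):

```python
import numpy as np
from scipy.optimize import minimize

def l_n(params, Y):
    """Profile objective: sum|z_t|, scaled by |theta|^(n/(n+1)) if |theta| > 1."""
    theta, z_init = params
    n = len(Y)
    z = np.empty(n + 1)
    if abs(theta) <= 1:
        z[0] = z_init                        # forward recursion
        for t in range(1, n + 1):
            z[t] = Y[t - 1] + theta * z[t - 1]
    else:
        z[n] = z_init + Y.sum()              # backward recursion
        for t in range(n, 0, -1):
            z[t - 1] = (z[t] - Y[t - 1]) / theta
    s = np.abs(z).sum()
    return s if abs(theta) <= 1 else abs(theta) ** (n / (n + 1)) * s

rng = np.random.default_rng(4)
Z = rng.laplace(size=201)
Y = Z[1:] - 0.8 * Z[:-1]                     # MA(1) with theta_0 = 0.8
res = minimize(l_n, x0=[0.5, 0.0], args=(Y,), method="Nelder-Mead")
theta_hat, z_init_hat = res.x                # sigma_hat = l_n/(n+1) for |theta|<=1
```

The profiled scale estimate σ̂ = Σ|zt|/(n + 1) can be recovered from the minimized objective in the invertible case.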
Joint Laplace likelihood — limit results

Theorem 1. Under the parameterizations θ = 1 + β/n and zinit = Z0 + ασ/n^{1/2},
we have
  Un(β, α) := σ^{−1} ( ln(zinit, θ) − ln(Z0, 1) ) →d U(β, α)
on C(R²), where
  U(β, α) = ∫₀¹ ( β ∫₀ᵗ e^{β(t−s)} dS(s) + α e^{βt} ) dW(t)
            + f(0) ∫₀¹ ( β ∫₀ᵗ e^{β(t−s)} dS(s) + α e^{βt} )² dt
for β ≤ 0, and
  U(β, α) = −∫₀¹ ( β ∫ₛ¹ e^{−β(t−s)} dS(t) + α e^{−β(1−s)} ) dW(s)
            + f(0) ∫₀¹ ( β ∫ₛ¹ e^{−β(t−s)} dS(t) + α e^{−β(1−s)} )² ds
for β > 0, in which S(t) and W(t) are the limits of the partial sum processes.
Joint Laplace likelihood — limit results

From the limit
  Un(β, α) →d U(β, α),
where
  Sn(t) = n^{−1/2} Σ_{i=1}^{[nt]} Zi/σ →d S(t),
  Wn(t) = n^{−1/2} Σ_{i=1}^{[nt]} sign(Zi) →d W(t),
it follows that
  ( n(θ̂lm − 1), n^{1/2} σ^{−1} (ẑinit − Z0) ) →d (β̂lm, α̂lm),
where
  (β̂lm, α̂lm) = arg (local) min U(β, α).
Exact Laplace likelihood — limit results

Exact Laplace likelihood:
  Ln(θ, σ) = ∫_{−∞}^{∞} f(zinit, Yn) dzinit.

Theorem 2. For the MLE (θ̃n, σ̃n), we have
  ( n(θ̃mle − 1), n^{1/2}(σ̃mle − E|Z0|) ) →d (β̃mle, N),
where
  β̃mle = arg min U*(β),   N ~ N(0, var(|Z0|)),
and U*(β) is a stochastic process defined in terms of S(t) and W(t).
In addition,
  n(θ̃lm − 1) →d β̃lm = arg (local) min U*(β).
Simulating from the limit process

Step 1. Simulate two independent sequences (W1, ..., Wm) and (V1, ..., Vm) of
iid N(0,1) random variables with m = 100000.

Step 2. Form W(t) and V(t) by the partial sum processes,
  W(t) = Σ_{j=1}^{[100000 t]} Wj / √100000  and  V(t) = Σ_{j=1}^{[100000 t]} Vj / √100000.

Step 3. Set S(t) = c1 W(t) + c2 V(t), where
  c1 = E|Zt| / σ  and  c2² = Var(Zt)/σ² − c1².

Step 4. Compute U(β, α) and U*(β) from the definitions.

Step 5. Determine the respective local and global minimizers of U(β, α) and
U*(β) numerically.

Note: the limit process depends only on c1, c2, and f(0).
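Steps 1–3 can be sketched directly in code (m is reduced from 100000 to 20000 for speed; c1 = c2 = 1 corresponds to Laplace noise, where E|Z| = σ and Var(Z) = 2σ²):

```python
import numpy as np

# Steps 1-2: iid N(0,1) increments and their partial-sum approximations
# of two independent Brownian motions on [0, 1].
rng = np.random.default_rng(5)
m = 20000
Wsteps = rng.normal(size=m)
Vsteps = rng.normal(size=m)

def W(t):
    return Wsteps[: int(m * t)].sum() / np.sqrt(m)

def V(t):
    return Vsteps[: int(m * t)].sum() / np.sqrt(m)

# Step 3: S(t) = c1*W(t) + c2*V(t); for Laplace noise c1 = c2 = 1.
c1, c2 = 1.0, 1.0

def S(t):
    return c1 * W(t) + c2 * V(t)

assert W(0) == 0.0
assert abs(S(1.0) - (c1 * W(1.0) + c2 * V(1.0))) < 1e-12
```

Steps 4–5 would then evaluate U(β, α) and U*(β) on a β-grid using these paths and locate the minimizers numerically.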
Limit process

[Figure: 2 realizations of the limit processes U(β, α(β)) and U*(β), plotted
against β ∈ (−20, 20); left panel Laplace pdf, right panel t(5) pdf.]
Limit distribution

[Figure: histograms of the limit distributions of
  n(θ̂lm − 1) →d β̂lm  (joint Laplace likelihood)  and
  n(θ̃lm − 1) →d β̃lm  (exact Laplace likelihood);
red graph = Laplace pdf for Zt, blue graph = Gaussian pdf for Zt.]
Limit cdf

[Figure: limit cdf's under the joint and exact Laplace likelihoods;
red graph = Laplace pdf for Zt, blue graph = Gaussian pdf for Zt.]
Laplace noise, θ = 1, σ = 1, 1000 reps.

           θ̃lm (Exact)   θ̂lm (Joint)    σ̂n
n = 20
  bias       −.0057        −.0033       −.0208
  s.d.        .1438         .0656        .2430
  rmse        .1439         .0657        .2438
  asymp       .1207         .0526        .2236
n = 50
  bias        .0000         .0004        .0293
  s.d.        .0574         .0208        .1511
  rmse        .0574         .0208        .1539
  asymp       .0483         .0211        .1414
n = 100
  bias        .0005        −.0003       −.0025
  s.d.        .0303         .0107        .1000
  rmse        .0303         .0107        .1000
  asymp       .0241         .0105        .1000
n = 200
  bias        .0005         .0000       −.0016
  s.d.        .0140         .0058        .0718
  rmse        .0141         .0058        .0718
  asymp       .0121         .0053        .0707
Simulation results

Laplace noise, θ = 1, σ = 1, 1000 reps.

           θ̃ml (Exact)   θ̂ml (Joint)   θml (Cond)
n = 20
  bias       −.047         −.050        −.057
  rmse        .224          .213         .297
n = 50
  bias       −.013          .002        −.026
  rmse        .096          .078         .171
n = 100
  bias        .003         −.003        −.009
  rmse        .051          .034         .105
n = 200
  bias        .000          .000         .007
  rmse        .028          .014         .070

Exact = MLE
Joint = maximize over θ and zinit
Cond  = maximize over θ conditional on zinit = 0

Note:
• LM dominates ML
• joint dominates exact (rmse is half the size)
Pile-up probabilities

Theorem 3 (joint Laplace likelihood).
  P(θ̂lm = 1) → P( 0 < Y < ∫₀¹ dS(s) dW(s) ),
where Y is an explicit quadratic functional of S(·), W(·), and f(0); for
Laplace noise it reduces to the expression in Remark 1 below.

Idea:
  P(θ̂lm = 1) = P( lim_{β↑0} ∂/∂β Un(β, α̂(β)) < 0  and  lim_{β↓0} ∂/∂β Un(β, α̂(β)) > 0 )
              → P( lim_{β↑0} ∂/∂β U(β, α̂(β)) < 0  and  lim_{β↓0} ∂/∂β U(β, α̂(β)) > 0 ).
Now,
  lim_{β↑0} ∂/∂β U(β, α̂(β)) = Y − ∫₀¹ dS(s) dW(s)  and  lim_{β↓0} ∂/∂β U(β, α̂(β)) = Y,
and the result follows.
Pile-up probabilities (cont)

Theorem 4 (exact Laplace likelihood).
  P(θ̃lm = 1) → P( 1/2 < Y < ∫₀¹ dS(s) dW(s) − 1/2 ).
The pile-up probability is always smaller for the exact MLE than for the
joint MLE (see Theorem 3).

Remark 1. If Zt has a Laplace density,
  f(z) = (1/(2σ)) e^{−|z|/σ},
then
  Y = ∫₀¹ [W(1) − W(s)] dV(s) + 1/2,
where W(s) and V(s) are independent standard Brownian motions.
Pile-up probabilities (cont)

It follows that (for Laplace noise, where ∫₀¹ dS(s) dW(s) = c1 = 1)
  P(θ̂lm = 1) → P( 0 < Y < ∫₀¹ dS(s) dW(s) )
             = P( 0 < ∫₀¹ [W(1) − W(s)] dV(s) + .5 < 1 )
             = E[ P( −.5 < ∫₀¹ [W(1) − W(s)] dV(s) < .5 | W(t), t ∈ [0,1] ) ]
             = E[ 2Φ( .5 { ∫₀¹ (W(1) − W(s))² ds }^{−1/2} ) − 1 ]
             ≈ 0.820,
while
  P(θ̃lm = 1) → P( 1/2 < Y < ∫₀¹ dS(s) dW(s) − 1/2 )
             = P( 1/2 < Y < 1 − 1/2 ) = 0
  ⇒ no pile-up.
Pile-up probabilities (cont)

Remark 2.
  P(θ̂lm = 1) → P( 0 < Y < ∫₀¹ dS(s) dW(s) ) = P(0 < Y < c1),
where c1 = E|Zt|/σ > 0, so the joint likelihood always has a pile-up.
On the other hand,
  P(θ̃lm = 1) → P( 1/2 < Y < c1 − 1/2 ) > 0  if and only if c1 > 1.
That is, for the exact likelihood there is a pile-up if and only if c1 > 1.

Remark 3. The pile-up probability tends to be larger if the density is more
concentrated around 0.