TRANSCRIPT
Frontiers in Forecasting, Minneapolis, February 21-23, 2018
Sparse VAR-Models
Christophe Croux
EDHEC Business School (France)
Joint Work with Ines Wilms (Cornell University), Luca Barbaglia (KU Leuven),
and Sarah Gelper (TU Eindhoven).
Christophe Croux Sparse VAR-Models 1 / 32
10-dimensional time series
[Figure: weekly log-volatilities, 2013–2016, for the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
Vector Autoregressive Model (VAR)
Two stationary time series y_{1,t} and y_{2,t}.
VAR(1) in dimension q = 2:
y_{1,t} = Γ_{1,11} y_{1,t−1} + Γ_{1,12} y_{2,t−1} + e_{1,t}
y_{2,t} = Γ_{1,21} y_{1,t−1} + Γ_{1,22} y_{2,t−1} + e_{2,t}
Covariance matrix of (e_{1,t}, e_{2,t})′ is Σ.
Vector notation: y_t = Γ_1 y_{t−1} + e_t.
The VAR model
Let y_t be a q-dimensional stationary time series.
Vector Autoregressive Model of order p:
y_t = Γ_1 y_{t−1} + Γ_2 y_{t−2} + … + Γ_p y_{t−p} + e_t,
The matrices Γ_j contain the autoregressive parameters.
e_t is an error term with covariance matrix Σ = Ω^{−1}.
Standard estimation procedure: OLS, equation by equation.
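The equation-by-equation OLS estimator can be sketched in a few lines of Python (a minimal NumPy illustration; `var_ols` is a hypothetical helper name, not code from the talk):

```python
import numpy as np

def var_ols(y, p=1):
    """OLS estimation of a VAR(p), equation by equation.

    y : (T, q) array holding a stationary q-dimensional time series.
    Returns the (q, q*p) coefficient matrix [Gamma_1, ..., Gamma_p].
    """
    T, q = y.shape
    # Stack the lagged regressors X_t = (y'_{t-1}, ..., y'_{t-p})'
    X = np.hstack([y[p - j - 1:T - j - 1] for j in range(p)])
    Y = y[p:]
    # All q equations share the same regressors, so one solve covers them all
    Gamma, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Gamma.T  # row i holds the coefficients of equation i
```

Each row of the returned matrix is one OLS regression of a series on the p lags of all q series; with q and p large this is exactly where the overparametrization problem discussed below kicks in.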
Log-Volatilities for commodities (weekly data)
VAR model for q = 10 time series (and T = 156 observations)
One lag
– 1 × (q × q) = 100 regression parameters
– 45 unique elements in Σ
Two lags
– 2 × (q × q) = 200 regression parameters
– 45 unique elements in Σ
→ Explosion of the number of parameters
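The counts above follow directly from the dimensions; a throwaway helper (not from the slides) makes the arithmetic explicit, assuming the "45 unique elements" on the slide refers to the off-diagonal entries of the symmetric Σ:

```python
def var_param_count(q, p):
    """Parameter counts for a VAR(p) in dimension q."""
    ar = p * q * q           # p coefficient matrices of size q x q
    cov = q * (q - 1) // 2   # unique off-diagonal elements of Sigma (the slide's count)
    return ar, cov

print(var_param_count(10, 1))  # → (100, 45)
print(var_param_count(10, 2))  # → (200, 45)
```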
The VAR model: Overparametrization
ML estimators will be
– not computable
– inaccurate
Sparse estimation ≡ many estimated parameters equal to zero
– Suitable if T is small relative to the number of parameters
– Easier to interpret
– Automatic variable selection
– Better estimation and prediction performance
Sparse Estimation: Lasso
In the multiple linear regression model
y = β_1 x_1 + β_2 x_2 + … + β_k x_k + ε
Minimization problem:
β̂ = argmin_β (y − Xβ)′(y − Xβ) + λ Σ_{l=1}^{k} |β_l|.
Tibshirani (1996)
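The minimization above can be solved by cyclic coordinate descent with soft-thresholding (a bare-bones sketch of the standard algorithm, not the implementation used in the talk; `lasso_cd` is a made-up name):

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (y - Xb)'(y - Xb) + lam * sum_l |b_l| by cyclic coordinate descent."""
    n, k = X.shape
    beta = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0)  # curvature of each coordinate
    for _ in range(n_iter):
        for l in range(k):
            # Partial residual: remove coordinate l's current contribution
            r = y - X @ beta + X[:, l] * beta[l]
            beta[l] = soft_threshold(X[:, l] @ r, lam / 2.0) / col_ss[l]
    return beta
```

For a sufficiently large λ, many coordinates are set exactly to zero, which is where the automatic variable selection comes from.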
Lasso for the VAR model
Multiple equations
– Partial correlation structure of the error term → glasso of Friedman et al. (2008)
Dynamic nature of the model
– Selecting a time series into one of the equations = selecting the variable and all its lags → group lasso (Yuan and Lin, 2006)
Penalized ML estimation
Rewrite the VAR in matrix notation:
Y = Y_L Γ + E,
where
Y = (y_{p+1}, …, y_T)′
Y_L = (X_{p+1}, …, X_T)′ with X_t = (y′_{t−1}, …, y′_{t−p})′
Γ = (Γ_1, …, Γ_p)′
E = (e_{p+1}, …, e_T)′.
Penalized ML estimation (cont.)
Penalized negative log likelihood:
(Γ̂, Ω̂) = argmin_{Γ,Ω} (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) − log|Ω| + λ_1 Σ_{g=1}^{G} ||γ_g||_2 + λ_2 Σ_{k≠k′} |Ω_{kk′}|,
with
γ_g a subvector of Γ
G = q² the total number of groups.
Algorithm
Solving for Γ|Ω:
Γ̂|Ω = argmin_Γ (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) + λ_1 Σ_{g=1}^{G} ||γ_g||_2.
→ groupwise lasso
Algorithm (cont.)
Solving for Ω|Γ:
Ω̂|Γ = argmin_Ω (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) − log|Ω| + λ_2 Σ_{k≠k′} |Ω_{kk′}|.
→ penalized inverse covariance estimation (glasso)
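For the Ω|Γ step, an off-the-shelf glasso implementation can be applied to the current residuals; a sketch using scikit-learn's `GraphicalLasso`, with simulated residuals standing in for E = Y − Y_L Γ̂ (the simulated data and the value of alpha are my own choices):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulated stand-in for the residual matrix E = Y - YL @ Gamma
rng = np.random.default_rng(3)
E = rng.standard_normal((300, 4)) * np.array([1.0, 0.8, 1.2, 0.9])

# alpha plays the role of lambda_2: it penalizes off-diagonal entries of Omega
gl = GraphicalLasso(alpha=0.2).fit(E)
Omega = gl.precision_  # sparse estimate of the inverse covariance matrix
```

In the alternating algorithm this fit would be re-run each time Γ̂ (and hence E) is updated.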
Selection of tuning parameters
In the iteration step Γ|Ω, select λ_1 to minimize
BIC_{λ_1} = −2 log L̂_{λ_1} + k_{λ_1} log(T),
where
L̂_{λ_1} is the estimated likelihood using λ_1
k_{λ_1} is the number of non-zero estimated regression coefficients.
In the iteration step Ω|Γ, select λ_2 analogously.
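The BIC search can be sketched as a grid scan (illustrative only: `fit_lasso` stands for any sparse estimator returning a coefficient vector, and under Gaussian errors −2 log L̂ is replaced by the usual T·log(RSS/T) expression, up to an additive constant):

```python
import numpy as np

def bic_select(X, y, fit_lasso, lambdas):
    """Return the lambda in `lambdas` minimizing BIC = -2 log L + k log(T)."""
    T = len(y)
    best_lam, best_bic = None, np.inf
    for lam in lambdas:
        beta = fit_lasso(X, y, lam)
        rss = float(np.sum((y - X @ beta) ** 2))
        k_nz = int(np.count_nonzero(beta))            # non-zero coefficients
        bic = T * np.log(rss / T) + k_nz * np.log(T)  # -2 log L up to a constant
        if bic < best_bic:
            best_bic, best_lam = bic, lam
    return best_lam
```

The log(T) penalty makes the criterion favor sparser fits whenever the extra coefficients buy only a small reduction in the residual sum of squares.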
Commodity prices: log volatilities (weekly data)
[Figure: the ten weekly log-volatility series (CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF), 2013–2016.]
Networks from the VAR coefficients Γ.
Network with q nodes. Each node corresponds with a time series.
Draw an edge from node i to node j if
Σ_{p=1}^{P} |Γ̂_{p,ji}| ≠ 0
Additionally (if p = 1)
– the edge width is the size of the effect
– the edge color is the sign of the effect (blue if positive, red if negative)
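The edge rule above is easy to apply directly to an estimated coefficient array (a toy example with a hand-filled Γ̂; the array layout and values are my own convention, not the talk's):

```python
import numpy as np

# Toy sparse estimates: Gamma[p, j, i] is the effect of series i at lag p+1
# in the equation of series j (q = 3 series, P = 2 lags)
q, P = 3, 2
Gamma = np.zeros((P, q, q))
Gamma[0, 1, 0] = 0.6    # series 0 at lag 1 affects series 1 (positive effect)
Gamma[1, 2, 1] = -0.3   # series 1 at lag 2 affects series 2 (negative effect)

# Draw an edge i -> j whenever sum_p |Gamma[p, j, i]| != 0
edges = [(i, j) for i in range(q) for j in range(q)
         if i != j and np.abs(Gamma[:, j, i]).sum() != 0]
print(edges)  # → [(0, 1), (1, 2)]
```

Edge widths and colors would then be read off the size and sign of the corresponding coefficients.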
Network on the AutoRegressive coefficients
[Figure: directed network over the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
Diebold and Yilmaz (2015)
Granger Causality
Time series i is Granger causing time series j
⇕
Time series i has incremental predictive power in forecasting series j
⇕
In the network there is an arrow going from node i to node j
Granger causality test in high dimensions: Wilms, Gelper, and Croux (2016)
Network on the precision matrix
[Figure: partial-correlation network over the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
References
Rothman, Levina, and Zhu (2010), "Sparse multivariate regression with covariance estimation," Journal of Computational and Graphical Statistics.
Hsu, Hung, and Chang (2008), "Subset selection for vector autoregressive processes using lasso," Computational Statistics and Data Analysis.
Davis, Zang, and Zheng (2016), "Sparse vector autoregressive modeling," Journal of Computational and Graphical Statistics.
Basu and Michailidis (2015), "Regularized estimation in sparse high-dimensional time series models," Annals of Statistics.
Nicholson, Matteson, and Bien (2017), "VARX-L: Structured regularization for large vector autoregressions with exogenous variables," International Journal of Forecasting.
Wilms, Basu, Bien, and Matteson (2017), bigtime: Sparse Estimation of Large Time Series Models, R package.
...
Gelper, Wilms, and Croux (2016), "Identifying demand effects in a large network of product categories," Journal of Retailing.
Data
Sales, promotion, and prices for 17 product categories.
T = 77 weekly observations
VAR model for q = 3 × 17 = 51 time series
[Figure: plot of the time series over the 77 weeks.]
Sales Forecasting
One-step ahead, using a rolling window.
Averaged over the 17 product categories and the 15 stores.

Method                              Mean Absolute Forecast Error
Sparse                              736
Bayesian: Minnesota                 875
Bayesian: Normal-Inverse Wishart    1078
Least Squares                       1298
Restricted LS                       784

Sparse method significantly better (P < 0.01, Diebold-Mariano test)
Simulation Study
Bayesian methods
– Minnesota prior (Koop and Korobilis, 2009)
– Normal-Inverse Wishart prior (Banbura et al., 2010)
Simulation design: sparse high-dimensional, q = 10, p = 2, T = 50

Method                              Mean Absolute Estimation Error
Sparse                              0.041
Bayesian: Minnesota                 0.044
Bayesian: Normal-Inverse Wishart    0.077
Least Squares                       0.157
Restricted LS                       0.121
Multi-Class VAR
We have a chain of 17 stores.
– Estimate a VAR for each store separately.
– Pool the data for the 17 stores in one single class, and estimate one single VAR (or panel-VAR).
– Multi-class estimation, where the data decide what the classes are.
Danaher, Wang, and Witten (2014): joint graphical lasso.
Tibshirani, Saunders, Rosset, Zhu, and Knight (2005): fused lasso.
Multi-class sparse VAR: K classes
The Γ̂^{(k)}, k = 1, …, K are minimizing

Σ_{k=1}^{K} [ Σ_{t=1}^{T} (y_t^{(k)} − Γ^{(k)} X_t^{(k)})′ Ω^{(k)} (y_t^{(k)} − Γ^{(k)} X_t^{(k)}) − T log|Ω^{(k)}| ]
+ λ_1 Σ_{k=1}^{K} Σ_{i,j=1}^{J} Σ_{p=1}^{P} |Γ^{(k)}_{p,ij}| + λ_2 Σ_{k≠k′} Σ_{i,j=1}^{J} Σ_{p=1}^{P} |Γ^{(k)}_{p,ij} − Γ^{(k′)}_{p,ij}| + ...

Wilms, I., Barbaglia, L., and Croux, C. (2018), JRSS-C
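The two penalty blocks in the criterion (within-class sparsity and cross-class fusion of the autoregressive coefficients) reduce to simple absolute sums; a toy evaluation with hypothetical λ's and two small coefficient matrices (all values invented for illustration):

```python
import numpy as np

# Two classes' coefficient matrices for a toy VAR(1) with 2 series
G1 = np.array([[0.5, 0.0], [0.2, 0.4]])
G2 = np.array([[0.5, 0.1], [0.0, 0.4]])
lam1, lam2 = 0.1, 0.5

# Within-class lasso penalty: encourages zeros inside each class
sparsity_pen = lam1 * (np.abs(G1).sum() + np.abs(G2).sum())
# Cross-class fusion penalty: encourages classes to share coefficients
fusion_pen = lam2 * np.abs(G1 - G2).sum()
print(sparsity_pen, fusion_pen)  # → 0.21 0.15
```

The fusion term is what lets the data decide which classes behave alike: where it dominates, the classes' coefficients are shrunk together.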