TRANSCRIPT
Frontiers in Forecasting, Minneapolis, February 21-23, 2018
Sparse VAR-Models
Christophe Croux
EDHEC Business School (France)
Joint Work with Ines Wilms (Cornell University), Luca Barbaglia (KU Leuven),
and Sarah Gelper (TU Eindhoven).
Christophe Croux Sparse VAR-Models 1 / 32
10-dimensional time series
[Figure: weekly log-volatilities, 2013–2016, for the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
Vector Autoregressive Model (VAR)
Two stationary time series y_{1,t} and y_{2,t}.
VAR(1) in dimension q = 2:
y_{1,t} = Γ_{1,11} y_{1,t−1} + Γ_{1,12} y_{2,t−1} + e_{1,t}
y_{2,t} = Γ_{1,21} y_{1,t−1} + Γ_{1,22} y_{2,t−1} + e_{2,t}
Covariance matrix of (e_{1,t}, e_{2,t})′ is Σ.
Vector notation: y_t = Γ_1 y_{t−1} + e_t.
The VAR model
Let y_t be a q-dimensional stationary time series.
Vector Autoregressive Model of order p:
y_t = Γ_1 y_{t−1} + Γ_2 y_{t−2} + … + Γ_p y_{t−p} + e_t,
The matrices Γ_j contain the autoregressive parameters.
e_t is an error term with covariance matrix Σ = Ω^{−1}.
Standard estimation procedure: OLS, equation by equation.
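The equation-by-equation OLS estimator can be sketched in a few lines of Python (a minimal NumPy illustration; `var_ols` is a hypothetical helper name, not code from the talk):

```python
import numpy as np

def var_ols(y, p=1):
    """OLS estimation of a VAR(p), equation by equation.

    y : (T, q) array holding a stationary q-dimensional time series.
    Returns the (q, q*p) coefficient matrix [Gamma_1, ..., Gamma_p].
    """
    T, q = y.shape
    # Stack the lagged regressors X_t = (y'_{t-1}, ..., y'_{t-p})'
    X = np.hstack([y[p - j - 1:T - j - 1] for j in range(p)])
    Y = y[p:]
    # All q equations share the same regressors, so one solve covers them all
    Gamma, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Gamma.T  # row i holds the coefficients of equation i
```

Each row of the returned matrix is one OLS regression of a series on the p lags of all q series; with q and p large this is exactly where the overparametrization problem discussed below kicks in.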
Log-Volatilities for commodities (weekly data)
VAR model for q = 10 time series (and T = 156 observations)
One lag
– 1 × (q × q) = 100 regression parameters
– 45 unique elements in Σ
Two lags
– 2 × (q × q) = 200 regression parameters
– 45 unique elements in Σ
→ Explosion of the number of parameters
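The counts above follow directly from the dimensions; a throwaway helper (not from the slides) makes the arithmetic explicit, assuming the "45 unique elements" on the slide refers to the off-diagonal entries of the symmetric Σ:

```python
def var_param_count(q, p):
    """Parameter counts for a VAR(p) in dimension q."""
    ar = p * q * q           # p coefficient matrices of size q x q
    cov = q * (q - 1) // 2   # unique off-diagonal elements of Sigma (the slide's count)
    return ar, cov

print(var_param_count(10, 1))  # → (100, 45)
print(var_param_count(10, 2))  # → (200, 45)
```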
The VAR model: Overparametrization
ML estimators will be
– not computable
– inaccurate
Sparse estimation ≡ many estimated parameters equal to zero
– Suitable if T is small relative to the number of parameters
– Easier to interpret
– Automatic variable selection
– Better estimation and prediction performance
Sparse Estimation: Lasso
In the multiple linear regression model
y = β_1 x_1 + β_2 x_2 + … + β_k x_k + ε
Minimization problem:
β̂ = argmin_β (y − Xβ)′(y − Xβ) + λ Σ_{l=1}^{k} |β_l|.
Tibshirani (1996)
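The minimization above can be solved by cyclic coordinate descent with soft-thresholding (a bare-bones sketch of the standard algorithm, not the implementation used in the talk; `lasso_cd` is a made-up name):

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (y - Xb)'(y - Xb) + lam * sum_l |b_l| by cyclic coordinate descent."""
    n, k = X.shape
    beta = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0)  # curvature of each coordinate
    for _ in range(n_iter):
        for l in range(k):
            # Partial residual: remove coordinate l's current contribution
            r = y - X @ beta + X[:, l] * beta[l]
            beta[l] = soft_threshold(X[:, l] @ r, lam / 2.0) / col_ss[l]
    return beta
```

For a sufficiently large λ, many coordinates are set exactly to zero, which is where the automatic variable selection comes from.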
Lasso for the VAR model
Multiple equations
– Partial correlation structure of the error term → glasso of Friedman et al. (2008)
Dynamic nature of the model
– Selecting a time series into one of the equations = selecting the variable and all its lags → group lasso (Yuan and Lin, 2006)
Penalized ML estimation
Rewrite the VAR in matrix notation:
Y = Y_L Γ + E,
where
Y = (y_{p+1}, …, y_T)′
Y_L = (X_{p+1}, …, X_T)′ with X_t = (y′_{t−1}, …, y′_{t−p})′
Γ = (Γ_1, …, Γ_p)′
E = (e_{p+1}, …, e_T)′.
Penalized ML estimation (cont.)
Penalized negative log likelihood:
(Γ̂, Ω̂) = argmin_{Γ,Ω} (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) − log|Ω| + λ_1 Σ_{g=1}^{G} ||γ_g||_2 + λ_2 Σ_{k≠k′} |Ω_{kk′}|,
with
γ_g a subvector of Γ
G = q² the total number of groups.
Algorithm
Solving for Γ|Ω:
Γ̂|Ω = argmin_Γ (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) + λ_1 Σ_{g=1}^{G} ||γ_g||_2.
→ groupwise lasso
Algorithm (cont.)
Solving for Ω|Γ:
Ω̂|Γ = argmin_Ω (1/T) tr((Y − Y_L Γ) Ω (Y − Y_L Γ)′) − log|Ω| + λ_2 Σ_{k≠k′} |Ω_{kk′}|.
→ penalized inverse covariance estimation (glasso)
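For the Ω|Γ step, an off-the-shelf glasso implementation can be applied to the current residuals; a sketch using scikit-learn's `GraphicalLasso`, with simulated residuals standing in for E = Y − Y_L Γ̂ (the simulated data and the value of alpha are my own choices):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulated stand-in for the residual matrix E = Y - YL @ Gamma
rng = np.random.default_rng(3)
E = rng.standard_normal((300, 4)) * np.array([1.0, 0.8, 1.2, 0.9])

# alpha plays the role of lambda_2: it penalizes off-diagonal entries of Omega
gl = GraphicalLasso(alpha=0.2).fit(E)
Omega = gl.precision_  # sparse estimate of the inverse covariance matrix
```

In the alternating algorithm this fit would be re-run each time Γ̂ (and hence E) is updated.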
Selection of tuning parameters
In the iteration step Γ|Ω, select λ_1 to minimize
BIC_{λ_1} = −2 log L̂_{λ_1} + k_{λ_1} log(T),
where
L̂_{λ_1} is the estimated likelihood using λ_1
k_{λ_1} is the number of non-zero estimated regression coefficients.
In the iteration step Ω|Γ, select λ_2 analogously.
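The BIC search can be sketched as a grid scan (illustrative only: `fit_lasso` stands for any sparse estimator returning a coefficient vector, and under Gaussian errors −2 log L̂ is replaced by the usual T·log(RSS/T) expression, up to an additive constant):

```python
import numpy as np

def bic_select(X, y, fit_lasso, lambdas):
    """Return the lambda in `lambdas` minimizing BIC = -2 log L + k log(T)."""
    T = len(y)
    best_lam, best_bic = None, np.inf
    for lam in lambdas:
        beta = fit_lasso(X, y, lam)
        rss = float(np.sum((y - X @ beta) ** 2))
        k_nz = int(np.count_nonzero(beta))            # non-zero coefficients
        bic = T * np.log(rss / T) + k_nz * np.log(T)  # -2 log L up to a constant
        if bic < best_bic:
            best_bic, best_lam = bic, lam
    return best_lam
```

The log(T) penalty makes the criterion favor sparser fits whenever the extra coefficients buy only a small reduction in the residual sum of squares.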
Commodity prices: log volatilities (weekly data)
[Figure: the ten weekly log-volatility series (CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF), 2013–2016.]
Networks from the VAR coefficients Γ.
Network with q nodes. Each node corresponds with a time series.
Draw an edge from node i to node j if
Σ_{p=1}^{P} |Γ̂_{p,ji}| ≠ 0
Additionally (if p = 1)
– the edge width is the size of the effect
– the edge color is the sign of the effect (blue if positive, red if negative)
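The edge rule above is easy to apply directly to an estimated coefficient array (a toy example with a hand-filled Γ̂; the array layout and values are my own convention, not the talk's):

```python
import numpy as np

# Toy sparse estimates: Gamma[p, j, i] is the effect of series i at lag p+1
# in the equation of series j (q = 3 series, P = 2 lags)
q, P = 3, 2
Gamma = np.zeros((P, q, q))
Gamma[0, 1, 0] = 0.6    # series 0 at lag 1 affects series 1 (positive effect)
Gamma[1, 2, 1] = -0.3   # series 1 at lag 2 affects series 2 (negative effect)

# Draw an edge i -> j whenever sum_p |Gamma[p, j, i]| != 0
edges = [(i, j) for i in range(q) for j in range(q)
         if i != j and np.abs(Gamma[:, j, i]).sum() != 0]
print(edges)  # → [(0, 1), (1, 2)]
```

Edge widths and colors would then be read off the size and sign of the corresponding coefficients.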
Network on the AutoRegressive coefficients
[Figure: directed network over the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
Diebold and Yilmaz (2015)
Granger Causality
Time series i is Granger causing time series j
⇕
Time series i has incremental predictive power in forecasting series j
⇕
In the network there is an arrow going from node i to node j
Granger causality test in high dimensions: Wilms, Gelper, and Croux (2016)
Network on the precision matrix
[Figure: partial-correlation network over the ten commodities CRUO, GASO, NATG, ETHA, BIOD, CORN, WHEA, SUGA, SOYO, COFF.]
References
Rothman, Levina, and Zhu (2010), "Sparse multivariate regression with covariance estimation," Journal of Computational and Graphical Statistics.
Hsu, Hung, and Chang (2008), "Subset selection for vector autoregressive processes using lasso," Computational Statistics and Data Analysis.
Davis, Zang, and Zheng (2016), "Sparse vector autoregressive modeling," Journal of Computational and Graphical Statistics.
Basu and Michailidis (2015), "Regularized estimation in sparse high-dimensional time series models," Annals of Statistics.
Nicholson, Matteson, and Bien (2017), "VARX-L: Structured regularization for large vector autoregressions with exogenous variables," International Journal of Forecasting.
Wilms, Basu, Bien, and Matteson (2017), bigtime: Sparse Estimation of Large Time Series Models, R package.
...
Gelper, Wilms, and Croux (2016), "Identifying demand effects in a large network of product categories," Journal of Retailing.
Data
Sales, promotion, and prices for 17 product categories.
T = 77 weekly observations
VAR model for q = 3 × 17 = 51 time series
[Figure: plot of the time series over the 77 weeks.]
Sales Forecasting
One-step ahead, using a rolling window.
Averaged over the 17 product categories and the 15 stores.

Method                              Mean Absolute Forecast Error
Sparse                              736
Bayesian: Minnesota                 875
Bayesian: Normal-Inverse Wishart    1078
Least Squares                       1298
Restricted LS                       784

Sparse method significantly better (P < 0.01, Diebold-Mariano test)
Simulation Study
Bayesian methods
– Minnesota prior (Koop and Korobilis, 2009)
– Normal-Inverse Wishart prior (Banbura et al., 2010)
Simulation design: sparse high-dimensional, q = 10, p = 2, T = 50

Method                              Mean Absolute Estimation Error
Sparse                              0.041
Bayesian: Minnesota                 0.044
Bayesian: Normal-Inverse Wishart    0.077
Least Squares                       0.157
Restricted LS                       0.121
Multi-Class VAR
We have a chain of 17 stores.
– Estimate a VAR for each store separately.
– Pool the data for the 17 stores in one single class, and estimate one single VAR (or panel-VAR).
– Multi-class estimation, where the data decide what the classes are.
Danaher, Wang, and Witten (2014): joint graphical lasso.
Tibshirani, Saunders, Rosset, Zhu, and Knight (2005): fused lasso.
Multi-class sparse VAR: K classes
The Γ̂^{(k)}, k = 1, …, K are minimizing

Σ_{k=1}^{K} [ Σ_{t=1}^{T} (y_t^{(k)} − Γ^{(k)} X_t^{(k)})′ Ω^{(k)} (y_t^{(k)} − Γ^{(k)} X_t^{(k)}) − T log|Ω^{(k)}| ]
+ λ_1 Σ_{k=1}^{K} Σ_{i,j=1}^{J} Σ_{p=1}^{P} |Γ^{(k)}_{p,ij}| + λ_2 Σ_{k≠k′} Σ_{i,j=1}^{J} Σ_{p=1}^{P} |Γ^{(k)}_{p,ij} − Γ^{(k′)}_{p,ij}| + ...

Wilms, I., Barbaglia, L., and Croux, C. (2018), JRSS-C
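The two penalty blocks in the criterion (within-class sparsity and cross-class fusion of the autoregressive coefficients) reduce to simple absolute sums; a toy evaluation with hypothetical λ's and two small coefficient matrices (all values invented for illustration):

```python
import numpy as np

# Two classes' coefficient matrices for a toy VAR(1) with 2 series
G1 = np.array([[0.5, 0.0], [0.2, 0.4]])
G2 = np.array([[0.5, 0.1], [0.0, 0.4]])
lam1, lam2 = 0.1, 0.5

# Within-class lasso penalty: encourages zeros inside each class
sparsity_pen = lam1 * (np.abs(G1).sum() + np.abs(G2).sum())
# Cross-class fusion penalty: encourages classes to share coefficients
fusion_pen = lam2 * np.abs(G1 - G2).sum()
print(sparsity_pen, fusion_pen)  # → 0.21 0.15
```

The fusion term is what lets the data decide which classes behave alike: where it dominates, the classes' coefficients are shrunk together.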