Rank Regularized Estimation of Approximate Factor Models

Jushan Bai and Serena Ng
Columbia University, April 2018




Approximate Factor Models · Rank Minimization: NP Hard · Approximate-Rank Minimization · Rank Regularized Factor Models

Outline

1. Approximate Factor Models: APC vs PC

2. Rank Minimization: NP Hard

3. Approximate-Rank Minimization: RPC vs PC

4. Rank Regularized Factor Models: Number of Factors; Linear Restrictions


Overview. Model: X = FΛ′ + e.

APC (asymptotic principal components): F̂ = √T·U_r; the eigenvectors can be constructed by iterative OLS.

What if we do iterative ridge regressions instead of OLS?

Singular value thresholding ⇒ robust PC.
Regularizes the rank of the common component.
Algorithmic view: finite-sample error bounds.

This paper: rank regularized factor analysis.

Parametric analysis, asymptotic results for inference.
A new, conservative factor selection rule.
(*) Factor analysis under general linear restrictions.


Notation

X_it ∼ (0, 1), i = 1, …, N, t = 1, …, T.

SVD: X = UDV′ = U_r D_r V_r′ + U_{n−r} D_{n−r} V_{n−r}′.

Normalized data: Z = X/√(NT) = U D̄ V′, with D̄ = D/√(NT); below, D denotes the singular values of Z.

Unscaled model: X = F⁰Λ⁰′ + e.

Scaled model: Z = F*Λ*′ + e*, with F* = F⁰/√T and Λ* = Λ⁰/√N.


Asymptotic Principal Components (APC)

min_{F,Λ} (1/NT)·‖X − FΛ′‖²_F, assuming a strong factor structure: Σ_F > 0, Σ_Λ > 0, and e weakly correlated.

(F̂, Λ̂) = (√T·U_r, V_r D_r), with F̂′F̂/T = I_r and Λ̂′Λ̂/N = D²_r.

Bai (2003): under the normalization F̂′F̂/T = I_r or Λ̂′Λ̂/N = I_r,

√N(F̂_t − H′_NT·F⁰_t) →d N(0, Avar(F_t))
√T(Λ̂_i − G_NT·Λ⁰_i) →d N(0, Avar(Λ_i)),

with G_NT = H⁻¹_NT.


Lemma

Rotation matrix: H_NT = (Λ⁰′Λ⁰/N)(F⁰′F̂/T)·D⁻²_r.

Let H_{1,NT} = (Λ⁰′Λ⁰)(Λ̂′Λ⁰)⁻¹; then H_NT = H_{1,NT} + o_p(1).

Let H_{2,NT} = (F⁰′F⁰)⁻¹(F⁰′F̂); then H_NT = H_{2,NT} + o_p(1).

Results of independent interest.


Principal Components (PC)

Recall APC: (F̂, Λ̂) = (√T·U_r, V_r D_r), with F̂′F̂/T = I_r and Λ̂′Λ̂/N = D²_r.

There are many definitions of PC, e.g. (F, Λ) = (√T·U_r D_r, V_r).

This paper defines PC as

(F̃, Λ̃) = (√T·U_r D_r^{1/2}, √N·V_r D_r^{1/2}) = (√T·F_z, √N·Λ_z).

Normalization: F̃′F̃/T = D_r and Λ̃′Λ̃/N = D_r.

Why? The normalization F̂′F̂/T = I_r is not convenient for imposing restrictions.



Relation with APC: F̃ = F̂·D_r^{1/2}, Λ̃ = Λ̂·D_r^{−1/2}.

Define H̃_NT = H_NT·D_r^{1/2}. From the identities:

√N(F̃_t − H̃′_NT·F⁰_t) = √N·D_r^{1/2}(F̂_t − H′_NT·F⁰_t),
√T(Λ̃_i − H̃⁻¹_NT·Λ⁰_i) = √T·D_r^{−1/2}(Λ̂_i − H⁻¹_NT·Λ⁰_i).

Asymptotic properties:

(i) √N(F̃_t − H̃′_NT·F⁰_t) →d N(0, D_r^{1/2}·Avar(F_t)·D_r^{1/2});
(ii) √T(Λ̃_i − G̃_NT·Λ⁰_i) →d N(0, D_r^{−1/2}·Avar(Λ_i)·D_r^{−1/2}),

with G̃_NT = H̃⁻¹_NT.
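The APC and PC definitions and the relation F̃ = F̂·D_r^{1/2} are easy to check numerically. A minimal numpy sketch (not the authors' code; dimensions and data are illustrative):

```python
import numpy as np

# Illustrative dimensions; Z plays the role of the normalized panel.
rng = np.random.default_rng(0)
T, N, r = 50, 40, 3
Z = rng.standard_normal((T, N)) / np.sqrt(N * T)

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
Ur, Dr, Vr = U[:, :r], np.diag(d[:r]), Vt[:r].T

F_apc = np.sqrt(T) * Ur                 # APC: F_hat = sqrt(T) U_r
F_pc = np.sqrt(T) * Ur @ np.sqrt(Dr)    # PC:  F_tilde = sqrt(T) U_r D_r^{1/2}
Lam_pc = np.sqrt(N) * Vr @ np.sqrt(Dr)  # PC loadings: sqrt(N) V_r D_r^{1/2}
```

The two normalizations and the scale relation between APC and PC follow directly from orthonormality of U_r and V_r.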


Let A be an n × n matrix with eigenvalues d collected in D = diag(d):

Trace norm: ∑_{k=1}^n A_kk = ∑_{k=1}^n d_k.
Nuclear norm: ‖A‖_* = sum of the singular values (= ∑_{k=1}^n d_k for A non-negative definite).
Frobenius norm: ‖A‖²_F = ∑_{ij} A²_ij = trace(A′A).
ℓ₁ norm: ‖A‖₁ = ∑_{ij} |A_ij|.
Spectral norm: ‖A‖₂ = max_k |d_k|.


Spark vs Rank

For A ∈ R^{m×n} with n < m:

spark(A) = min_{x≠0} ‖x‖₀ s.t. Ax = 0.
rank(A) = ‖D‖₀ = nnz(D).

spark(A) = size of the smallest set of linearly dependent columns.
rank(A) = size of the largest set of linearly independent columns.

spark(A) = n + 1 ⇔ rank(A) = n.

If spark(A) ≠ n + 1:

spark(A) ≤ rank(A) + 1.
spark(A) ≥ 1 + 1/µ(A), where µ(A) = max_{i≠j} |⟨a_i, a_j⟩| is the mutual coherence of the normalized columns.

Computing spark(A) is NP-hard: Tillmann and Pfetsch (IEEE Transactions on Information Theory, 2014).
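The definitions can be made concrete with a brute-force spark (the combinatorial search that makes the problem NP-hard at scale) and the coherence bound. A sketch with an illustrative example in which spark(A) = rank(A) + 1:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns (brute force)."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return n + 1  # all columns independent

def mutual_coherence(A):
    """mu(A) = max_{i != j} |<a_i, a_j>| over unit-normalized columns."""
    B = A / np.linalg.norm(A, axis=0)
    G = np.abs(B.T @ B)
    np.fill_diagonal(G, 0.0)
    return G.max()

# Three columns in R^2: e1, e2, and e1 + e2 (the smallest dependent set
# has all three columns, so spark = 3 while rank = 2).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
```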


NP Hard

NP problems: decision problems for which a "yes" answer can be verified efficiently, using deterministic computations performed in polynomial time.

An NP-hard problem is one that admits no general computational solution significantly faster than a brute-force search.


1. Minimum Rank Factor Analysis

Early factor analysis: decompose Σ_X = Σ_C + Σ_e such that

(i) the communality matrix Σ_C has the smallest rank,
(ii) Σ_C is non-negative definite,
(iii) Σ_e is a diagonal positive definite matrix (ruling out Heywood cases).

Rank minimization is NP-hard (non-convexity).

Evidence in the 1950s suggested many non-zero eigenvalues, which called the usefulness of the concept of minimum rank into question.


1980s: decompose Σ_X by solving surrogate problems subject to

(i) Σ_X − Σ_e ≥ 0, (ii) Σ_e ≥ 0.

(i) CMTFA: min trace(Σ_X − Σ_e) = ∑_{i=1}^N D^C_ii.

(ii) MRFA: C = C* + C⁻, where C* is the best minimum-rank approximation of C; rank(C*) = r minimizes ∑_{i=r+1}^N D^C_ii.


Approximate Minimum Rank: ten Berge-Kiers (1991)

min r such that ∑_{i=r+1}^N D^C_ii ≤ δ, subject to (i)+(ii).  (*)

δ: tolerance for the maximum unexplained common variance.

The approximate minimum rank of Σ_C is the smallest r that solves (*) for some δ ≥ 0.

Minimum rank: the special case δ = 0.

The sum of eigenvalues is convex.


2. Matrix Completion

Complete the matrix Z with missing values.

Ω = index set of the positions of observed data.

Underdetermined without some structure.

Assume the latent matrix L is low rank.

Netflix challenge: L = AB′, A = movie genres, B = tastes.

Hard problem: min rank(L) s.t. L_ij = Z_ij, (i, j) ∈ Ω.

Surrogate problem: min ‖L‖_* s.t. L_ij = Z_ij, (i, j) ∈ Ω.

Z can be recovered if (i) there are not too many missing values, and (ii) they are missing at random.
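A minimal sketch of SVD-based completion in the spirit of the surrogate problem — iterative imputation with a rank truncation ("hard-impute") rather than the exact nuclear-norm program; the rank-1 example and sampling pattern below are illustrative assumptions:

```python
import numpy as np

def complete(Z_obs, mask, rank, iters=200):
    """Fill missing entries by alternating a low-rank SVD truncation
    with resetting the observed entries (mask True = observed)."""
    L = np.where(mask, Z_obs, 0.0)
    for _ in range(iters):
        U, d, Vt = np.linalg.svd(L, full_matrices=False)
        L = (U[:, :rank] * d[:rank]) @ Vt[:rank]  # low-rank truncation
        L[mask] = Z_obs[mask]                     # keep observed entries
    return L

# Rank-1 ground truth; a few entries missing, spread across rows/columns.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([1.0, 1.0, 2.0, 2.0, 3.0])
Z = np.outer(u, v)
mask = np.ones_like(Z, dtype=bool)
for i, j in [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]:
    mask[i, j] = False

L_hat = complete(Z, mask, rank=1)
```

With many entries observed and the missing positions spread out, the iteration recovers the low-rank matrix closely, matching condition (i)–(ii) above.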


3. Low Rank Decomposition

Eckart–Young: the best rank-r approximation of Z is U_r D_r V_r′.

The SVD is sensitive to noise corruption:

Z = L (low rank) + S (sparse, big noise).

Compressed sensing: solve underdetermined systems, recover sparse signals. Computer vision: S = background noise.

Hard problem:

min_{L,S} rank(L) + γ‖S‖₀, where the ‖S‖₀ term is the sparsity constraint.

Objective function and constraint are both non-convex.
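Eckart–Young is easy to check numerically: the SVD truncation beats any other rank-r candidate in Frobenius norm, and its squared error equals the sum of the discarded squared singular values. A sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((8, 6))
r = 2

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
Z_best = (U[:, :r] * d[:r]) @ Vt[:r]      # rank-r SVD truncation
best_err = np.linalg.norm(Z - Z_best)

# A competing rank-r matrix built from random factors.
A = rng.standard_normal((8, r))
B = rng.standard_normal((6, r))
other_err = np.linalg.norm(Z - A @ B.T)
```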


Candès et al. (2009)

The surrogate problem is convex:

min_{L,S} ‖L‖_* + γ‖S‖₁.

L and S can be recovered with high probability under incoherence conditions: L not sparse, S not low rank.

General problem:

Z = L (low rank) + S (sparse, big noise) + W (small noise)

min_{L,S} ‖L‖_* + γ‖S‖₁, with ‖W‖_F ≤ δ.


Overview: Good to Relax

Hard problems: the rank function.

Surrogate problems: the nuclear norm.

Cai et al. (2008, Theorem 1):

U_r D_r^γ V_r′ = argmin_L γ‖L‖_* + ½‖Z − L‖²_F.

SVT = singular value thresholding operator:

D_r^γ = diag((D₁₁ − γ)₊, …, (D_rr − γ)₊).

SVT is the proximal operator of the nuclear norm.

Optimal low-rank approximation under a rank constraint: U_r D_r^γ V_r′.
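A sketch of the SVT operator together with a numerical check of the proximal property: the thresholded SVD attains a lower value of γ‖L‖_* + ½‖Z − L‖²_F than other candidates (data illustrative):

```python
import numpy as np

def svt(Z, gamma):
    """Singular value thresholding: soft-threshold the singular values."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(d - gamma, 0.0)) @ Vt

def objective(L, Z, gamma):
    return gamma * np.linalg.svd(L, compute_uv=False).sum() \
        + 0.5 * np.linalg.norm(Z - L) ** 2

rng = np.random.default_rng(2)
Z = rng.standard_normal((7, 5))
gamma = 0.8
L_svt = svt(Z, gamma)

obj_svt = objective(L_svt, Z, gamma)
obj_z = objective(Z, Z, gamma)                    # candidate: Z itself
obj_zero = objective(np.zeros_like(Z), Z, gamma)  # candidate: 0
```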


Relation to Factor Models

We have the low rank solution

U_r D_r^γ V_r′ = argmin_L γ‖L‖_* + ½‖Z − L‖²_F.  (1)

L (of rank r) can be factorized as L = AB′:

min_{A,B} (γ/2)(‖A‖²_F + ‖B‖²_F) + ½‖Z − AB′‖²_F.  (2)

Theorem: (A, B) solves (2) iff L = AB′ solves (1).

Solution: robust principal components (RPC),

A = U_r(D_r^γ)^{1/2}, B = V_r(D_r^γ)^{1/2}.


Sketch of the idea (γ = 0). For AB′ = U_r D_r V_r′:

trace(D_r) = trace(U_r′AB′V_r) ≤ ‖A‖_F·‖B‖_F ≤ ½(‖A‖²_F + ‖B‖²_F).

Since L = AB′, ‖L‖_* = trace(D_r) ≤ ½(‖A‖²_F + ‖B‖²_F).

Putting A = U_r D_r^{1/2} and B = V_r D_r^{1/2},

½(‖A‖²_F + ‖B‖²_F) = ½(‖D_r^{1/2}‖²_F + ‖D_r^{1/2}‖²_F) = ‖D_r‖₁.

The bound holds with equality: ‖D_r‖₁ = ½(‖A‖²_F + ‖B‖²_F).


FOC View

FOC: (i) −(Z − AB′)B + γA = 0, (ii) −(Z − AB′)′A + γB = 0.

Left-multiplying (i) by A′ and (ii) by B′ gives A′A = B′B. Rearranging,

[ −γI  Z ; Z′  −γI ]·[ A ; B ] = [ A ; B ]·A′A.

This has the generic structure ZV = VX: the eigenvalues of X are eigenvalues of Z, and V holds the corresponding eigenvectors.

A = U_r(D_r^γ)^{1/2}, B = V_r(D_r^γ)^{1/2}: a particular normalization.
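The FOCs can be verified directly for the SVT factorization; a numpy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((10, 8))
gamma = 0.5

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
keep = d > gamma                  # components surviving the threshold
sq = np.sqrt(d[keep] - gamma)     # (D_r^gamma)^{1/2}
A = U[:, keep] * sq
B = Vt[keep].T * sq

foc_A = -(Z - A @ B.T) @ B + gamma * A      # FOC (i)
foc_B = -(Z - A @ B.T).T @ A + gamma * B    # FOC (ii)
```

Both residuals vanish, and the shared normalization A′A = B′B = D_r^γ holds by construction.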


Factor Analysis and RPC

With A′A = B′B = D_r^γ:

RPC of Z: (A, B) = (U_r(D_r^γ)^{1/2}, V_r(D_r^γ)^{1/2}).

RPC of X: (F̄, Λ̄) = (√T·U_r(D_r^γ)^{1/2}, √N·V_r(D_r^γ)^{1/2}).

PC of X: (F̃, Λ̃) = (√T·U_r D_r^{1/2}, √N·V_r D_r^{1/2}).

Relation between RPC and PC:

F̄ = F̃(D_r^γ D_r^{−1})^{1/2}, Λ̄ = Λ̃(D_r^γ D_r^{−1})^{1/2}.

Even big factors will be shrunk.
Small factors can be killed, since rank(D_r^γ) ≤ r.
Sparse large noise is not treated as factors.
Smaller common component: var(C̄) ≤ var(C̃).


Effects of Regularization: Δ²_NT = D_r^γ D_r^{−1}.

H̄_NT = H̃_NT·Δ_NT.

F̄_t − H̄′_NT·F⁰_t = Δ_NT(F̃_t − H̃′_NT·F⁰_t)
Λ̄_i − Ḡ_NT·Λ⁰_i = Δ_NT(Λ̃_i − H̃⁻¹_NT·Λ⁰_i)

Proposition

(i) √N(F̄_t − H̄′_NT·F⁰_t) →d N(0, Δ_∞·Avar(F̃_t)·Δ_∞);
(ii) √T(Λ̄_i − Ḡ_NT·Λ⁰_i) →d N(0, Δ_∞·Avar(Λ̃_i)·Δ_∞).

Unlike APC and PC, Ḡ_NT = Δ_NT·H̃⁻¹_NT ≠ H̄⁻¹_NT.


Bias/Variance Tradeoff

diag(Δ_∞) = δ with δ_i < 1. The Proposition implies

Avar(F̄) ≤ Avar(F̃) and Avar(Λ̄) ≤ Avar(Λ̃).

There is a regularization bias, since C̄ = U_r D_r^γ V_r′ ≠ C̃ = U_r D_r V_r′.

Case r = 1: δ₁ = (D₁₁ − γ)₊/D₁₁ and C̄_it = δ₁·C̃_it, so

Abias(C̄_it) = (δ₁ − 1)·C⁰_it
Avar(C̄_it) = δ₁²·Avar(C̃_it)
Amse(C̄_it) = (δ₁ − 1)²(C⁰_it)² + δ₁²·Amse(C̃_it).

The relative MSE is below one when Amse(C̃_it) is large:

Amse(C̄_it)/Amse(C̃_it) = (δ₁ − 1)²·(C⁰_it)²/Amse(C̃_it) + δ₁².


Asymptotic vs. Finite Sample Results

Z = L + S is consistent with many probabilistic structures.

Econometric theory: X = F⁰Λ⁰′ + e, Z = X/√(NT).

Strong factor structure: Σ_F > 0, Σ_Λ > 0; r population eigenvalues diverge with N.

Estimation: choose F, Λ, with e residually determined.

min(√N, √T)(C̃_it − C⁰_it) →d N(0, Avar(C_it)).


Machine Learning Results:

Solve the problem given the data (finite sample).

Choose L and S simultaneously.

Netflix/noiseless problems: no reference to eigenvalues.

Incoherence condition: L is not sparse.

S is selected uniformly at random and is not low rank.

For γ = 1/√(max(m, n)), (L̂, Ŝ) = (L, S) with probability 1 − c₀·n⁻¹⁰ if ‖S‖₀ < c₁·mn and rank(L) ≤ c₁·min(m, n)/(µ·log²(max(m, n))).


Agarwal, Negahban and Wainwright (2012, Annals of Statistics)

M-estimation based on a regularized nuclear norm, assuming restricted strong convexity of the loss function.

With noisy data, L cannot be exactly recovered.

What matters are the eigenvectors of the largest singular values.

err² = ‖L̂ − L‖²_F + ‖Ŝ − S‖²_F.

If ‖L‖_∞ < c/√(mn), then with high probability, err² ≤ c·(N+T)/(NT).


‖L‖_∞ = max_it |L_it|, ‖L‖²_F = ∑_{i=1}^r d²_i.

‖L‖_∞ < c/√(mn) is a constraint on the sum of eigenvalues.

(N+T)/(NT) ≈ min(N, T)⁻¹.

Econometric theory: min(√N, √T)(C̃_it − C_it) = O_p(1).

Different objectives, but the results broadly agree.

Also related: Bertsimas, Copenhaver and Mazumder (2016); Lettau and Pelger (2017).


Number of Factors: min rank + model complexity

BaiNg-02: r̃ = argmin_k log(ssr̃_k) + k·g(N,T), with ssr̃_k = ‖Z − F̃_k Λ̃_k′‖²_F.

BaiNg-17: r̄ = argmin_k log(ssr̄_k) + k·g(N,T), with ssr̄_k = ‖Z − F̄_k Λ̄_k′‖²_F.

With ‖Z‖_F = 1,

ssr̃_k = 1 − ∑_{j=1}^k d²_j, ssr̄_k = 1 − ∑_{j=1}^k (d_j − γ)²₊,

IC̄_k ≈ IC̃_k + γ·∑_{j=1}^k (2d_j − γ)/ssr̃_k.

A data-dependent, heavier penalty. r̃ ≥ r* under sparse outliers or weak factors.

γ = 0.05 reduces the contribution of factor j from d²_j to (d_j − 0.05)²; the effect on small factors is proportionally larger.
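The two selection rules can be sketched directly in terms of the singular values d_j of the normalized panel (so ∑_j d²_j = 1). The penalty g(N,T) below is the Bai–Ng (2002) IC_p2 choice, used here as an illustrative assumption, and the singular-value profile is made up:

```python
import numpy as np

def select_r(d, gamma, N, T, kmax):
    g = (N + T) / (N * T) * np.log(min(N, T))   # IC_p2 penalty (assumption)
    ks = np.arange(1, kmax + 1)
    ssr_pc = 1 - np.cumsum(d[:kmax] ** 2)                           # PC residual
    ssr_rpc = 1 - np.cumsum(np.maximum(d[:kmax] - gamma, 0) ** 2)   # RPC residual
    ic_pc = np.log(ssr_pc) + ks * g
    ic_rpc = np.log(ssr_rpc) + ks * g
    return ks[np.argmin(ic_pc)], ks[np.argmin(ic_rpc)], ic_pc, ic_rpc

# Made-up spectrum: two strong factors plus a flat noise tail, sum d_j^2 = 1.
d = np.sqrt(np.array([0.45, 0.25] + [0.30 / 28] * 28))
r_pc, r_rpc, ic_pc, ic_rpc = select_r(d, gamma=0.05, N=100, T=100, kmax=8)
```

Since (d_j − γ)²₊ ≤ d²_j, the regularized criterion is never below the unregularized one, which is the "heavier penalty" at work.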


Implications for Factor Augmented Regressions

y_{t+h} = α′F_t + β′W_t + ε_{t+h}.

Replacing F by F̂, F̃, or F̄ gives an identical fit! They are all spanned by U_r, hence perfectly correlated. The estimates of α simply adjust for the scale difference.

For F̄ to have an effect, do ridge regression. Given κ, let κ_T = κ/T. Then

α̂_OLS = (F̄′F̄)⁻¹F̄′y = (D_r^γ)^{−1/2}·U_r′y/√T,

α̂_R = (F̄′F̄ + κI_r)⁻¹F̄′y
    = (D_r^γ + κ_T·I_r)⁻¹·D_r^γ·α̂_OLS = (I_r + κ_T·(D_r^γ)⁻¹)⁻¹·α̂_OLS
    ≈ (I_r − κ_T·(D_r^γ)⁻¹)·α̂_OLS.
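The identical-fit claim is a column-space statement and easy to verify: F̂, F̃, and F̄ are positive diagonal rescalings of U_r, so OLS fitted values coincide. A sketch with illustrative data (γ chosen well below the retained singular values):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, r, gamma = 60, 30, 2, 0.01
Z = rng.standard_normal((T, N)) / np.sqrt(N * T)
y = rng.standard_normal(T)

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
F_apc = np.sqrt(T) * U[:, :r]                             # APC
F_pc = F_apc * np.sqrt(d[:r])                             # PC
F_rpc = F_apc * np.sqrt(np.maximum(d[:r] - gamma, 0.0))   # RPC

def fitted(F, y):
    alpha, *_ = np.linalg.lstsq(F, y, rcond=None)
    return F @ alpha

yhat_apc, yhat_pc, yhat_rpc = fitted(F_apc, y), fitted(F_pc, y), fitted(F_rpc, y)
```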


RPC by SVT via Iterative Ridge

Given an m × n matrix Z, initialize an m × r matrix F = UD, where U is orthonormal and D = I_r.

A. Repeat until convergence:

i. (solve Λ given F): Λ = Z′F(F′F + γI_r)⁻¹.
ii. svd(Λ) = U_Λ D_Λ V_Λ′; set Λ = U_Λ D_Λ and D = D_Λ.
iii. (solve F given Λ): F = ZΛ(Λ′Λ + γI_r)⁻¹.
iv. svd(F) = U_F D_F V_F′; set F = U_F D_F and D = D_F.

B. (Cleanup) From svd(ZU_Λ) = U_r D_r Ṽ′, recover V_r = U_Λ·Ṽ and set D_r^γ = (D_r − γI_r)₊.

Useful when T, N are large and a direct SVD is expensive.

Iterative ridge regressions implement SVT.

The cleanup takes care of numerical precision problems.
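A simplified sketch of the iteration: only the two ridge steps (i and iii) are implemented, since the re-orthonormalizations (ii, iv) and the cleanup affect the factorization but not the product FΛ′, which is what gets checked against the direct SVT answer (the test matrix is constructed with known singular values):

```python
import numpy as np

def rpc_iterative_ridge(Z, r, gamma, iters=500, seed=0):
    """Alternating ridge regressions; F @ Lam.T converges to the
    SVT solution U_r (D_r - gamma I)_+ V_r'."""
    rng = np.random.default_rng(seed)
    F = np.linalg.qr(rng.standard_normal((Z.shape[0], r)))[0]  # F = U, D = I_r
    Ir = np.eye(r)
    for _ in range(iters):
        Lam = Z.T @ F @ np.linalg.inv(F.T @ F + gamma * Ir)    # i. Lambda given F
        F = Z @ Lam @ np.linalg.inv(Lam.T @ Lam + gamma * Ir)  # iii. F given Lambda
    return F, Lam

# Test matrix with known singular values; gamma below the smallest kept one.
rng = np.random.default_rng(1)
Ua = np.linalg.qr(rng.standard_normal((8, 5)))[0]
Va = np.linalg.qr(rng.standard_normal((6, 5)))[0]
Z = (Ua * np.array([3.0, 2.0, 1.0, 0.1, 0.05])) @ Va.T

gamma, r = 0.5, 3
F, Lam = rpc_iterative_ridge(Z, r, gamma)
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
C_svt = (U[:, :r] * np.maximum(d[:r] - gamma, 0.0)) @ Vt[:r]
```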


Generalized Ridge

The general regularized problem:

(F̄^{γ₁,γ₂}, Λ̄^{γ₁,γ₂}) = argmin_{F,Λ} ½‖Z − FΛ′‖²_F + (γ₁/2)‖F‖²_F + (γ₂/2)‖Λ‖²_F.

Let D_r^γ = (D_r − √(γ₁γ₂)·I_r)₊. The solution is

F̄^{γ₁,γ₂} = (γ₂/γ₁)^{1/4}·U_r(D_r^γ)^{1/2}
Λ̄^{γ₁,γ₂} = (γ₁/γ₂)^{1/4}·V_r(D_r^γ)^{1/2}
C̄^{γ₁,γ₂} = U_r D_r^γ V_r′.
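The closed form can be verified against the two ridge first-order conditions F = ZΛ(Λ′Λ + γ₁I)⁻¹ and Λ = Z′F(F′F + γ₂I)⁻¹. A numpy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((9, 7))
g1, g2 = 0.4, 0.1

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
dg = np.maximum(d - np.sqrt(g1 * g2), 0.0)   # (D_r - sqrt(g1 g2) I)_+
keep = dg > 0
k = int(keep.sum())
F = (g2 / g1) ** 0.25 * (U[:, keep] * np.sqrt(dg[keep]))
Lam = (g1 / g2) ** 0.25 * (Vt[keep].T * np.sqrt(dg[keep]))
```

Note the asymmetric scale factors (γ₂/γ₁)^{1/4} and (γ₁/γ₂)^{1/4} cancel in the product, so the common component does not depend on how the penalty is split.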


Monte Carlo

X_it = F⁰_t′Λ⁰_i + e_it + s_it, e_it ∼ (0, 1).

Sparse error: s_it ∼ N(µ, ω²) if (i, t) ∈ Ω.

[κ_N·N] units have outliers in [κ_T·T] of the sample.

(κ_N, κ_T) = (0.1, 0.03), ω ∈ (5, 10, 20), µ = 5, r = 5.

DGP1 (outliers): F⁰_t ∼ N(0, I_r), Λ⁰_i ∼ N(0, I_r).

DGP2 (weak loadings): F⁰ = U_r D_r^{1/2}, Λ⁰ = V_r D_r^{1/2},
diag(D_r) = [1, 0.8, 0.5, 0.3, 0.2θ], ω = 5, θ ∈ (1, 0.75, 0.5).
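A sketch of DGP1 (the seed and the use of numpy's generator are implementation assumptions, not from the paper):

```python
import numpy as np

def dgp1(N=100, T=100, r=5, kappa_N=0.1, kappa_T=0.03, mu=5.0, omega=5.0, seed=0):
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))        # F_t ~ N(0, I_r)
    Lam = rng.standard_normal((N, r))      # Lam_i ~ N(0, I_r)
    e = rng.standard_normal((T, N))        # e_it ~ (0, 1)
    S = np.zeros((T, N))
    units = rng.choice(N, int(kappa_N * N), replace=False)    # outlier units
    periods = rng.choice(T, int(kappa_T * T), replace=False)  # outlier periods
    for i in units:
        S[periods, i] = rng.normal(mu, omega, size=len(periods))
    return F @ Lam.T + e + S, S

X, S = dgp1()
```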


Case 1: Outlier, ω = 5

[Figure: simulated series for Case 1; x-axis 0–400, y-axis −4 to 6.]


Case 2: Small Eigenvalue, θ = 0.75

[Figure: Case 2; x-axis 1–6, y-axis 0 to 4×10⁴.]


Table 1: DGP 1, N = 100, r = 5, r* = 5.
Columns: variance shares of the common component (C), noise (e), and sparse part (S); mean estimated number of factors (r̃, r̄); span-of-F⁰ measures for C̃ and C̄ at the estimated and true ranks.

Without sparse outliers:
T, ω     C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 5   0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
100, 10  0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
100, 20  0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 5   0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 10  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 20  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 5   0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 10  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 20  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98

With sparse outliers:
T, ω     C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 5   0.81  0.12  0.02  5.36  5.00  0.63  0.98  0.92  0.98
100, 10  0.78  0.12  0.06  5.79  5.00  0.28  0.98  0.85  0.97
100, 20  0.69  0.12  0.17  6.81  5.00  0.01  0.97  0.72  0.97
200, 5   0.81  0.13  0.02  5.67  5.00  0.32  0.98  0.87  0.98
200, 10  0.78  0.13  0.06  5.91  5.00  0.19  0.98  0.84  0.98
200, 20  0.69  0.13  0.17  7.13  5.00  0.00  0.98  0.69  0.98
400, 5   0.81  0.13  0.02  5.88  5.00  0.12  0.98  0.84  0.98
400, 10  0.78  0.13  0.06  5.90  5.00  0.16  0.98  0.84  0.98
400, 20  0.69  0.13  0.18  7.15  5.00  0.00  0.98  0.69  0.98


Table 2: DGP 2, N = 100, r = 5, r* = 3, ω = 5 (the second parameter is θ).

Without sparse outliers:
T, θ       C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 1.00  0.67  0.02  0.00  3.94  3.00  0.07  0.95  0.74  0.96
100, 0.75  0.67  0.01  0.00  3.95  3.00  0.05  0.95  0.73  0.96
100, 0.50  0.67  0.01  0.00  3.97  3.00  0.04  0.95  0.73  0.96
200, 1.00  0.67  0.02  0.00  4.01  3.00  0.00  0.95  0.73  0.97
200, 0.75  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
200, 0.50  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
400, 1.00  0.67  0.02  0.00  4.26  3.00  0.00  0.95  0.69  0.97
400, 0.75  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
400, 0.50  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97

With sparse outliers:
T, θ       C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 1.00  0.60  0.02  0.11  4.81  2.93  0.01  0.93  0.61  0.96
100, 0.75  0.59  0.01  0.11  4.84  2.95  0.01  0.93  0.60  0.96
100, 0.50  0.59  0.01  0.11  4.86  2.96  0.01  0.93  0.60  0.96
200, 1.00  0.60  0.02  0.11  5.01  3.00  0.01  0.93  0.58  0.96
200, 0.75  0.59  0.01  0.11  5.00  3.01  0.01  0.93  0.58  0.96
200, 0.50  0.59  0.01  0.11  5.00  3.01  0.01  0.93  0.58  0.96
400, 1.00  0.60  0.02  0.11  5.21  3.10  0.00  0.84  0.56  0.94
400, 0.75  0.59  0.01  0.11  5.00  3.12  0.00  0.83  0.58  0.94
400, 0.50  0.59  0.01  0.11  5.00  3.13  0.00  0.82  0.58  0.93


FRED-MD Data: Eigenvalues

         Balanced Panel       Non-Balanced Panel
 j       d²_j    (d_j−γ)²₊    d²_j    (d_j−γ)²₊
 1       0.1828  0.1426       0.1493  0.1131
 2       0.0921  0.0643       0.0709  0.0468
 3       0.0716  0.0473       0.0682  0.0446
 4       0.0604  0.0384       0.0561  0.0349
 5       0.0453  0.0265       0.0426  0.0245
 6       0.0416  0.0237       0.0341  0.0182
 7       0.0301  0.0152       0.0317  0.0164
 8       0.0287  0.0143       0.0268  0.0129
(r̃, r̄)     8       3            8       3


Financial Data: Eigenvalues

         Balanced Panel       Non-Balanced Panel
 j       d²_j    (d_j−γ)²₊    d²_j    (d_j−γ)²₊
 1       0.6896  0.6090       0.6800  0.6001
 2       0.0464  0.0274       0.0447  0.0261
 3       0.0341  0.0181       0.0337  0.0178
 4       0.0138  0.0045       0.0141  0.0047
 5       0.0114  0.0032       0.0133  0.0043
 6       0.0092  0.0021       0.0109  0.0030
 7       0.0072  0.0012       0.0090  0.0020
 8       0.0066  0.0010       0.0075  0.0013
(r̃, r̄)     8       3            8       3


Linear Restrictions: R·vec(Λ) = φ

(F̄^{γ,τ}, Λ̄^{γ,τ}) = argmin_{F,Λ} ½‖Z − FΛ′‖²_F + (γ/2)(‖F‖²_F + ‖Λ‖²_F) + (τ/2)‖R·vec(Λ) − φ‖²₂.

Vector form: ‖Z − FΛ′‖²_F = ‖vec(Z′) − (F ⊗ I_N)·vec(Λ)‖²₂.

Allows cross-equation restrictions.

F given Λ: F̄^{γ,τ} = ZΛ(Λ′Λ + γI_r)⁻¹ (standard ridge).

Λ given F (generalized ridge):

vec(Λ̄^{γ,τ}) = ((F′F ⊗ I_N) + γI_{Nr} + τR′R)⁻¹·[vec(Z′F) + τR′φ].
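A sketch of the Λ-step. The restrictions R, φ below are arbitrary illustrative choices; at τ = 0 the formula collapses to the unrestricted ridge, and a large τ drives R·vec(Λ) toward φ:

```python
import numpy as np

rng = np.random.default_rng(6)
T, N, r, gamma = 30, 8, 2, 0.1
Z = rng.standard_normal((T, N))
F = rng.standard_normal((T, r))
R = rng.standard_normal((3, N * r))   # illustrative restrictions
phi = rng.standard_normal(3)

def lam_step(Z, F, gamma, tau, R, phi):
    N, r = Z.shape[1], F.shape[1]
    A = np.kron(F.T @ F, np.eye(N)) + gamma * np.eye(N * r) + tau * (R.T @ R)
    b = (Z.T @ F).reshape(-1, order="F") + tau * (R.T @ phi)  # vec(Z'F) + tau R'phi
    return np.linalg.solve(A, b)

# tau = 0 reduces to the unrestricted ridge Z'F (F'F + gamma I)^{-1}.
lam0 = lam_step(Z, F, gamma, 0.0, R, phi)
ridge = Z.T @ F @ np.linalg.inv(F.T @ F + gamma * np.eye(r))
# Large tau pushes R vec(Lambda) toward phi.
lam_big = lam_step(Z, F, gamma, 1e6, R, phi)
```

Column-major (`order="F"`) reshaping matches the vec(·) convention used with the Kronecker products above.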


Implementation when constraints bind

Let W_F = (F′F + γI_r)⁻¹. Then

vec(Λ̄^{γ,∞}) = vec(Λ̄^{γ,0}) − (W_F ⊗ I_N)R′·[R(W_F ⊗ I_N)R′]⁻¹·(R·vec(Λ̄^{γ,0}) − φ).

Two-Step Approach

1. Estimate without the linear restrictions (τ = 0): Λ̄^{γ,0} = Z′F̄_k(F̄_k′F̄_k + γI_r)⁻¹.

2. Impose the binding linear restrictions:

vec(Λ̄^{γ,∞}) = vec(Λ̄^{γ,0}) − (W_F^k ⊗ I_N)R′·[R(W_F^k ⊗ I_N)R′]⁻¹·(R·vec(Λ̄^{γ,0}) − φ).

Note: F̄^{γ,∞}′F̄^{γ,∞} and Λ̄^{γ,∞}′Λ̄^{γ,∞} will not, in general, be diagonal.


Conclusion

Iterative least squares: PC.

Iterative ridge: implements SVT.

SVT solves a surrogate of the minimum rank problem.

min rank + parsimony ⇒ IC̄, a data-dependent penalty.

FRED-MD and finance data: r̃ = 8, r̄ = 3.

Factor estimation under linear restrictions.

Missing values problem: in progress.


Incoherence Conditions

U ∈ R^{T×r}, V ∈ R^{N×r}.

Standard incoherence (singular vectors not too skewed):
max_{i=1,…,T} ‖U′e_i‖² ≤ µ₀r/T and max_{j=1,…,N} ‖V′e_j‖² ≤ µ₀r/N.

Joint incoherence (singular vectors not too correlated):
max_{i,j} |(UV′)_ij| ≤ √(µ₁r/(NT)).

The singular vectors are reasonably spread out for small µ.

∑_i (UV′)²_ij = ‖V′e_j‖²₂ and ∑_j (UV′)²_ij = ‖U′e_i‖²₂.

µ₁ dominates.


Example where the incoherence condition fails:

Z = [ 1 0 0 ; 0 0 0 ; 0 0 0 ] = (1, 0, 0)′·[1]·(1 0 0).

Z is too sparse, and its singular vectors are too sparse.

Completion requires many entries of Z to be observed.


Rank is easy to compute; spark needs a combinatorial search.

spark(A) ≤ rank(A) + 1.

Donoho and Elad (2003): spark(A) ≥ 1 + µ⁻¹(A).

Stable ℓ₁ recovery: min ‖x‖₁ s.t. ‖Ax − b‖₂ ≤ ε.

Coherence-based guarantee: if A has normalized columns and Ax = b has a solution satisfying ‖x‖₀ < (1 + µ⁻¹(A))/2, then x is the unique sparsest solution.


Restricted Isometry Property (RIP): an m × n matrix A satisfies the RIP of order k if

(1 − δ_k)‖z‖²₂ ≤ ‖Az‖²₂ ≤ (1 + δ_k)‖z‖²₂ for all z with ‖z‖₀ ≤ k.

RIP ensures that A acts as a near-isometry on sparse vectors (the matrix is properly scaled).

Statistical RIP: P(|‖Ax‖²₂ − ‖x‖²₂| ≤ δ‖x‖²₂) ≥ 1 − ε with respect to a uniform distribution of x among all k-sparse vectors in Rⁿ.