Rank Regularized Estimation of Approximate Factor Models

Jushan Bai and Serena Ng
Columbia University, April 2018




Approximate Factor Models · Rank Minimization: NP Hard · Approximate-Rank Minimization · Rank Regularized Factor Models

Outline

1. Approximate Factor Models: APC vs PC

2. Rank Minimization: NP Hard

3. Approximate-Rank Minimization: RPC vs PC

4. Rank Regularized Factor Models: Number of Factors; Linear Restrictions


Overview. Model: X = FΛ′ + e.

APC (asymptotic principal components): F̂ = √T·U_r; the eigenvectors can be constructed by iterative OLS.

What if we do iterative ridge regressions instead of OLS?

Singular value thresholding ⇒ robust PC.
Regularizes the rank of the common component.
Algorithmic view: finite-sample error bounds.

This paper: rank regularized factor analysis.

Parametric analysis, asymptotic results for inference.
A new, conservative factor selection rule.
(*) Factor analysis under general linear restrictions.


Notation

X_it ∼ (0, 1), i = 1, …, N, t = 1, …, T.

SVD: X = UDV′ = U_r D_r V_r′ + U_{n−r} D_{n−r} V_{n−r}′.

Normalized data: Z = X/√(NT) = U D̄ V′, with D̄ = D/√(NT); below, D denotes the singular values of Z.

Unscaled model: X = F⁰Λ⁰′ + e.

Scaled model: Z = F*Λ*′ + e*, with F* = F⁰/√T and Λ* = Λ⁰/√N.


Asymptotic Principal Components (APC)

min_{F,Λ} (1/NT)·‖X − FΛ′‖²_F, assuming a strong factor structure: Σ_F > 0, Σ_Λ > 0, and e weakly correlated.

(F̂, Λ̂) = (√T·U_r, V_r D_r), with F̂′F̂/T = I_r and Λ̂′Λ̂/N = D²_r.

Bai (2003): under the normalization F̂′F̂/T = I_r or Λ̂′Λ̂/N = I_r,

√N(F̂_t − H′_NT·F⁰_t) →d N(0, Avar(F_t))
√T(Λ̂_i − G_NT·Λ⁰_i) →d N(0, Avar(Λ_i)),

with G_NT = H⁻¹_NT.


Lemma

Rotation matrix: H_NT = (Λ⁰′Λ⁰/N)(F⁰′F̂/T)·D⁻²_r.

Let H_{1,NT} = (Λ⁰′Λ⁰)(Λ̂′Λ⁰)⁻¹; then H_NT = H_{1,NT} + o_p(1).

Let H_{2,NT} = (F⁰′F⁰)⁻¹(F⁰′F̂); then H_NT = H_{2,NT} + o_p(1).

Results of independent interest.


Principal Components (PC)

Recall APC: (F̂, Λ̂) = (√T·U_r, V_r D_r), with F̂′F̂/T = I_r and Λ̂′Λ̂/N = D²_r.

There are many definitions of PC, e.g. (F, Λ) = (√T·U_r D_r, V_r).

This paper defines PC as

(F̃, Λ̃) = (√T·U_r D_r^{1/2}, √N·V_r D_r^{1/2}) = (√T·F_z, √N·Λ_z).

Normalization: F̃′F̃/T = D_r and Λ̃′Λ̃/N = D_r.

Why? The normalization F̂′F̂/T = I_r is not convenient for imposing restrictions.



Relation with APC: F̃ = F̂·D_r^{1/2}, Λ̃ = Λ̂·D_r^{−1/2}.

Define H̃_NT = H_NT·D_r^{1/2}. From the identities:

√N(F̃_t − H̃′_NT·F⁰_t) = √N·D_r^{1/2}(F̂_t − H′_NT·F⁰_t),
√T(Λ̃_i − H̃⁻¹_NT·Λ⁰_i) = √T·D_r^{−1/2}(Λ̂_i − H⁻¹_NT·Λ⁰_i).

Asymptotic properties:

(i) √N(F̃_t − H̃′_NT·F⁰_t) →d N(0, D_r^{1/2}·Avar(F_t)·D_r^{1/2});
(ii) √T(Λ̃_i − G̃_NT·Λ⁰_i) →d N(0, D_r^{−1/2}·Avar(Λ_i)·D_r^{−1/2}),

with G̃_NT = H̃⁻¹_NT.
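The APC and PC definitions and the relation F̃ = F̂·D_r^{1/2} are easy to check numerically. A minimal numpy sketch (not the authors' code; dimensions and data are illustrative):

```python
import numpy as np

# Illustrative dimensions; Z plays the role of the normalized panel.
rng = np.random.default_rng(0)
T, N, r = 50, 40, 3
Z = rng.standard_normal((T, N)) / np.sqrt(N * T)

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
Ur, Dr, Vr = U[:, :r], np.diag(d[:r]), Vt[:r].T

F_apc = np.sqrt(T) * Ur                 # APC: F_hat = sqrt(T) U_r
F_pc = np.sqrt(T) * Ur @ np.sqrt(Dr)    # PC:  F_tilde = sqrt(T) U_r D_r^{1/2}
Lam_pc = np.sqrt(N) * Vr @ np.sqrt(Dr)  # PC loadings: sqrt(N) V_r D_r^{1/2}
```

The two normalizations and the scale relation between APC and PC follow directly from orthonormality of U_r and V_r.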


Let A be an n × n matrix with eigenvalues d collected in D = diag(d):

Trace norm: ∑_{k=1}^n A_kk = ∑_{k=1}^n d_k.
Nuclear norm: ‖A‖_* = sum of the singular values (= ∑_{k=1}^n d_k for A non-negative definite).
Frobenius norm: ‖A‖²_F = ∑_{ij} A²_ij = trace(A′A).
ℓ₁ norm: ‖A‖₁ = ∑_{ij} |A_ij|.
Spectral norm: ‖A‖₂ = max_k |d_k|.


Spark vs Rank

For A ∈ R^{m×n} with n < m:

spark(A) = min_{x≠0} ‖x‖₀ s.t. Ax = 0.
rank(A) = ‖D‖₀ = nnz(D).

spark(A) = size of the smallest set of linearly dependent columns.
rank(A) = size of the largest set of linearly independent columns.

spark(A) = n + 1 ⇔ rank(A) = n.

If spark(A) ≠ n + 1:

spark(A) ≤ rank(A) + 1.
spark(A) ≥ 1 + 1/µ(A), where µ(A) = max_{i≠j} |⟨a_i, a_j⟩| is the mutual coherence of the normalized columns.

Computing spark(A) is NP-hard: Tillmann and Pfetsch (IEEE Transactions on Information Theory, 2014).
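The definitions can be made concrete with a brute-force spark (the combinatorial search that makes the problem NP-hard at scale) and the coherence bound. A sketch with an illustrative example in which spark(A) = rank(A) + 1:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns (brute force)."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return n + 1  # all columns independent

def mutual_coherence(A):
    """mu(A) = max_{i != j} |<a_i, a_j>| over unit-normalized columns."""
    B = A / np.linalg.norm(A, axis=0)
    G = np.abs(B.T @ B)
    np.fill_diagonal(G, 0.0)
    return G.max()

# Three columns in R^2: e1, e2, and e1 + e2 (the smallest dependent set
# has all three columns, so spark = 3 while rank = 2).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
```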


NP Hard

NP problems: decision problems for which a "yes" answer can be verified efficiently, using deterministic computations performed in polynomial time.

An NP-hard problem is one that admits no general computational solution significantly faster than a brute-force search.


1. Minimum Rank Factor Analysis

Early factor analysis: decompose Σ_X = Σ_C + Σ_e such that

(i) the communality matrix Σ_C has the smallest rank,
(ii) Σ_C is non-negative definite,
(iii) Σ_e is a diagonal positive definite matrix (ruling out Heywood cases).

Rank minimization is NP-hard (non-convexity).

Evidence in the 1950s suggested many non-zero eigenvalues, which called the usefulness of the concept of minimum rank into question.


1980s: decompose Σ_X by solving surrogate problems subject to

(i) Σ_X − Σ_e ≥ 0, (ii) Σ_e ≥ 0.

(i) CMTFA: min trace(Σ_X − Σ_e) = ∑_{i=1}^N D^C_ii.

(ii) MRFA: C = C* + C⁻, where C* is the best minimum-rank approximation of C; rank(C*) = r minimizes ∑_{i=r+1}^N D^C_ii.


Approximate Minimum Rank: ten Berge-Kiers (1991)

min r such that ∑_{i=r+1}^N D^C_ii ≤ δ, subject to (i)+(ii).  (*)

δ: tolerance for the maximum unexplained common variance.

The approximate minimum rank of Σ_C is the smallest r that solves (*) for some δ ≥ 0.

Minimum rank: the special case δ = 0.

The sum of eigenvalues is convex.


2. Matrix Completion

Complete the matrix Z with missing values.

Ω = index set of the positions of observed data.

Underdetermined without some structure.

Assume the latent matrix L is low rank.

Netflix challenge: L = AB′, A = movie genres, B = tastes.

Hard problem: min rank(L) s.t. L_ij = Z_ij, (i, j) ∈ Ω.

Surrogate problem: min ‖L‖_* s.t. L_ij = Z_ij, (i, j) ∈ Ω.

Z can be recovered if (i) there are not too many missing values, and (ii) they are missing at random.
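A minimal sketch of SVD-based completion in the spirit of the surrogate problem — iterative imputation with a rank truncation ("hard-impute") rather than the exact nuclear-norm program; the rank-1 example and sampling pattern below are illustrative assumptions:

```python
import numpy as np

def complete(Z_obs, mask, rank, iters=200):
    """Fill missing entries by alternating a low-rank SVD truncation
    with resetting the observed entries (mask True = observed)."""
    L = np.where(mask, Z_obs, 0.0)
    for _ in range(iters):
        U, d, Vt = np.linalg.svd(L, full_matrices=False)
        L = (U[:, :rank] * d[:rank]) @ Vt[:rank]  # low-rank truncation
        L[mask] = Z_obs[mask]                     # keep observed entries
    return L

# Rank-1 ground truth; a few entries missing, spread across rows/columns.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([1.0, 1.0, 2.0, 2.0, 3.0])
Z = np.outer(u, v)
mask = np.ones_like(Z, dtype=bool)
for i, j in [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]:
    mask[i, j] = False

L_hat = complete(Z, mask, rank=1)
```

With many entries observed and the missing positions spread out, the iteration recovers the low-rank matrix closely, matching condition (i)–(ii) above.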


3. Low Rank Decomposition

Eckart–Young: the best rank-r approximation of Z is U_r D_r V_r′.

The SVD is sensitive to noise corruption:

Z = L (low rank) + S (sparse, big noise).

Compressed sensing: solve underdetermined systems, recover sparse signals. Computer vision: S = background noise.

Hard problem:

min_{L,S} rank(L) + γ‖S‖₀, where the ‖S‖₀ term is the sparsity constraint.

Objective function and constraint are both non-convex.
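Eckart–Young is easy to check numerically: the SVD truncation beats any other rank-r candidate in Frobenius norm, and its squared error equals the sum of the discarded squared singular values. A sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((8, 6))
r = 2

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
Z_best = (U[:, :r] * d[:r]) @ Vt[:r]      # rank-r SVD truncation
best_err = np.linalg.norm(Z - Z_best)

# A competing rank-r matrix built from random factors.
A = rng.standard_normal((8, r))
B = rng.standard_normal((6, r))
other_err = np.linalg.norm(Z - A @ B.T)
```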


Candès et al. (2009)

The surrogate problem is convex:

min_{L,S} ‖L‖_* + γ‖S‖₁.

L and S can be recovered with high probability under incoherence conditions: L not sparse, S not low rank.

General problem:

Z = L (low rank) + S (sparse, big noise) + W (small noise)

min_{L,S} ‖L‖_* + γ‖S‖₁, with ‖W‖_F ≤ δ.


Overview: Good to Relax

Hard problems: the rank function.

Surrogate problems: the nuclear norm.

Cai et al. (2008, Theorem 1):

U_r D_r^γ V_r′ = argmin_L γ‖L‖_* + ½‖Z − L‖²_F.

SVT = singular value thresholding operator:

D_r^γ = diag((D₁₁ − γ)₊, …, (D_rr − γ)₊).

SVT is the proximal operator of the nuclear norm.

Optimal low-rank approximation under a rank constraint: U_r D_r^γ V_r′.
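A sketch of the SVT operator together with a numerical check of the proximal property: the thresholded SVD attains a lower value of γ‖L‖_* + ½‖Z − L‖²_F than other candidates (data illustrative):

```python
import numpy as np

def svt(Z, gamma):
    """Singular value thresholding: soft-threshold the singular values."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(d - gamma, 0.0)) @ Vt

def objective(L, Z, gamma):
    return gamma * np.linalg.svd(L, compute_uv=False).sum() \
        + 0.5 * np.linalg.norm(Z - L) ** 2

rng = np.random.default_rng(2)
Z = rng.standard_normal((7, 5))
gamma = 0.8
L_svt = svt(Z, gamma)

obj_svt = objective(L_svt, Z, gamma)
obj_z = objective(Z, Z, gamma)                    # candidate: Z itself
obj_zero = objective(np.zeros_like(Z), Z, gamma)  # candidate: 0
```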


Relation to Factor Models

We have the low rank solution

U_r D_r^γ V_r′ = argmin_L γ‖L‖_* + ½‖Z − L‖²_F.  (1)

L (of rank r) can be factorized as L = AB′:

min_{A,B} (γ/2)(‖A‖²_F + ‖B‖²_F) + ½‖Z − AB′‖²_F.  (2)

Theorem: (A, B) solves (2) iff L = AB′ solves (1).

Solution: robust principal components (RPC),

A = U_r(D_r^γ)^{1/2}, B = V_r(D_r^γ)^{1/2}.


Sketch of the idea (γ = 0). For AB′ = U_r D_r V_r′:

trace(D_r) = trace(U_r′AB′V_r) ≤ ‖A‖_F·‖B‖_F ≤ ½(‖A‖²_F + ‖B‖²_F).

Since L = AB′, ‖L‖_* = trace(D_r) ≤ ½(‖A‖²_F + ‖B‖²_F).

Putting A = U_r D_r^{1/2} and B = V_r D_r^{1/2},

½(‖A‖²_F + ‖B‖²_F) = ½(‖D_r^{1/2}‖²_F + ‖D_r^{1/2}‖²_F) = ‖D_r‖₁.

The bound holds with equality: ‖D_r‖₁ = ½(‖A‖²_F + ‖B‖²_F).


FOC View

FOC: (i) −(Z − AB′)B + γA = 0, (ii) −(Z − AB′)′A + γB = 0.

Left-multiplying (i) by A′ and (ii) by B′ gives A′A = B′B. Rearranging,

[ −γI  Z ; Z′  −γI ]·[ A ; B ] = [ A ; B ]·A′A.

This has the generic structure ZV = VX: the eigenvalues of X are eigenvalues of Z, and V holds the corresponding eigenvectors.

A = U_r(D_r^γ)^{1/2}, B = V_r(D_r^γ)^{1/2}: a particular normalization.
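The FOCs can be verified directly for the SVT factorization; a numpy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((10, 8))
gamma = 0.5

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
keep = d > gamma                  # components surviving the threshold
sq = np.sqrt(d[keep] - gamma)     # (D_r^gamma)^{1/2}
A = U[:, keep] * sq
B = Vt[keep].T * sq

foc_A = -(Z - A @ B.T) @ B + gamma * A      # FOC (i)
foc_B = -(Z - A @ B.T).T @ A + gamma * B    # FOC (ii)
```

Both residuals vanish, and the shared normalization A′A = B′B = D_r^γ holds by construction.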


Factor Analysis and RPC

With A′A = B′B = D_r^γ:

RPC of Z: (A, B) = (U_r(D_r^γ)^{1/2}, V_r(D_r^γ)^{1/2}).

RPC of X: (F̄, Λ̄) = (√T·U_r(D_r^γ)^{1/2}, √N·V_r(D_r^γ)^{1/2}).

PC of X: (F̃, Λ̃) = (√T·U_r D_r^{1/2}, √N·V_r D_r^{1/2}).

Relation between RPC and PC:

F̄ = F̃(D_r^γ D_r^{−1})^{1/2}, Λ̄ = Λ̃(D_r^γ D_r^{−1})^{1/2}.

Even big factors will be shrunk.
Small factors can be killed, since rank(D_r^γ) ≤ r.
Sparse large noise is not treated as factors.
Smaller common component: var(C̄) ≤ var(C̃).


Effects of Regularization: Δ²_NT = D_r^γ D_r^{−1}.

H̄_NT = H̃_NT·Δ_NT.

F̄_t − H̄′_NT·F⁰_t = Δ_NT(F̃_t − H̃′_NT·F⁰_t)
Λ̄_i − Ḡ_NT·Λ⁰_i = Δ_NT(Λ̃_i − H̃⁻¹_NT·Λ⁰_i)

Proposition

(i) √N(F̄_t − H̄′_NT·F⁰_t) →d N(0, Δ_∞·Avar(F̃_t)·Δ_∞);
(ii) √T(Λ̄_i − Ḡ_NT·Λ⁰_i) →d N(0, Δ_∞·Avar(Λ̃_i)·Δ_∞).

Unlike APC and PC, Ḡ_NT = Δ_NT·H̃⁻¹_NT ≠ H̄⁻¹_NT.


Bias/Variance Tradeoff

diag(Δ_∞) = δ with δ_i < 1. The Proposition implies

Avar(F̄) ≤ Avar(F̃) and Avar(Λ̄) ≤ Avar(Λ̃).

There is a regularization bias, since C̄ = U_r D_r^γ V_r′ ≠ C̃ = U_r D_r V_r′.

Case r = 1: δ₁ = (D₁₁ − γ)₊/D₁₁ and C̄_it = δ₁·C̃_it, so

Abias(C̄_it) = (δ₁ − 1)·C⁰_it
Avar(C̄_it) = δ₁²·Avar(C̃_it)
Amse(C̄_it) = (δ₁ − 1)²(C⁰_it)² + δ₁²·Amse(C̃_it).

The relative MSE is below one when Amse(C̃_it) is large:

Amse(C̄_it)/Amse(C̃_it) = (δ₁ − 1)²·(C⁰_it)²/Amse(C̃_it) + δ₁².


Asymptotic vs. Finite Sample Results

Z = L + S is consistent with many probabilistic structures.

Econometric theory: X = F⁰Λ⁰′ + e, Z = X/√(NT).

Strong factor structure: Σ_F > 0, Σ_Λ > 0; r population eigenvalues diverge with N.

Estimation: choose F, Λ, with e residually determined.

min(√N, √T)(C̃_it − C⁰_it) →d N(0, Avar(C_it)).


Machine Learning Results:

Solve the problem given the data (finite sample).

Choose L and S simultaneously.

Netflix/noiseless problems: no reference to eigenvalues.

Incoherence condition: L is not sparse.

S is selected uniformly at random and is not low rank.

For γ = 1/√(max(m, n)), (L̂, Ŝ) = (L, S) with probability 1 − c₀·n⁻¹⁰ if ‖S‖₀ < c₁·mn and rank(L) ≤ c₁·min(m, n)/(µ·log²(max(m, n))).


Agarwal, Negahban and Wainwright (2012, Annals of Statistics)

M-estimation based on a regularized nuclear norm, assuming restricted strong convexity of the loss function.

With noisy data, L cannot be exactly recovered.

What matters are the eigenvectors of the largest singular values.

err² = ‖L̂ − L‖²_F + ‖Ŝ − S‖²_F.

If ‖L‖_∞ < c/√(mn), then with high probability, err² ≤ c·(N+T)/(NT).


‖L‖_∞ = max_it |L_it|, ‖L‖²_F = ∑_{i=1}^r d²_i.

‖L‖_∞ < c/√(mn) is a constraint on the sum of eigenvalues.

(N+T)/(NT) ≈ min(N, T)⁻¹.

Econometric theory: min(√N, √T)(C̃_it − C_it) = O_p(1).

Different objectives, but the results broadly agree.

Also related: Bertsimas, Copenhaver and Mazumder (2016); Lettau and Pelger (2017).


Number of Factors: min rank + model complexity

BaiNg-02: r̃ = argmin_k log(ssr̃_k) + k·g(N,T), with ssr̃_k = ‖Z − F̃_k Λ̃_k′‖²_F.

BaiNg-17: r̄ = argmin_k log(ssr̄_k) + k·g(N,T), with ssr̄_k = ‖Z − F̄_k Λ̄_k′‖²_F.

With ‖Z‖_F = 1,

ssr̃_k = 1 − ∑_{j=1}^k d²_j, ssr̄_k = 1 − ∑_{j=1}^k (d_j − γ)²₊,

IC̄_k ≈ IC̃_k + γ·∑_{j=1}^k (2d_j − γ)/ssr̃_k.

A data-dependent, heavier penalty. r̃ ≥ r* under sparse outliers or weak factors.

γ = 0.05 reduces the contribution of factor j from d²_j to (d_j − 0.05)²; the effect on small factors is proportionally larger.
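The two selection rules can be sketched directly in terms of the singular values d_j of the normalized panel (so ∑_j d²_j = 1). The penalty g(N,T) below is the Bai–Ng (2002) IC_p2 choice, used here as an illustrative assumption, and the singular-value profile is made up:

```python
import numpy as np

def select_r(d, gamma, N, T, kmax):
    g = (N + T) / (N * T) * np.log(min(N, T))   # IC_p2 penalty (assumption)
    ks = np.arange(1, kmax + 1)
    ssr_pc = 1 - np.cumsum(d[:kmax] ** 2)                           # PC residual
    ssr_rpc = 1 - np.cumsum(np.maximum(d[:kmax] - gamma, 0) ** 2)   # RPC residual
    ic_pc = np.log(ssr_pc) + ks * g
    ic_rpc = np.log(ssr_rpc) + ks * g
    return ks[np.argmin(ic_pc)], ks[np.argmin(ic_rpc)], ic_pc, ic_rpc

# Made-up spectrum: two strong factors plus a flat noise tail, sum d_j^2 = 1.
d = np.sqrt(np.array([0.45, 0.25] + [0.30 / 28] * 28))
r_pc, r_rpc, ic_pc, ic_rpc = select_r(d, gamma=0.05, N=100, T=100, kmax=8)
```

Since (d_j − γ)²₊ ≤ d²_j, the regularized criterion is never below the unregularized one, which is the "heavier penalty" at work.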


Implications for Factor Augmented Regressions

y_{t+h} = α′F_t + β′W_t + ε_{t+h}.

Replacing F by F̂, F̃, or F̄ gives an identical fit! They are all spanned by U_r, hence perfectly correlated. The estimates of α simply adjust for the scale difference.

For F̄ to have an effect, do ridge regression. Given κ, let κ_T = κ/T. Then

α̂_OLS = (F̄′F̄)⁻¹F̄′y = (D_r^γ)^{−1/2}·U_r′y/√T,

α̂_R = (F̄′F̄ + κI_r)⁻¹F̄′y
    = (D_r^γ + κ_T·I_r)⁻¹·D_r^γ·α̂_OLS = (I_r + κ_T·(D_r^γ)⁻¹)⁻¹·α̂_OLS
    ≈ (I_r − κ_T·(D_r^γ)⁻¹)·α̂_OLS.
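The identical-fit claim is a column-space statement and easy to verify: F̂, F̃, and F̄ are positive diagonal rescalings of U_r, so OLS fitted values coincide. A sketch with illustrative data (γ chosen well below the retained singular values):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, r, gamma = 60, 30, 2, 0.01
Z = rng.standard_normal((T, N)) / np.sqrt(N * T)
y = rng.standard_normal(T)

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
F_apc = np.sqrt(T) * U[:, :r]                             # APC
F_pc = F_apc * np.sqrt(d[:r])                             # PC
F_rpc = F_apc * np.sqrt(np.maximum(d[:r] - gamma, 0.0))   # RPC

def fitted(F, y):
    alpha, *_ = np.linalg.lstsq(F, y, rcond=None)
    return F @ alpha

yhat_apc, yhat_pc, yhat_rpc = fitted(F_apc, y), fitted(F_pc, y), fitted(F_rpc, y)
```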


RPC by SVT via Iterative Ridge

Given an m × n matrix Z, initialize an m × r matrix F = UD, where U is orthonormal and D = I_r.

A. Repeat until convergence:

i. (solve Λ given F): Λ = Z′F(F′F + γI_r)⁻¹.
ii. svd(Λ) = U_Λ D_Λ V_Λ′; set Λ = U_Λ D_Λ and D = D_Λ.
iii. (solve F given Λ): F = ZΛ(Λ′Λ + γI_r)⁻¹.
iv. svd(F) = U_F D_F V_F′; set F = U_F D_F and D = D_F.

B. (Cleanup) From svd(ZU_Λ) = U_r D_r Ṽ′, recover V_r = U_Λ·Ṽ and set D_r^γ = (D_r − γI_r)₊.

Useful when T, N are large and a direct SVD is expensive.

Iterative ridge regressions implement SVT.

The cleanup takes care of numerical precision problems.
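A simplified sketch of the iteration: only the two ridge steps (i and iii) are implemented, since the re-orthonormalizations (ii, iv) and the cleanup affect the factorization but not the product FΛ′, which is what gets checked against the direct SVT answer (the test matrix is constructed with known singular values):

```python
import numpy as np

def rpc_iterative_ridge(Z, r, gamma, iters=500, seed=0):
    """Alternating ridge regressions; F @ Lam.T converges to the
    SVT solution U_r (D_r - gamma I)_+ V_r'."""
    rng = np.random.default_rng(seed)
    F = np.linalg.qr(rng.standard_normal((Z.shape[0], r)))[0]  # F = U, D = I_r
    Ir = np.eye(r)
    for _ in range(iters):
        Lam = Z.T @ F @ np.linalg.inv(F.T @ F + gamma * Ir)    # i. Lambda given F
        F = Z @ Lam @ np.linalg.inv(Lam.T @ Lam + gamma * Ir)  # iii. F given Lambda
    return F, Lam

# Test matrix with known singular values; gamma below the smallest kept one.
rng = np.random.default_rng(1)
Ua = np.linalg.qr(rng.standard_normal((8, 5)))[0]
Va = np.linalg.qr(rng.standard_normal((6, 5)))[0]
Z = (Ua * np.array([3.0, 2.0, 1.0, 0.1, 0.05])) @ Va.T

gamma, r = 0.5, 3
F, Lam = rpc_iterative_ridge(Z, r, gamma)
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
C_svt = (U[:, :r] * np.maximum(d[:r] - gamma, 0.0)) @ Vt[:r]
```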


Generalized Ridge

The general regularized problem:

(F̄^{γ₁,γ₂}, Λ̄^{γ₁,γ₂}) = argmin_{F,Λ} ½‖Z − FΛ′‖²_F + (γ₁/2)‖F‖²_F + (γ₂/2)‖Λ‖²_F.

Let D_r^γ = (D_r − √(γ₁γ₂)·I_r)₊. The solution is

F̄^{γ₁,γ₂} = (γ₂/γ₁)^{1/4}·U_r(D_r^γ)^{1/2}
Λ̄^{γ₁,γ₂} = (γ₁/γ₂)^{1/4}·V_r(D_r^γ)^{1/2}
C̄^{γ₁,γ₂} = U_r D_r^γ V_r′.
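The closed form can be verified against the two ridge first-order conditions F = ZΛ(Λ′Λ + γ₁I)⁻¹ and Λ = Z′F(F′F + γ₂I)⁻¹. A numpy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((9, 7))
g1, g2 = 0.4, 0.1

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
dg = np.maximum(d - np.sqrt(g1 * g2), 0.0)   # (D_r - sqrt(g1 g2) I)_+
keep = dg > 0
k = int(keep.sum())
F = (g2 / g1) ** 0.25 * (U[:, keep] * np.sqrt(dg[keep]))
Lam = (g1 / g2) ** 0.25 * (Vt[keep].T * np.sqrt(dg[keep]))
```

Note the asymmetric scale factors (γ₂/γ₁)^{1/4} and (γ₁/γ₂)^{1/4} cancel in the product, so the common component does not depend on how the penalty is split.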


Monte Carlo

X_it = F⁰_t′Λ⁰_i + e_it + s_it, e_it ∼ (0, 1).

Sparse error: s_it ∼ N(µ, ω²) if (i, t) ∈ Ω.

[κ_N·N] units have outliers in [κ_T·T] of the sample.

(κ_N, κ_T) = (0.1, 0.03), ω ∈ (5, 10, 20), µ = 5, r = 5.

DGP1 (outliers): F⁰_t ∼ N(0, I_r), Λ⁰_i ∼ N(0, I_r).

DGP2 (weak loadings): F⁰ = U_r D_r^{1/2}, Λ⁰ = V_r D_r^{1/2},
diag(D_r) = [1, 0.8, 0.5, 0.3, 0.2θ], ω = 5, θ ∈ (1, 0.75, 0.5).
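A sketch of DGP1 (the seed and the use of numpy's generator are implementation assumptions, not from the paper):

```python
import numpy as np

def dgp1(N=100, T=100, r=5, kappa_N=0.1, kappa_T=0.03, mu=5.0, omega=5.0, seed=0):
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))        # F_t ~ N(0, I_r)
    Lam = rng.standard_normal((N, r))      # Lam_i ~ N(0, I_r)
    e = rng.standard_normal((T, N))        # e_it ~ (0, 1)
    S = np.zeros((T, N))
    units = rng.choice(N, int(kappa_N * N), replace=False)    # outlier units
    periods = rng.choice(T, int(kappa_T * T), replace=False)  # outlier periods
    for i in units:
        S[periods, i] = rng.normal(mu, omega, size=len(periods))
    return F @ Lam.T + e + S, S

X, S = dgp1()
```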


Case 1: Outlier, ω = 5

[Figure: simulated series for Case 1; x-axis 0–400, y-axis −4 to 6.]


Case 2: Small Eigenvalue, θ = 0.75

[Figure: Case 2; x-axis 1–6, y-axis 0 to 4×10⁴.]


Table 1: DGP 1, N = 100, r = 5, r* = 5.
Columns: variance shares of the common component (C), noise (e), and sparse part (S); mean estimated number of factors (r̃, r̄); span-of-F⁰ measures for C̃ and C̄ at the estimated and true ranks.

Without sparse outliers:
T, ω     C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 5   0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
100, 10  0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
100, 20  0.83  0.12  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 5   0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 10  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
200, 20  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 5   0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 10  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98
400, 20  0.83  0.13  0.00  5.00  5.00  0.98  0.98  0.98  0.98

With sparse outliers:
T, ω     C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 5   0.81  0.12  0.02  5.36  5.00  0.63  0.98  0.92  0.98
100, 10  0.78  0.12  0.06  5.79  5.00  0.28  0.98  0.85  0.97
100, 20  0.69  0.12  0.17  6.81  5.00  0.01  0.97  0.72  0.97
200, 5   0.81  0.13  0.02  5.67  5.00  0.32  0.98  0.87  0.98
200, 10  0.78  0.13  0.06  5.91  5.00  0.19  0.98  0.84  0.98
200, 20  0.69  0.13  0.17  7.13  5.00  0.00  0.98  0.69  0.98
400, 5   0.81  0.13  0.02  5.88  5.00  0.12  0.98  0.84  0.98
400, 10  0.78  0.13  0.06  5.90  5.00  0.16  0.98  0.84  0.98
400, 20  0.69  0.13  0.18  7.15  5.00  0.00  0.98  0.69  0.98


Table 2: DGP 2, N = 100, r = 5, r* = 3, ω = 5 (the second parameter is θ).

Without sparse outliers:
T, θ       C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 1.00  0.67  0.02  0.00  3.94  3.00  0.07  0.95  0.74  0.96
100, 0.75  0.67  0.01  0.00  3.95  3.00  0.05  0.95  0.73  0.96
100, 0.50  0.67  0.01  0.00  3.97  3.00  0.04  0.95  0.73  0.96
200, 1.00  0.67  0.02  0.00  4.01  3.00  0.00  0.95  0.73  0.97
200, 0.75  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
200, 0.50  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
400, 1.00  0.67  0.02  0.00  4.26  3.00  0.00  0.95  0.69  0.97
400, 0.75  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97
400, 0.50  0.67  0.01  0.00  4.00  3.00  0.00  0.95  0.73  0.97

With sparse outliers:
T, θ       C     e     S     r̃     r̄     C̃_r̃   C̄_r̄   C̃     C̄
100, 1.00  0.60  0.02  0.11  4.81  2.93  0.01  0.93  0.61  0.96
100, 0.75  0.59  0.01  0.11  4.84  2.95  0.01  0.93  0.60  0.96
100, 0.50  0.59  0.01  0.11  4.86  2.96  0.01  0.93  0.60  0.96
200, 1.00  0.60  0.02  0.11  5.01  3.00  0.01  0.93  0.58  0.96
200, 0.75  0.59  0.01  0.11  5.00  3.01  0.01  0.93  0.58  0.96
200, 0.50  0.59  0.01  0.11  5.00  3.01  0.01  0.93  0.58  0.96
400, 1.00  0.60  0.02  0.11  5.21  3.10  0.00  0.84  0.56  0.94
400, 0.75  0.59  0.01  0.11  5.00  3.12  0.00  0.83  0.58  0.94
400, 0.50  0.59  0.01  0.11  5.00  3.13  0.00  0.82  0.58  0.93


FRED-MD Data: Eigenvalues

         Balanced Panel       Non-Balanced Panel
 j       d²_j    (d_j−γ)²₊    d²_j    (d_j−γ)²₊
 1       0.1828  0.1426       0.1493  0.1131
 2       0.0921  0.0643       0.0709  0.0468
 3       0.0716  0.0473       0.0682  0.0446
 4       0.0604  0.0384       0.0561  0.0349
 5       0.0453  0.0265       0.0426  0.0245
 6       0.0416  0.0237       0.0341  0.0182
 7       0.0301  0.0152       0.0317  0.0164
 8       0.0287  0.0143       0.0268  0.0129
(r̃, r̄)     8       3            8       3


Financial Data: Eigenvalues

         Balanced Panel       Non-Balanced Panel
 j       d²_j    (d_j−γ)²₊    d²_j    (d_j−γ)²₊
 1       0.6896  0.6090       0.6800  0.6001
 2       0.0464  0.0274       0.0447  0.0261
 3       0.0341  0.0181       0.0337  0.0178
 4       0.0138  0.0045       0.0141  0.0047
 5       0.0114  0.0032       0.0133  0.0043
 6       0.0092  0.0021       0.0109  0.0030
 7       0.0072  0.0012       0.0090  0.0020
 8       0.0066  0.0010       0.0075  0.0013
(r̃, r̄)     8       3            8       3


Linear Restrictions: R·vec(Λ) = φ

(F̄^{γ,τ}, Λ̄^{γ,τ}) = argmin_{F,Λ} ½‖Z − FΛ′‖²_F + (γ/2)(‖F‖²_F + ‖Λ‖²_F) + (τ/2)‖R·vec(Λ) − φ‖²₂.

Vector form: ‖Z − FΛ′‖²_F = ‖vec(Z′) − (F ⊗ I_N)·vec(Λ)‖²₂.

Allows cross-equation restrictions.

F given Λ: F̄^{γ,τ} = ZΛ(Λ′Λ + γI_r)⁻¹ (standard ridge).

Λ given F (generalized ridge):

vec(Λ̄^{γ,τ}) = ((F′F ⊗ I_N) + γI_{Nr} + τR′R)⁻¹·[vec(Z′F) + τR′φ].
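A sketch of the Λ-step. The restrictions R, φ below are arbitrary illustrative choices; at τ = 0 the formula collapses to the unrestricted ridge, and a large τ drives R·vec(Λ) toward φ:

```python
import numpy as np

rng = np.random.default_rng(6)
T, N, r, gamma = 30, 8, 2, 0.1
Z = rng.standard_normal((T, N))
F = rng.standard_normal((T, r))
R = rng.standard_normal((3, N * r))   # illustrative restrictions
phi = rng.standard_normal(3)

def lam_step(Z, F, gamma, tau, R, phi):
    N, r = Z.shape[1], F.shape[1]
    A = np.kron(F.T @ F, np.eye(N)) + gamma * np.eye(N * r) + tau * (R.T @ R)
    b = (Z.T @ F).reshape(-1, order="F") + tau * (R.T @ phi)  # vec(Z'F) + tau R'phi
    return np.linalg.solve(A, b)

# tau = 0 reduces to the unrestricted ridge Z'F (F'F + gamma I)^{-1}.
lam0 = lam_step(Z, F, gamma, 0.0, R, phi)
ridge = Z.T @ F @ np.linalg.inv(F.T @ F + gamma * np.eye(r))
# Large tau pushes R vec(Lambda) toward phi.
lam_big = lam_step(Z, F, gamma, 1e6, R, phi)
```

Column-major (`order="F"`) reshaping matches the vec(·) convention used with the Kronecker products above.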


Implementation when constraints bind

Let W_F = (F′F + γI_r)⁻¹. Then

vec(Λ̄^{γ,∞}) = vec(Λ̄^{γ,0}) − (W_F ⊗ I_N)R′·[R(W_F ⊗ I_N)R′]⁻¹·(R·vec(Λ̄^{γ,0}) − φ).

Two-Step Approach

1. Estimate without the linear restrictions (τ = 0): Λ̄^{γ,0} = Z′F̄_k(F̄_k′F̄_k + γI_r)⁻¹.

2. Impose the binding linear restrictions:

vec(Λ̄^{γ,∞}) = vec(Λ̄^{γ,0}) − (W_F^k ⊗ I_N)R′·[R(W_F^k ⊗ I_N)R′]⁻¹·(R·vec(Λ̄^{γ,0}) − φ).

Note: F̄^{γ,∞}′F̄^{γ,∞} and Λ̄^{γ,∞}′Λ̄^{γ,∞} will not, in general, be diagonal.


Conclusion

Iterative least squares: PC.

Iterative ridge: implements SVT.

SVT solves a surrogate of the minimum rank problem.

min rank + parsimony ⇒ IC̄, a data-dependent penalty.

FRED-MD and finance data: r̃ = 8, r̄ = 3.

Factor estimation under linear restrictions.

Missing values problem: in progress.


Incoherence Conditions

U ∈ R^{T×r}, V ∈ R^{N×r}.

Standard incoherence (singular vectors not too skewed):
max_{i=1,…,T} ‖U′e_i‖² ≤ µ₀r/T and max_{j=1,…,N} ‖V′e_j‖² ≤ µ₀r/N.

Joint incoherence (singular vectors not too correlated):
max_{i,j} |(UV′)_ij| ≤ √(µ₁r/(NT)).

The singular vectors are reasonably spread out for small µ.

∑_i (UV′)²_ij = ‖V′e_j‖²₂ and ∑_j (UV′)²_ij = ‖U′e_i‖²₂.

µ₁ dominates.


Example where the incoherence condition fails:

Z = [ 1 0 0 ; 0 0 0 ; 0 0 0 ] = (1, 0, 0)′·[1]·(1 0 0).

Z is too sparse, and its singular vectors are too sparse.

Completion requires many entries of Z to be observed.


Rank is easy to compute; spark needs a combinatorial search.

spark(A) ≤ rank(A) + 1.

Donoho and Elad (2003): spark(A) ≥ 1 + µ⁻¹(A).

Stable ℓ₁ recovery: min ‖x‖₁ s.t. ‖Ax − b‖₂ ≤ ε.

Coherence-based guarantee: if A has normalized columns and Ax = b has a solution satisfying ‖x‖₀ < (1 + µ⁻¹(A))/2, then x is the unique sparsest solution.


Restricted Isometry Property (RIP): an m × n matrix A satisfies the RIP of order k if

(1 − δ_k)‖z‖²₂ ≤ ‖Az‖²₂ ≤ (1 + δ_k)‖z‖²₂ for all z with ‖z‖₀ ≤ k.

RIP ensures that A acts as a near-isometry on sparse vectors (the matrix is properly scaled).

Statistical RIP: P(|‖Ax‖²₂ − ‖x‖²₂| ≤ δ‖x‖²₂) ≥ 1 − ε with respect to a uniform distribution of x among all k-sparse vectors in Rⁿ.