
ELE 520: Mathematics of Data Science

Robust Principal Component Analysis

Yuxin Chen

Princeton University, Fall 2020


Disentangling sparse and low-rank matrices

Suppose we are given a matrix

$$M = \underbrace{L}_{\text{low-rank}} + \underbrace{S}_{\text{sparse}} \in \mathbb{R}^{n \times n}$$

Question: can we hope to recover both L and S from M?


Principal component analysis (PCA)

• $N$ centered samples $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{n \times N}$

• PCA seeks the $r$ directions that explain the most variance of the data:

$$\underset{L:\,\mathrm{rank}(L) = r}{\text{minimize}} \quad \|X - L\|_F$$

i.e. the best rank-$r$ approximation of $X$
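By the Eckart-Young theorem, the minimizer is obtained by truncating the SVD. A minimal numpy sketch (illustrative, not from the lecture; `best_rank_r` is a hypothetical helper name):

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young),
    obtained by keeping only the top r singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 20))  # rank-3 data
L = best_rank_r(X, 3)
print(np.linalg.norm(X - L))  # essentially zero: X is already rank 3
```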


Sensitivity to corruptions / outliers

What if some samples are corrupted (e.g. due to sensor errors / attacks)?

Classical PCA fails even with a few outliers
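This fragility is easy to see numerically; in the hypothetical sketch below, a single grossly corrupted sample drags the top principal direction completely away from the true subspace:

```python
import numpy as np

# Rank-1 data: every sample lies along the first coordinate axis e1
X = np.zeros((5, 20))
X[0, :] = 1.0

# Corrupt a single sample with one gross outlier along e2
X[:, 0] = 0.0
X[1, 0] = 100.0

# Top principal direction now aligns with the outlier, not with e1
u = np.linalg.svd(X)[0][:, 0]
print(np.abs(u))  # largest entry is in coordinate 2, i.e. along e2
```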


Video surveillance

Separation of background (low-rank) and foreground (sparse)

Candes, Li, Ma, Wright ’11

Graph clustering / community recovery

• n nodes, 2 (or more) clusters

• A friendship graph G: for any pair (i, j),

$$M_{i,j} = \begin{cases} 1, & \text{if } (i,j) \in G \\ 0, & \text{else} \end{cases}$$

• Edge density within clusters > edge density across clusters

• Goal: recover cluster structure


Graph clustering / community recovery

$$M = \underbrace{L}_{\text{low-rank}} + \underbrace{M - L}_{\text{sparse}}$$

• An equivalent goal: recover the ground truth matrix

$$L_{i,j} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are in the same community} \\ 0, & \text{else} \end{cases}$$

• Clustering ⇐⇒ robust PCA
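With $k$ communities, the ground-truth matrix $L$ is rank $k$ (one all-ones block per community), which is what makes the robust-PCA view of clustering work. A quick hypothetical check:

```python
import numpy as np

# Ground-truth matrix for two communities of sizes 4 and 6:
# L[i, j] = 1 iff i and j are in the same community
labels = np.array([0] * 4 + [1] * 6)
L = (labels[:, None] == labels[None, :]).astype(float)

print(np.linalg.matrix_rank(L))  # 2: one rank-1 block per community
```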


Gaussian graphical models

Fact 14.1

Consider a Gaussian vector $x \sim \mathcal{N}(0, \Sigma)$. For any $u$ and $v$,

$$x_u \perp\!\!\!\perp x_v \mid x_{\mathcal{V} \setminus \{u, v\}}$$

iff $\Theta_{u,v} = 0$, where $\Theta = \Sigma^{-1}$ is the inverse covariance matrix

conditional independence ⇐⇒ sparsity
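Fact 14.1 can be sanity-checked numerically: the covariance of a sparse graphical model is typically dense, but inverting it recovers the sparse conditional-independence pattern. A sketch, assuming a chain graph (so non-adjacent variables are conditionally independent):

```python
import numpy as np

n = 5
# Precision matrix of a chain graph x1 - x2 - ... - x5 (tridiagonal, hence sparse)
Theta = 2 * np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
Sigma = np.linalg.inv(Theta)  # covariance: dense, x1 and x3 are marginally dependent

# Inverting the dense covariance recovers the sparse pattern of Theta
Theta_rec = np.linalg.inv(Sigma)
print(np.round(Theta_rec[0, 2], 6))  # 0: x1 and x3 are conditionally independent
```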


Gaussian graphical models

$$\underbrace{\begin{pmatrix} * & * & 0 & 0 & * & 0 & 0 & 0 \\ * & * & 0 & 0 & 0 & * & * & 0 \\ 0 & 0 & * & 0 & * & 0 & 0 & * \\ 0 & 0 & 0 & * & 0 & 0 & * & 0 \\ * & 0 & * & 0 & * & 0 & 0 & * \\ 0 & * & 0 & 0 & 0 & * & 0 & 0 \\ 0 & * & 0 & * & 0 & 0 & * & 0 \\ 0 & 0 & * & 0 & * & 0 & 0 & * \end{pmatrix}}_{\Theta}$$

The inverse covariance matrix Θ is often sparse


Graphical models with latent factors

What if one only observes a subset of variables?

$$x = \begin{bmatrix} x_o \\ x_h \end{bmatrix} \quad \begin{matrix} \text{(observed variables)} \\ \text{(hidden variables)} \end{matrix}$$

e.g. $x_o = [x_1, \cdots, x_6]^\top$, $x_h = [x_7, x_8]^\top$

The covariance and precision matrices can be partitioned as

$$\Sigma = \begin{bmatrix} \overbrace{\Sigma_o}^{\text{observed part}} & \Sigma_{o,h} \\ \Sigma_{o,h}^\top & \Sigma_h \end{bmatrix} = \begin{bmatrix} \Theta_o & \Theta_{o,h} \\ \Theta_{o,h}^\top & \Theta_h \end{bmatrix}^{-1}$$

By the Schur complement of the block inverse,

$$\underbrace{\Sigma_o^{-1}}_{\text{observed}} = \underbrace{\Theta_o}_{\text{sparse}} - \underbrace{\Theta_{o,h} \Theta_h^{-1} \Theta_{h,o}}_{\text{low-rank if \# latent vars is small}}$$

$\Longrightarrow$ sparse + low-rank decomposition
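The Schur-complement identity above is easy to verify numerically; a hypothetical sketch with 6 observed and 2 hidden variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n_o, n_h = 6, 2  # 6 observed, 2 hidden variables

# Random positive-definite precision matrix over all 8 variables
A = rng.standard_normal((8, 8))
Theta = A @ A.T + 8 * np.eye(8)

To = Theta[:n_o, :n_o]           # Theta_o
Toh = Theta[:n_o, n_o:]          # Theta_{o,h}
Th = Theta[n_o:, n_o:]           # Theta_h

Sigma = np.linalg.inv(Theta)
Sigma_o = Sigma[:n_o, :n_o]      # marginal covariance of the observed block

lhs = np.linalg.inv(Sigma_o)
rhs = To - Toh @ np.linalg.inv(Th) @ Toh.T   # sparse term minus low-rank term
print(np.allclose(lhs, rhs))  # True

# The subtracted term has rank at most n_h = 2
print(np.linalg.matrix_rank(Toh @ np.linalg.inv(Th) @ Toh.T))  # 2
```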


When is decomposition possible?

Identifiability issue: a matrix might be simultaneously low-rank and sparse!

$$\underbrace{\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}}_{\text{sparse and low-rank}} \quad \text{vs.} \quad \underbrace{\begin{pmatrix} 1 & 0 & 1 & \cdots & 1 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}}_{\text{sparse but not low-rank}}$$

Nonzero entries of the sparse component need to be spread out
— this lecture: assume the locations of the nonzero entries are random

$$\underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}}_{\text{low-rank and dense}} \quad \text{vs.} \quad \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}}_{\text{low-rank but sparse}}$$

The low-rank component needs to be incoherent


Low-rank component: coherence

Definition 14.2 (Coherence)

The coherence parameter $\mu_1$ of $M = U \Sigma V^\top$ is the smallest quantity s.t.

$$\max_i \|U^\top e_i\|_2^2 \le \frac{\mu_1 r}{n} \quad \text{and} \quad \max_i \|V^\top e_i\|_2^2 \le \frac{\mu_1 r}{n}$$
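Definition 14.2 translates directly into code. A sketch for square matrices (`coherence_mu1` is a hypothetical helper, not from the lecture), contrasting a maximally coherent and a maximally incoherent rank-1 matrix:

```python
import numpy as np

def coherence_mu1(M, r):
    """Coherence mu_1 of a square rank-r matrix M = U S V^T (Definition 14.2):
    the smallest mu_1 with max_i ||U^T e_i||_2^2 <= mu_1 r / n (same for V)."""
    n = M.shape[0]
    U, s, Vt = np.linalg.svd(M)
    U, V = U[:, :r], Vt[:r, :].T
    return (n / r) * max((U**2).sum(axis=1).max(), (V**2).sum(axis=1).max())

n = 8
spiky = np.zeros((n, n)); spiky[0, 0] = 1.0  # rank-1, aligned with e1
flat = np.ones((n, n)) / n                    # rank-1, maximally spread out

print(coherence_mu1(spiky, 1))  # 8.0 = n (maximally coherent)
print(coherence_mu1(flat, 1))   # 1.0 (maximally incoherent)
```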

(Figure: a standard basis vector $e_i$ and its projection $P_U(e_i)$ onto the column space of $U$.)

Low-rank component: joint coherence

Definition 14.3 (Joint coherence)

The joint coherence parameter $\mu_2$ of $M = U \Sigma V^\top$ is the smallest quantity s.t.

$$\|UV^\top\|_\infty \le \sqrt{\frac{\mu_2 r}{n^2}}$$

This prevents $UV^\top$ from being too peaky

• $\mu_1 \le \mu_2 \le \mu_1^2 r$, since

$$|(UV^\top)_{ij}| = |e_i^\top U V^\top e_j| \le \|U^\top e_i\|_2 \cdot \|V^\top e_j\|_2 \le \frac{\mu_1 r}{n}$$

$$\|UV^\top\|_\infty^2 \ge \frac{\|UV^\top e_j\|_2^2}{n} = \frac{\|V^\top e_j\|_2^2}{n} = \frac{\mu_1 r}{n^2} \quad \left(\text{suppose } \|V^\top e_j\|_2^2 = \frac{\mu_1 r}{n}\right)$$
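Both coherence parameters, and the sandwich $\mu_1 \le \mu_2 \le \mu_1^2 r$, can be checked numerically. A hypothetical sketch (`mu1_mu2` is an illustrative helper):

```python
import numpy as np

def mu1_mu2(M, r):
    """Coherence mu_1 (Def. 14.2) and joint coherence mu_2 (Def. 14.3)
    of a square rank-r matrix, read off from its SVD."""
    n = M.shape[0]
    U, s, Vt = np.linalg.svd(M)
    U, V = U[:, :r], Vt[:r, :].T
    mu1 = (n / r) * max((U**2).sum(axis=1).max(), (V**2).sum(axis=1).max())
    mu2 = (n**2 / r) * np.abs(U @ V.T).max() ** 2  # ||UV^T||_inf = sqrt(mu2 r / n^2)
    return mu1, mu2

rng = np.random.default_rng(1)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # random rank-3

mu1, mu2 = mu1_mu2(M, r)
print(mu1 <= mu2 + 1e-9 <= mu1**2 * r + 1e-9)  # True: mu1 <= mu2 <= mu1^2 r
```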


Convex relaxation

$$\underset{L,S}{\text{minimize}} \quad \mathrm{rank}(L) + \lambda \|S\|_0 \quad \text{s.t.} \quad M = L + S \tag{14.1}$$

relaxes to

$$\underset{L,S}{\text{minimize}} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad M = L + S \tag{14.2}$$

• $\|\cdot\|_*$: nuclear norm; $\|\cdot\|_1$: entry-wise $\ell_1$ norm
• $\lambda > 0$: regularization parameter that balances the two terms


Theoretical guarantee

Theorem 14.4 (Candes, Li, Ma, Wright ’11)

Suppose that
• $\mathrm{rank}(L) \lesssim \frac{n}{\max\{\mu_1, \mu_2\} \log^2 n}$;
• the nonzero entries of $S$ are randomly located, and $\|S\|_0 \le \rho_s n^2$ for some constant $\rho_s > 0$ (e.g. $\rho_s = 0.2$).

Then (14.2) with $\lambda = 1/\sqrt{n}$ is exact with high prob.

• $\mathrm{rank}(L)$ can be quite high (up to $n/\mathrm{polylog}(n)$)

• Parameter free: $\lambda = 1/\sqrt{n}$

• Ability to correct gross errors: $\|S\|_0 \asymp n^2$

• The sparse component $S$ can have arbitrary magnitudes / signs!


Geometry

Fig. credit: Candes ’14


Empirical success rate

(Figure: empirical success probability of (14.2) as a function of $\mathrm{rank}(L)/n$ and the sparsity fraction $\rho_s$, for $n = 400$.)

Fig. credit: Candes, Li, Ma, Wright ’11


Dense error correction

Theorem 14.5 (Ganesh, Wright, Li, Candes, Ma ’10; Chen, Jalali, Sanghavi, Caramanis ’13)

Suppose that
• $\mathrm{rank}(L) \lesssim \frac{n}{\max\{\mu_1, \mu_2\} \log^2 n}$;
• the nonzero entries of $S$ are randomly located, have random signs, and $\|S\|_0 = \rho_s n^2$.

Then (14.2) with $\lambda \asymp \sqrt{\frac{1-\rho_s}{\rho_s n}}$ succeeds with high prob., provided that

$$\underbrace{1 - \rho_s}_{\text{non-corruption rate}} \gtrsim \sqrt{\frac{\max\{\mu_1, \mu_2\}\, r\, \mathrm{polylog}(n)}{n}}$$

• When the additive corruptions have random signs, (14.2) works even when a dominant fraction of the entries are corrupted


Is joint coherence needed?

• Matrix completion: does not need µ2

• Robust PCA: so far we need µ2

Question: is $\mu_2$ needed? Can we recover $L$ with rank up to $\frac{n}{\mu_1\, \mathrm{polylog}(n)}$ (rather than $\frac{n}{\max\{\mu_1, \mu_2\}\, \mathrm{polylog}(n)}$)?

Answer: no (example: planted clique)


Planted clique problem

Setup: a graph G of n nodes generated as follows

1. connect each pair of nodes independently with prob. 0.5

2. pick n0 nodes and make them a clique (fully connected)

Goal: find the hidden clique from G

Information theoretically, one can recover the clique if $n_0 > 2 \log_2 n$

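The generative model above is a few lines of code; a hypothetical sketch (`planted_clique` is an illustrative helper):

```python
import numpy as np

def planted_clique(n, n0, rng):
    """Adjacency matrix: G(n, 1/2) with a planted clique on n0 random nodes."""
    A = (rng.random((n, n)) < 0.5).astype(int)
    A = np.triu(A, 1); A = A + A.T      # symmetric, zero diagonal
    clique = rng.choice(n, size=n0, replace=False)
    A[np.ix_(clique, clique)] = 1        # step 2: fully connect the clique
    A[clique, clique] = 0                # keep the diagonal zero
    return A, clique

rng = np.random.default_rng(0)
A, clique = planted_clique(200, 20, rng)

# every pair inside the clique is connected
sub = A[np.ix_(clique, clique)]
print((sub + np.eye(20) == 1).all())  # True
```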

Conjecture on computational barrier

Conjecture: $\forall$ constant $\varepsilon > 0$, if $n_0 \le n^{0.5 - \varepsilon}$, then no tractable algorithm can find the clique from $G$ with prob. $1 - o(1)$

— often used as a hardness assumption

Lemma 14.6

If there is an algorithm that allows recovery of any $L$ from $M$ with $\mathrm{rank}(L) \le \frac{n}{\mu_1\, \mathrm{polylog}(n)}$, then the above conjecture is violated


Proof of Lemma 14.6

Suppose $L$ is the true adjacency matrix of the clique,

$$L_{i,j} = \begin{cases} 1, & \text{if } i, j \text{ are both in the clique} \\ 0, & \text{else} \end{cases}$$

Let $A$ be the adjacency matrix of $G$, and generate $M$ s.t.

$$M_{i,j} = \begin{cases} A_{i,j}, & \text{with prob. } 2/3 \\ 0, & \text{else} \end{cases}$$

Therefore, one can write

$$M = L + \underbrace{M - L}_{\text{each entry is nonzero w.p. } 1/3}$$


Proof of Lemma 14.6 (continued)

Note that
$$\mu_1 = \frac{n}{n_0} \quad \text{and} \quad \mu_2 = \frac{n^2}{n_0^2}$$

If there is an algorithm that can recover any $L$ of rank $\frac{n}{\mu_1\, \mathrm{polylog}(n)}$ from $M$, then

$$\mathrm{rank}(L) = 1 \le \frac{n}{\mu_1\, \mathrm{polylog}(n)} \iff n_0 \ge \mathrm{polylog}(n)$$

But this contradicts the conjecture (which claims computational infeasibility to recover $L$ unless $n_0 \ge n^{0.5 - o(1)}$)
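The values $\mu_1 = n/n_0$ and $\mu_2 = n^2/n_0^2$ for $L = 1_C 1_C^\top$ can be verified numerically; a self-contained sketch with illustrative sizes:

```python
import numpy as np

n, n0 = 100, 10
L = np.zeros((n, n))
L[:n0, :n0] = 1.0         # clique on the first n0 nodes: L = 1_C 1_C^T, rank 1

U, s, Vt = np.linalg.svd(L)
u, v = U[:, 0], Vt[0, :]   # top singular vectors: +-1_C / sqrt(n0)

mu1 = n * max((u**2).max(), (v**2).max())          # Definition 14.2 with r = 1
mu2 = n**2 * (np.abs(np.outer(u, v)).max() ** 2)   # Definition 14.3 with r = 1

print(mu1, mu2)  # 10.0 (= n/n0) and 100.0 (= n^2/n0^2)
```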


Matrix completion with corruptions

What if we have missing data + corruptions?

• Observed entries
$$M_{ij} = L_{ij} + S_{ij}, \quad (i,j) \in \Omega$$
for some observation set $\Omega$, where $S = (S_{ij})$ is sparse

• A natural extension of RPCA:
$$\underset{L,S}{\text{minimize}} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad \mathcal{P}_\Omega(M) = \mathcal{P}_\Omega(L + S)$$

• Theorems 14.4 - 14.5 easily extend to this setting


Efficient algorithm

In the presence of noise, one needs to solve

$$\underset{L,S}{\text{minimize}} \quad \|L\|_* + \lambda \|S\|_1 + \frac{\mu}{2} \|M - L - S\|_F^2$$

which can be solved efficiently by alternating minimization

Algorithm 14.1 Iterative soft-thresholding
for $t = 0, 1, \cdots$:
  $L_{t+1} = \mathcal{T}_{1/\mu}(M - S_t)$
  $S_{t+1} = \psi_{\lambda/\mu}(M - L_{t+1})$

where $\mathcal{T}$: singular-value thresholding operator; $\psi$: entry-wise soft-thresholding operator
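Algorithm 14.1 is a few lines of numpy. The sketch below alternates singular-value thresholding for $L$ with entry-wise soft thresholding for $S$; the synthetic problem sizes, seed, and choice $\mu = 10$ are illustrative assumptions, not from the lecture:

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding T_tau: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U * np.maximum(s - tau, 0) @ Vt

def soft(X, tau):
    """Entry-wise soft thresholding psi_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def rpca_ist(M, lam, mu, iters=500):
    """Iterative soft-thresholding (Algorithm 14.1) for
    min_{L,S} ||L||_* + lam ||S||_1 + (mu/2) ||M - L - S||_F^2."""
    S = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S, 1.0 / mu)       # L-update: SVT with threshold 1/mu
        S = soft(M - L, lam / mu)      # S-update: soft threshold lam/mu
    return L, S

# Synthetic test: rank-2 component plus sparse gross corruptions
rng = np.random.default_rng(0)
n = 30
L0 = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
S0 = np.zeros((n, n))
idx = rng.choice(n * n, size=40, replace=False)
S0.flat[idx] = 10 * rng.choice([-1, 1], size=40)

L_hat, S_hat = rpca_ist(L0 + S0, lam=1 / np.sqrt(n), mu=10.0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))  # small relative error
```

Each update is an exact minimization of the (jointly convex) objective over one block, so the iterates decrease the objective monotonically.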


Reference

• “Lecture notes, Advanced topics in signal processing (ECE 8201),” Y. Chi, 2015.

• “Robust principal component analysis?,” E. Candes, X. Li, Y. Ma, and J. Wright, Journal of the ACM, 2011.

• “Rank-sparsity incoherence for matrix decomposition,” V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky, SIAM Journal on Optimization, 2011.

• “Latent variable graphical model selection via convex optimization,” V. Chandrasekaran, P. Parrilo, and A. Willsky, Annals of Statistics, 2012.

• “Incoherence-optimal matrix completion,” Y. Chen, IEEE Transactions on Information Theory, 2015.

• “Dense error correction for low-rank matrices via principal component pursuit,” A. Ganesh, J. Wright, X. Li, E. Candes, Y. Ma, ISIT, 2010.


• “Low-rank matrix recovery from errors and erasures,” Y. Chen, A. Jalali, S. Sanghavi, C. Caramanis, IEEE Transactions on Information Theory, 2013.
