
Robust PCA

CS5240 Theoretical Foundations in Multimedia

Leow Wee Kheng

Department of Computer Science

School of Computing

National University of Singapore


Previously...

not robust against outliers     robust against outliers
---------------------------     -----------------------
linear least squares            trimmed least squares
PCA                             trimmed PCA

Various ways to robustify PCA:

◮ trimming: remove outliers

◮ covariance matrix with 0-1 weight [Xu95]: similar to trimming

◮ weighted SVD [Gabriel79]: weighting

◮ robust error function [Torre2001]: winsorizing

Strength: simple concepts. Weakness: no guarantee of optimality.

Robust PCA

PCA can be formulated as follows:

Given an m×n data matrix D, recover a low-rank matrix A from D such that the error E = D − A is minimized:

min_{A,E}  ‖E‖_F,  subject to  rank(A) ≤ r,  D = A + E. (1)

◮ r ≪ min(m,n) is the target rank of A.

◮ ‖·‖_F is the Frobenius norm.

Notes:

◮ This definition of PCA includes dimensionality reduction.

◮ PCA is severely affected by large-amplitude noise; not robust.


[Wright2009] formulated the Robust PCA problem as follows:

Given a data matrix D = A+E where A and E are unknown

but A is low-rank and E is sparse, recover A.

An obvious way to state the robust PCA problem in math is:

Given a data matrix D, find A and E that solve the problem

min_{A,E}  rank(A) + λ‖E‖_0,  subject to  A + E = D. (2)

◮ λ is a Lagrange multiplier.

◮ ‖E‖_0: l0-norm, the number of non-zero elements in E. E is sparse if ‖E‖_0 is small.

[Figure: the data matrix D decomposed as the sum of a low-rank matrix A and a sparse error matrix E.]

◮ Problem 2 is a matrix recovery problem.

◮ rank(A) and ‖E‖_0 are not continuous and not convex; the problem is very hard to solve, and no efficient algorithm exists.


[Candes2011, Wright2009] reformulated Problem 2 as follows:

Given an m×n data matrix D, find A and E that solve

min_{A,E}  ‖A‖_* + λ‖E‖_1,  subject to  A + E = D. (3)

◮ ‖A‖_*: nuclear norm, sum of the singular values of A; surrogate for rank(A).

◮ ‖E‖_1: l1-norm, sum of the absolute values of the elements of E; surrogate for ‖E‖_0.

The solution of Problem 3 recovers A and E exactly if

◮ A is sufficiently low-rank but not sparse, and

◮ E is sufficiently sparse but not low-rank,

with optimal λ = 1/√max(m,n).

◮ ‖A‖∗ and ‖E‖1 are convex; can apply convex optimization.

Introduction to Convex Optimization

For a differentiable function f(x), its minimizer x is given by

df(x)/dx = [ ∂f(x)/∂x_1 · · · ∂f(x)/∂x_m ]⊤ = 0. (4)

[Figure: graph of f(x) with its minimum at the point x where df/dx = 0.]


‖E‖_1 is not differentiable when any of its elements is zero!

[Figure: surface plot of f(x) = ‖x‖_1 over x_1, x_2 ∈ [−4, 4]; the surface has non-differentiable creases wherever a coordinate is zero.]

Cannot write the following because they don't exist:

d‖E‖_1/dE,   ∂‖E‖_1/∂e_ij,   d|e_ij|/de_ij.   WRONG! (5)

Fortunately, ‖E‖1 is convex.


A function f(x) is convex if and only if ∀x1,x2, ∀α ∈ [0, 1],

f(αx1 + (1− α)x2) ≤ αf(x1) + (1− α)f(x2). (6)

[Figure: a convex function f(x); the chord joining the points at x_1 and x_2 lies above the graph between them.]


A vector g(x) is a subgradient of convex function f at x if

f(y)− f(x) ≥ g(x)⊤(y − x), ∀y. (7)

[Figure: convex f(x) with a unique supporting line l1 at a differentiable point x_1, and multiple supporting lines l2, l3 at a non-differentiable point x_2.]

◮ At differentiable point x1: one unique subgradient = gradient.

◮ At non-differentiable point x2: multiple subgradients.


The subdifferential ∂f(x) is the set of all subgradients of f at x:

∂f(x) = { g(x) | g(x)⊤(y − x) ≤ f(y) − f(x), ∀y }. (8)

Caution:

◮ The subdifferential ∂f(x) is not partial differentiation ∂f(x)/∂x. Don't mix them up.

◮ Partial differentiation is defined for differentiable functions. It is a single vector.

◮ The subdifferential is defined for convex functions, which may be non-differentiable. It is a set of vectors.


Example: Absolute value.

|x| is not differentiable at x = 0 but subdifferentiable at x = 0:

|x| =  x if x > 0;   0 if x = 0;   −x if x < 0.

∂|x| =  {+1} if x > 0;   [−1,+1] if x = 0;   {−1} if x < 0. (9)

[Figure: left, the graph of the absolute value |x| with slopes −1 and +1; right, its subdifferential, equal to {−1} for x < 0, the full interval [−1, +1] at x = 0, and {+1} for x > 0.]


Example: l1-norm of an m-D vector x.

‖x‖_1 = Σ_{i=1}^m |x_i|,    ∂‖x‖_1 = Σ_{i=1}^m ∂|x_i|. (10)

Caution: the right-hand side of Eq. 10 is set addition. This gives a product of m sets, one for each element x_i:

∂‖x‖_1 = J_1 × · · · × J_m,   J_i =  {+1} if x_i > 0;   [−1,+1] if x_i = 0;   {−1} if x_i < 0. (11)

Alternatively, we can write ∂‖x‖1 as

∂‖x‖_1 = { g }  such that  g_i = +1 if x_i > 0;   g_i ∈ [−1,+1] if x_i = 0;   g_i = −1 if x_i < 0. (12)


Another alternative:

∂‖x‖_1 = { g }  such that  g_i = sgn(x_i) if |x_i| > 0;   |g_i| ≤ 1 if x_i = 0. (13)

Here are some examples of subdifferentials of the 2-D l1-norm:

∂f(0,0) = [−1,1] × [−1,1],   ∂f(1,0) = {1} × [−1,1],   ∂f(1,1) = {(1,1)}.
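To make Eq. 13 concrete, here is a minimal NumPy sketch (our own illustration; the function name is hypothetical) that returns one valid subgradient of ‖x‖_1, picking g_i = sgn(x_i) at nonzero entries and g_i = 0 ∈ [−1, +1] at zeros:

```python
import numpy as np

def l1_subgradient(x):
    """Return one valid subgradient of ||x||_1 at x (Eq. 13).

    np.sign gives +1 or -1 at nonzero entries and 0 at zeros;
    since 0 lies in [-1, +1], the result is always a member of
    the subdifferential of the l1-norm."""
    return np.sign(np.asarray(x, dtype=float))

print(l1_subgradient([1.0, 0.0]))  # [1. 0.], an element of {1} x [-1, 1]
```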


Optimality Condition

x is a minimum of f if 0 ∈ ∂f(x), i.e., 0 is a subgradient of f .

[Figure: a convex f(x) whose minimum x admits the subgradient g(x) = 0.]

For more details on convex functions and convex optimization, refer to[Bertsekas2003, Boyd2004, Rockafellar70, Shor85].

Back to Robust PCA

Shrinkage or soft threshold operator:

T_ε(x) =  x − ε if x > ε;   x + ε if x < −ε;   0 otherwise. (14)

[Figure: graph of T_ε(x); zero on [−ε, ε], and shifted toward zero by ε outside that interval.]
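A minimal NumPy sketch of Eq. 14 (our illustration; the name shrink is an assumption), using the identity T_ε(x) = sgn(x)·max(|x| − ε, 0):

```python
import numpy as np

def shrink(x, eps):
    """Soft-thresholding operator T_eps of Eq. 14, applied elementwise.

    For x > eps this returns x - eps, for x < -eps it returns x + eps,
    and 0 otherwise, which equals sign(x) * max(|x| - eps, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)
```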


Using convex optimization, the following minimizers are shown [Cai2010, Hale2007]:

For an m×n matrix M with SVD M = USV⊤,

U T_ε(S) V⊤ = argmin_X  ε‖X‖_* + (1/2)‖X − M‖²_F, (15)

T_ε(M) = argmin_X  ε‖X‖_1 + (1/2)‖X − M‖²_F. (16)
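A sketch of the Eq. 15 minimizer (our illustration; the name svt is an assumption). Since singular values are non-negative, T_ε reduces to max(s − ε, 0) on them:

```python
import numpy as np

def svt(M, eps):
    """Singular value thresholding: the minimizer U T_eps(S) V^T of Eq. 15."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Broadcasting U * s' multiplies column i of U by s'_i, i.e. U diag(s') V^T.
    return (U * np.maximum(s - eps, 0.0)) @ Vt
```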

Solving Robust PCA

There are several ways to solve robust PCA (Problem 3) [Lin2009, Wright2009]:

◮ principal component pursuit

◮ iterative thresholding

◮ accelerated proximal gradient

◮ augmented Lagrange multipliers

Augmented Lagrange Multipliers

Consider the problem

min_x  f(x)  subject to  c_j(x) = 0,  j = 1, . . . , m. (17)

This is a constrained optimization problem.

Lagrange multipliers method reformulates Problem 17 as:

min_x  f(x) + Σ_{j=1}^m λ_j c_j(x) (18)

with some constants λ_j.


Penalty method reformulates Problem 17 as:

min_x  f(x) + µ Σ_{j=1}^m c_j²(x). (19)

◮ Parameter µ increases over iterations, e.g., by a factor of 10.

◮ Need µ → ∞ to get a good solution.

Augmented Lagrange multipliers method combines Lagrange multipliers and penalty:

min_x  f(x) + Σ_{j=1}^m λ_j c_j(x) + (µ/2) Σ_{j=1}^m c_j²(x). (20)

Denote λ = [λ_1 · · · λ_m]⊤ and c(x) = [c_1(x) · · · c_m(x)]⊤. Then Problem 20 becomes

min_x  f(x) + λ⊤c(x) + (µ/2)‖c(x)‖². (21)


If the constraints form a matrix C = [cjk], then define Λ = [λjk].

Then Problem 20 becomes

min_x  f(x) + ⟨Λ, C(x)⟩ + (µ/2)‖C(x)‖²_F. (22)

◮ ⟨Λ, C⟩ is the sum of products of corresponding elements:

⟨Λ, C⟩ = Σ_j Σ_k λ_jk c_jk.

◮ ‖C‖_F is the Frobenius norm:

‖C‖²_F = Σ_j Σ_k c_jk².


Augmented Lagrange Multipliers (ALM) Method

1. Initialize Λ, µ > 0, ρ ≥ 1.
2. Repeat until convergence:
   2.1 Compute x = argmin_x L(x), where
       L(x) = f(x) + ⟨Λ, C(x)⟩ + (µ/2)‖C(x)‖²_F.
   2.2 Update Λ = Λ + µ C(x).
   2.3 Update µ = ρµ.

What kind of optimization algorithm is this?

ALM does not need µ → ∞ to get a good solution; that means it can converge faster.
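A toy sketch of steps 2.1–2.3 on an assumed example problem (ours, not from the slides): minimize f(x) = (x − 2)² subject to c(x) = x − 1 = 0, whose constrained optimum is x = 1 with multiplier λ = 2. Here step 2.1 is solved in closed form:

```python
def alm_toy(lmbda=0.0, mu=1.0, rho=2.0, iters=20):
    """ALM on: minimize (x - 2)^2 subject to x - 1 = 0 (hypothetical example)."""
    x = 0.0
    for _ in range(iters):
        # Step 2.1: minimize L(x) = (x-2)^2 + lmbda*(x-1) + (mu/2)*(x-1)^2.
        # Closed form from dL/dx = 2(x-2) + lmbda + mu*(x-1) = 0:
        x = (4.0 - lmbda + mu) / (2.0 + mu)
        # Step 2.2: update the multiplier with the constraint violation.
        lmbda += mu * (x - 1.0)
        # Step 2.3: increase the penalty parameter.
        mu *= rho
    return x, lmbda

print(alm_toy())  # approaches (1.0, 2.0) without needing mu -> infinity
```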


With ALM, robust PCA (Problem 3) is reformulated as

min_{A,E}  ‖A‖_* + λ‖E‖_1 + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖²_F, (23)

◮ Y are the Lagrange multipliers.

◮ The constraints are C = D − A − E.

To implement Step 2.1 of ALM, we need to find the minima over A and E. Adopt alternating optimization.

Trace of Matrix

The trace of an n×n matrix A is the sum of its diagonal elements a_ii:

tr(A) = Σ_{i=1}^n a_ii. (24)

Strictly speaking, a scalar c and a 1×1 matrix [c] are not the same thing. Nevertheless, since tr([c]) = c, we often write, for simplicity, tr(c) = c.

Properties

◮ tr(A) = Σ_{i=1}^n λ_i, where λ_i are the eigenvalues of A.

◮ tr(A+B) = tr(A) + tr(B)

◮ tr(cA) = c tr(A)

◮ tr(A) = tr(A⊤)


◮ tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC)

◮ tr(X⊤Y) = tr(XY⊤) = tr(Y⊤X) = tr(YX⊤) = Σ_i Σ_j x_ij y_ij
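A quick numerical check of the last identity (our illustration):

```python
import numpy as np

# Verify tr(X^T Y) = sum_ij x_ij * y_ij on random 3x4 matrices.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
assert np.isclose(np.trace(X.T @ Y), np.sum(X * Y))
```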


From Problem 23, the minimal E with other variables fixed is given by

min_E  λ‖E‖_1 + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖²_F

⇒ min_E  λ‖E‖_1 + tr(Y⊤(D − A − E)) + (µ/2)‖D − A − E‖²_F

⇒ min_E  λ‖E‖_1 + tr(−Y⊤E) + (µ/2) tr((D − A − E)⊤(D − A − E))

... (Homework)

⇒ min_E  λ‖E‖_1 + (µ/2) tr((E − (D − A + Y/µ))⊤(E − (D − A + Y/µ)))

⇒ min_E  λ‖E‖_1 + (µ/2)‖E − (D − A + Y/µ)‖²_F

⇒ min_E  (λ/µ)‖E‖_1 + (1/2)‖E − (D − A + Y/µ)‖²_F. (25)


Compare Eq. 16,

T_ε(M) = argmin_X  ε‖X‖_1 + (1/2)‖X − M‖²_F,

and Problem 25,

min_E  (λ/µ)‖E‖_1 + (1/2)‖E − (D − A + Y/µ)‖²_F.

Set X = E and M = D − A + Y/µ. Then

E = T_{λ/µ}(D − A + Y/µ). (26)


From Problem 23, the minimal A with other variables fixed is given by

min_A  ‖A‖_* + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖²_F

⇒ min_A  ‖A‖_* + tr(Y⊤(D − A − E)) + (µ/2)‖D − A − E‖²_F

... (Homework)

⇒ min_A  (1/µ)‖A‖_* + (1/2)‖A − (D − E + Y/µ)‖²_F. (27)

Compare Problem 27 and Eq. 15,

U T_ε(S) V⊤ = argmin_X  ε‖X‖_* + (1/2)‖X − M‖²_F.

Set X = A and M = D − E + Y/µ. Then

A = U T_{1/µ}(S) V⊤. (28)


Robust PCA for Matrix Recovery via ALM

Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       A = U T_{1/µ}(S) V⊤.
7.       E = T_{λ/µ}(D − A + Y/µ).
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.

(A code sketch of this algorithm follows the initialization notes below.)


Typical initialization [Lin2009]:

◮ Y = sgn(D)/J(sgn(D)).

◮ sgn(D) gives sign of each matrix element of D.

◮ J(·) gives scaling factors:

J(X) = max(‖X‖2, λ⁻¹‖X‖∞).

◮ ‖X‖2 is spectral norm, largest singular value of X.

◮ ‖X‖∞ is largest absolute value of elements of X.

◮ µ = 1.25/‖D‖2.

◮ ρ = 1.5.

◮ λ = 1/√max(m,n) for m×n matrix D.
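Putting the algorithm and the typical initialization together, here is a minimal NumPy sketch (assumptions of ours: the inexact variant with one pass of steps 5–7 per outer iteration, and a stopping rule on ‖D − A − E‖_F; the function name is hypothetical):

```python
import numpy as np

def robust_pca(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Sketch of RPCA matrix recovery via ALM with [Lin2009]-style defaults."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    # Y = sgn(D) / J(sgn(D)), with J(X) = max(||X||_2, ||X||_inf / lam).
    S = np.sign(D)
    Y = S / max(np.linalg.norm(S, 2), np.abs(S).max() / lam)
    mu = 1.25 / np.linalg.norm(D, 2)   # initial penalty parameter
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # Step 6: A-update by singular value thresholding (Eq. 28).
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Step 7: E-update by elementwise shrinkage (Eq. 26).
        G = D - A + Y / mu
        E = np.sign(G) * np.maximum(np.abs(G) - lam / mu, 0.0)
        # Steps 8-9: multiplier and penalty updates.
        R = D - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```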


Example: Recovery of video background.

[Figure: video frames stacked as columns to form the data matrix D.]


Sample results:

[Figure: data matrix, low-rank component, and error component for two sample frames.]


Example: Removal of specular reflection and shadow.

[Figure: data D, low-rank A, and error E for specular reflection and shadow removal.]

Fixed-Rank Robust PCA

In reflection removal, reflection may be global.

[Figure: ground truth and input with a global reflection.]

Then E is not sparse: this violates the RPCA condition! But rank(A) = 1.

Fix the rank of A to deal with non-sparse E [Leow2013].

Fixed-Rank Robust PCA via ALM

Inputs: D.

1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       If rank(T_{1/µ}(S)) < r, then A = U T_{1/µ}(S) V⊤; else A = U S_r V⊤.
7.       E = T_{λ/µ}(D − A + Y/µ).
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.

S_r is S with the last m − r singular values set to 0.
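A sketch of the modified step 6, under the same assumptions as the matrix-recovery sketch above (the function name is ours):

```python
import numpy as np

def fixed_rank_A_update(D, E, Y, mu, r):
    """Fixed-rank A-update: threshold as usual if that already gives
    rank < r; otherwise keep the r largest singular values of S (S_r)."""
    U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
    s_thr = np.maximum(s - 1.0 / mu, 0.0)            # T_{1/mu}(S)
    if np.count_nonzero(s_thr) < r:
        return (U * s_thr) @ Vt                      # A = U T_{1/mu}(S) V^T
    s_r = np.where(np.arange(len(s)) < r, s, 0.0)    # S_r: top r values kept
    return (U * s_r) @ Vt                            # A = U S_r V^T
```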


Example: Removal of local reflections.

[Figure: ground truth, input, FRPCA result, and RPCA result.]


Example: Removal of global reflections.

[Figure: ground truth, input, FRPCA result, and RPCA result.]


Example: Removal of global reflections.

[Figure: ground truth, input, FRPCA result, and RPCA result.]


Example: Background recovery for traffic video: fast moving vehicles.

[Figure: input, FRPCA result, and RPCA result.]


Example: Background recovery for traffic video: slow moving vehicles.

[Figure: input, FRPCA result, and RPCA result.]


Example: Background recovery for traffic video: temporary stop.

[Figure: input, FRPCA result, and RPCA result.]

Matrix Completion

Customers are asked to rate the movies from 1 (poor) to 5 (excellent).

A  5 5 3 2
B  3 4 3
C  3 4 5 4
D  4 5 5 3
...

Customers rate only some movies ⇒ some data are missing. How to estimate the missing data? Matrix completion.


Let D denote the data matrix with missing elements set to 0, and M = {(i, j)} denote the indices of the missing elements in D.

Then, the matrix completion problem can be formulated as

Given D and M , find matrix A that solves the problem

min_A  ‖A‖_*  subject to  A + E = D,  E_ij = 0 ∀ (i, j) ∉ M. (29)

◮ For (i, j) ∉ M, constrain E_ij = 0 so that A_ij = D_ij; no change.

◮ For (i, j) ∈ M, D_ij = 0, i.e., A_ij = −E_ij; recovered value.

Reformulating Problem 29 using augmented Lagrange multipliers gives

min_A  ‖A‖_* + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖²_F (30)

such that E_ij = 0 ∀ (i, j) ∉ M.

Robust PCA for Matrix Completion

Inputs: D.

1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       A = U T_{1/µ}(S) V⊤.
7.       E = Γ_M(D − A + Y/µ), where Γ_M(X)_ij = X_ij for (i, j) ∈ M, and 0 for (i, j) ∉ M.
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.
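A sketch of step 7 (our illustration; `missing` is an assumed boolean mask that is True on M):

```python
import numpy as np

def completion_E_update(D, A, Y, mu, missing):
    """Gamma_M keeps entries of D - A + Y/mu only at missing positions,
    so E_ij = 0 is enforced wherever D_ij is observed."""
    G = D - A + Y / mu
    return np.where(missing, G, 0.0)
```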


Example: Recovery of occluded parts in face images.

[Figure: input, ground truth, matrix completion result, and matrix recovery result.]

Summary

Robustification of PCA

◮ PCA + trimming → trimmed PCA

◮ PCA + covariance with 0−1 weights → trimmed PCA

◮ PCA + weighted SVD → weighted PCA

◮ PCA + robust error function → winsorized PCA

Robust PCA

◮ PCA → low-rank + sparse matrix decomposition → robust PCA (matrix recovery)

◮ robust PCA + fixed rank of A → fixed-rank robust PCA

◮ robust PCA + fixed elements of E → robust PCA (matrix completion)

Probing Questions

◮ If the data matrix of a problem is composed of a low-rank matrix, a sparse matrix and something else, can you still use robust PCA methods? If yes, how? If no, why?

◮ In applications of robust PCA to high-resolution colour image processing, the data matrix contains three times as many rows as the number of pixels in the images, which can lead to a very large data matrix that takes a long time to process. Suggest a way to overcome this problem.

◮ In applications of robust PCA to video processing, the data matrix contains as many columns as the number of video frames, which can lead to a very large data matrix that exceeds the available memory. Suggest a way to overcome this problem.

Homework

1. Show that tr(X⊤Y) = Σ_i Σ_j x_ij y_ij.

2. Show that the following two optimization problems are equivalent:

min_E  λ‖E‖_1 + tr(−Y⊤E) + (µ/2) tr((D − A − E)⊤(D − A − E))

min_E  λ‖E‖_1 + (µ/2) tr((E − (D − A + Y/µ))⊤(E − (D − A + Y/µ)))

3. Show that the minimal A of Problem 23 with other variables fixed is given by

min_A  (1/µ)‖A‖_* + (1/2)‖A − (D − E + Y/µ)‖²_F.

References

1. D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.

2. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

3. J. F. Cai, E. J. Candes, Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.

4. E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.

5. F. De la Torre and M. J. Black. Robust principal component analysis for computer vision. In Proc. Int. Conf. Computer Vision, 2001.

6. R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.


7. K. Gabriel and S. Zamir. Lower rank approximation of matrices by least squares with any choice of weights. Technometrics, 21:489–498, 1979.

8. E. T. Hale, W. Yin, Y. Zhang. A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing. Tech. Report TR07-07, Dept. of Comp. and Applied Math., Rice University, 2007.

9. W. K. Leow, Y. Cheng, L. Zhang, T. Sim, and L. Foo. Background recovery by fixed-rank robust principal component analysis. In Proc. Int. Conf. on Computer Analysis of Images and Patterns, 2013.

10. Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Tech. Report UILU-ENG-09-2215, UIUC, 2009.

11. N. Z. Shor. Minimization Methods for Non-Differentiable Functions. Springer-Verlag, 1985.


12. J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In Proc. Conf. on Neural Information Processing Systems, 2009.

13. L. Xu and A. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans. Neural Networks, 6(1):131–143, 1995.
