Robust PCA
CS5240 Theoretical Foundations in Multimedia
Leow Wee Kheng
Department of Computer Science
School of Computing
National University of Singapore
Leow Wee Kheng (NUS) Robust PCA 1 / 52
Previously...
not robust against outliers      robust against outliers
linear least squares             trimmed least squares
PCA                              trimmed PCA
Various ways to robustify PCA:
◮ trimming: remove outliers
◮ covariance matrix with 0-1 weight [Xu95]: similar to trimming
◮ weighted SVD [Gabriel79]: weighting
◮ robust error function [Torre2001]: winsorizing
Strength: simple concepts. Weakness: no guarantee of optimality.
Robust PCA
PCA can be formulated as follows:
Given a data matrix D, recover a low-rank matrix A from D such that the error E = D − A is minimized:

min_{A,E} ‖E‖F, subject to rank(A) ≤ r, D = A + E.   (1)
◮ r ≪ min(m,n) is the target rank of A.
◮ ‖ · ‖F is the Frobenius norm.
Notes:
◮ This definition of PCA includes dimensionality reduction.
◮ PCA is severely affected by large-amplitude noise; not robust.
[Wright2009] formulated the Robust PCA problem as follows:
Given a data matrix D = A+E where A and E are unknown
but A is low-rank and E is sparse, recover A.
An obvious way to state the robust PCA problem in math is:
Given a data matrix D, find A and E that solve the problem
min_{A,E} rank(A) + λ‖E‖0, subject to A + E = D.   (2)

◮ λ > 0 is a weighting parameter that trades off rank against sparsity.
◮ ‖E‖0: l0-norm, the number of non-zero elements in E. E is sparse if ‖E‖0 is small.
◮ Problem 2 is a matrix recovery problem.
◮ rank(A) and ‖E‖0 are not continuous and not convex; very hard to solve; no efficient algorithm.
[Candes2011, Wright2009] reformulated Problem 2 as follows:
Given an m×n data matrix D, find A and E that solve
min_{A,E} ‖A‖∗ + λ‖E‖1, subject to A + E = D.   (3)

◮ ‖A‖∗: nuclear norm, sum of the singular values of A; surrogate for rank(A).
◮ ‖E‖1: l1-norm, sum of the absolute values of the elements of E; surrogate for ‖E‖0.
The solution of Problem 3 recovers A and E exactly if
◮ A is sufficiently low-rank but not sparse, and
◮ E is sufficiently sparse but not low-rank,
with optimal λ = 1/√max(m,n).
◮ ‖A‖∗ and ‖E‖1 are convex; can apply convex optimization.
Introduction to Convex Optimization
For a differentiable function f(x), its minimizer x is given by

df(x)/dx = [∂f(x)/∂x1 · · · ∂f(x)/∂xm]⊤ = 0.   (4)

(figure: a smooth curve f with df/dx = 0 at the minimizer x)
‖E‖1 is not differentiable when any of its elements is zero!
(figure: surface plot of a non-differentiable convex function f(x) over (x1, x2))
Cannot write the following because they don't exist:

d‖E‖1/dE,   ∂‖E‖1/∂eij,   d|eij|/deij.   WRONG! (5)
Fortunately, ‖E‖1 is convex.
A function f(x) is convex if and only if ∀x1,x2, ∀α ∈ [0, 1],
f(αx1 + (1− α)x2) ≤ αf(x1) + (1− α)f(x2). (6)
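Inequality (6) is easy to check numerically. The sketch below is an illustration added here (the helper name `is_convex_witness` is ours, not from the slides); it confirms the inequality for the l1-norm and shows it failing for a concave function:

```python
import numpy as np

def is_convex_witness(f, x1, x2, alphas, tol=1e-12):
    """Check f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) for all given a."""
    return all(
        f(a * x1 + (1 - a) * x2) <= a * f(x1) + (1 - a) * f(x2) + tol
        for a in alphas
    )

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=5), rng.normal(size=5)
alphas = np.linspace(0.0, 1.0, 11)

l1 = lambda x: float(np.abs(x).sum())
neg_sq = lambda x: -float(x @ x)  # concave, so the check should fail

print(is_convex_witness(l1, x1, x2, alphas))       # True: the l1-norm is convex
print(is_convex_witness(neg_sq, x1, x2, alphas))   # False: -||x||^2 is not
```

The check only probes finitely many points, so it can refute convexity but not prove it; it is a sanity check, not a proof.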
(figure: a convex function f(x); the chord between x1 and x2 lies above the graph)
A vector g(x) is a subgradient of convex function f at x if
f(y)− f(x) ≥ g(x)⊤(y − x), ∀y. (7)
(figure: supporting lines l1 at a differentiable point x1, and l2, l3 at a non-differentiable point x2)
◮ At differentiable point x1: one unique subgradient = gradient.
◮ At non-differentiable point x2: multiple subgradients.
The subdifferential ∂f(x) is the set of all subgradients of f at x:
∂f(x) = { g(x) | g(x)⊤(y − x) ≤ f(y) − f(x), ∀y }.   (8)
Caution:
◮ Subdifferential ∂f(x) is not partial differentiation ∂f(x)/∂x. Don't mix up.
◮ Partial differentiation is defined for differentiable functions. It is a single vector.
◮ Subdifferential is defined for convex functions, which may be non-differentiable. It is a set of vectors.
Example: Absolute value.
|x| is not differentiable at x = 0 but subdifferentiable at x = 0:

|x| =  x    if x > 0,
       0    if x = 0,
      −x    if x < 0.

∂|x| = {+1}      if x > 0,
       [−1,+1]   if x = 0,
       {−1}      if x < 0.      (9)

(figure: |x| with subgradient lines g = −1, 0 < g < 1, g = +1 through 0; the subdifferential jumps from −1 to +1 at x = 0)
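Eq. 9 can be verified against the subgradient inequality of Eq. 7. A small illustrative sketch (the function name `subdiff_abs` is ours, not from the slides):

```python
def subdiff_abs(x):
    """Subdifferential of |x| (Eq. 9), returned as interval endpoints."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)   # every g in [-1, +1] is a subgradient at 0

# Verify the subgradient inequality |y| - |x| >= g*(y - x) at x = 0 for the
# extreme subgradients g = -1 and g = +1; the inequality is linear in g,
# so checking the endpoints covers every g in between.
ok = all(abs(y) - abs(0.0) >= g * (y - 0.0)
         for y in (-2.0, -0.5, 0.0, 0.5, 2.0)
         for g in subdiff_abs(0.0))
print(ok)  # True
```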
Example: l1-norm of an m-D vector x.
‖x‖1 = ∑_{i=1}^m |xi|,   ∂‖x‖1 = ∑_{i=1}^m ∂|xi|.   (10)

Caution: the right-hand side of Eq. 10 is set addition. This gives a product of m sets, one for each element xi:

∂‖x‖1 = J1 × · · · × Jm,   Ji = {+1}      if xi > 0,
                                [−1,+1]   if xi = 0,
                                {−1}      if xi < 0.      (11)

Alternatively, we can write ∂‖x‖1 as

∂‖x‖1 = {g} such that gi = +1        if xi > 0,
                      gi ∈ [−1,+1]   if xi = 0,
                      gi = −1        if xi < 0.      (12)
Another alternative:

∂‖x‖1 = {g} such that gi = sgn(xi) if |xi| > 0,
                      |gi| ≤ 1     if xi = 0.      (13)
Here are some examples of the subdifferentials of 2D l1-norm.
∂f(0, 0) = [−1, 1] × [−1, 1],   ∂f(1, 0) = {1} × [−1, 1],   ∂f(1, 1) = {(1, 1)}.
Optimality Condition
x is a minimum of f if 0 ∈ ∂f(x), i.e., 0 is a subgradient of f at x.
For more details on convex functions and convex optimization, refer to [Bertsekas2003, Boyd2004, Rockafellar70, Shor85].
Back to Robust PCA
Shrinkage or soft threshold operator:
Tε(x) =  x − ε   if x > ε,
         x + ε   if x < −ε,
         0       otherwise.      (14)

(figure: graph of Tε(x); zero on [−ε, ε], shifted identity outside)
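Eq. 14 maps directly to code. A minimal NumPy sketch, applied elementwise as the later RPCA updates require:

```python
import numpy as np

def soft_threshold(x, eps):
    """Shrinkage operator T_eps of Eq. 14, applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

print(soft_threshold(np.array([3.0, -2.0, 0.5, 0.0]), 1.0))
# [ 2. -1.  0.  0.]
```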
Using convex optimization, the following minimizers are shown [Cai2010, Hale2007]:

For an m×n matrix M with SVD M = USV⊤,

U Tε(S) V⊤ = argmin_X ε‖X‖∗ + (1/2)‖X − M‖²F,   (15)

Tε(M) = argmin_X ε‖X‖1 + (1/2)‖X − M‖²F.   (16)
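Both minimizers can be sanity-checked numerically. The illustrative sketch below (function names are ours, not from the slides) implements singular value thresholding from Eq. 15 and checks that perturbing the claimed minimizer only increases the objective:

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise shrinkage operator T_eps (Eq. 14)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, eps):
    """U T_eps(S) V^T from Eq. 15: soft-threshold the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, eps)) @ Vt

def objective(X, M, eps):
    """eps*||X||_* + (1/2)*||X - M||_F^2, the objective of Eq. 15."""
    return eps * np.linalg.norm(X, 'nuc') + 0.5 * np.linalg.norm(X - M, 'fro') ** 2

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))
X_star = svt(M, 0.5)

# The objective is strictly convex, so any perturbation should do worse.
base = objective(X_star, M, 0.5)
print(all(objective(X_star + 0.1 * rng.normal(size=M.shape), M, 0.5) > base
          for _ in range(20)))  # True
```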
Solving Robust PCA
There are several ways to solve robust PCA (Problem 3) [Lin2009, Wright2009]:
◮ principal component pursuit
◮ iterative thresholding
◮ accelerated proximal gradient
◮ augmented Lagrange multipliers
Augmented Lagrange Multipliers
Consider the problem
min_x f(x) subject to cj(x) = 0, j = 1, . . . , m.   (17)
This is a constrained optimization problem.
Lagrange multipliers method reformulates Problem 17 as:
min_x f(x) + ∑_{j=1}^m λj cj(x)   (18)

with some constants λj.
Penalty method reformulates Problem 17 as:
min_x f(x) + µ ∑_{j=1}^m c²j(x).   (19)

◮ Parameter µ increases over iterations, e.g., by a factor of 10.
◮ Need µ → ∞ to get a good solution.
Augmented Lagrange multipliers method combines Lagrange multipliers and penalty:

min_x f(x) + ∑_{j=1}^m λj cj(x) + (µ/2) ∑_{j=1}^m c²j(x).   (20)

Denote λ = [λ1 · · · λm]⊤ and c(x) = [c1(x) · · · cm(x)]⊤. Then Problem 20 becomes

min_x f(x) + λ⊤c(x) + (µ/2)‖c(x)‖².   (21)
If the constraints form a matrix C = [cjk], then define Λ = [λjk].
Then Problem 20 becomes
min_x f(x) + 〈Λ, C(x)〉 + (µ/2)‖C(x)‖²F.   (22)

◮ 〈Λ, C〉 is the sum of products of corresponding elements: 〈Λ, C〉 = ∑_j ∑_k λjk cjk.
◮ ‖C‖F is the Frobenius norm: ‖C‖²F = ∑_j ∑_k c²jk.
Augmented Lagrange Multipliers (ALM) Method
1. Initialize Λ, µ > 0, ρ ≥ 1.
2. Repeat until convergence:
2.1 Compute x = argmin_x L(x), where L(x) = f(x) + 〈Λ, C(x)〉 + (µ/2)‖C(x)‖²F.
2.2 Update Λ = Λ + µC(x).
2.3 Update µ = ρµ.
What kind of optimization algorithm is this?
ALM does not need µ → ∞ to get a good solution. That means it can converge faster.
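The three steps can be traced on a toy problem. This illustrative sketch (the problem instance is ours, not from the slides) applies ALM to min (x − 2)² subject to x − 1 = 0, whose solution is x = 1 with multiplier λ = 2:

```python
def alm_toy(iters=30):
    """ALM steps 2.1-2.3 on: min (x - 2)^2 subject to c(x) = x - 1 = 0."""
    lam, mu, rho = 0.0, 1.0, 2.0
    x = 0.0
    for _ in range(iters):
        # Step 2.1: argmin_x (x-2)^2 + lam*(x-1) + (mu/2)*(x-1)^2,
        # closed form from setting the derivative to zero.
        x = (4.0 - lam + mu) / (2.0 + mu)
        # Step 2.2: multiplier update with constraint value c(x) = x - 1.
        lam += mu * (x - 1.0)
        # Step 2.3: grow the penalty parameter.
        mu *= rho
    return x, lam

x, lam = alm_toy()
print(round(x, 6), round(lam, 4))  # 1.0 2.0
```

The multiplier converges to the exact value λ = 2, so the constraint is met without µ having to reach extreme values first.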
With ALM, robust PCA (Problem 3) is reformulated as
min_{A,E} ‖A‖∗ + λ‖E‖1 + 〈Y, D − A − E〉 + (µ/2)‖D − A − E‖²F,   (23)

◮ Y is the matrix of Lagrange multipliers.
◮ The constraints are C = D − A − E.

To implement Step 2.1 of ALM, we need to find the minima for A and E. Adopt alternating optimization.
Trace of Matrix
The trace of a matrix A is the sum of its diagonal elements aii:

tr(A) = ∑_{i=1}^n aii.   (24)

Strictly speaking, a scalar c and a 1×1 matrix [c] are not the same thing. Nevertheless, since tr([c]) = c, we often write tr(c) = c for simplicity.
Properties
◮ tr(A) = ∑_{i=1}^n λi, where λi are the eigenvalues of A.
◮ tr(A+B) = tr(A) + tr(B)
◮ tr(cA) = c tr(A)
◮ tr(A) = tr(A⊤)
◮ tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC)
◮ tr(X⊤Y) = tr(XY⊤) = tr(Y⊤X) = tr(YX⊤) = ∑_i ∑_j xij yij
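These identities can be sanity-checked numerically; an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
C, Dm = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
X, Y = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(3.0 * A), 3.0 * np.trace(A))
assert np.isclose(np.trace(A), np.trace(A.T))
# tr(A) = sum of eigenvalues (imaginary parts cancel for a real matrix).
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)
# Cyclic property: tr(ABCD) = tr(BCDA).
assert np.isclose(np.trace(A @ B @ C @ Dm), np.trace(B @ C @ Dm @ A))
# tr(X^T Y) = sum_ij x_ij * y_ij.
assert np.isclose(np.trace(X.T @ Y), (X * Y).sum())
print("all trace identities hold")
```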
From Problem 23, the minimal E with other variables fixed is given by

min_E λ‖E‖1 + 〈Y, D − A − E〉 + (µ/2)‖D − A − E‖²F

⇒ min_E λ‖E‖1 + tr(Y⊤(D − A − E)) + (µ/2)‖D − A − E‖²F

⇒ min_E λ‖E‖1 + tr(−Y⊤E) + (µ/2) tr((D − A − E)⊤(D − A − E))

... (Homework)

⇒ min_E λ‖E‖1 + (µ/2) tr((E − (D − A + Y/µ))⊤(E − (D − A + Y/µ)))

⇒ min_E λ‖E‖1 + (µ/2)‖E − (D − A + Y/µ)‖²F

⇒ min_E (λ/µ)‖E‖1 + (1/2)‖E − (D − A + Y/µ)‖²F.   (25)
Compare Eq. 16

Tε(M) = argmin_X ε‖X‖1 + (1/2)‖X − M‖²F,

and Problem 25

min_E (λ/µ)‖E‖1 + (1/2)‖E − (D − A + Y/µ)‖²F.

Set X = E, M = D − A + Y/µ. Then

E = T_{λ/µ}(D − A + Y/µ).   (26)
From Problem 23, the minimal A with other variables fixed is given by

min_A ‖A‖∗ + 〈Y, D − A − E〉 + (µ/2)‖D − A − E‖²F

⇒ min_A ‖A‖∗ + tr(Y⊤(D − A − E)) + (µ/2)‖D − A − E‖²F

... (Homework)

⇒ min_A (1/µ)‖A‖∗ + (1/2)‖A − (D − E + Y/µ)‖²F.   (27)

Compare Problem 27 and Eq. 15

U Tε(S) V⊤ = argmin_X ε‖X‖∗ + (1/2)‖X − M‖²F.

Set X = A, M = D − E + Y/µ with SVD M = USV⊤. Then

A = U T_{1/µ}(S) V⊤.   (28)
Robust PCA for Matrix Recovery via ALM

Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       A = U T_{1/µ}(S) V⊤.
7.       E = T_{λ/µ}(D − A + Y/µ).
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.
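The algorithm above can be sketched in NumPy as follows. This is an illustrative implementation, not the course's reference code: it runs the inexact variant with a single pass through steps 5 to 7 per outer iteration, and the stopping test on the residual ‖D − A − E‖F is a choice made here.

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise shrinkage operator T_eps (Eq. 14)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def rpca_alm(D, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Robust PCA via ALM (sketch). Returns low-rank A and sparse E."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # default weight from the slides
    sgn = np.sign(D)
    J = max(np.linalg.norm(sgn, 2), np.abs(sgn).max() / lam)
    Y = sgn / J                               # Y = sgn(D)/J(sgn(D))
    mu = 1.25 / np.linalg.norm(D, 2)          # common starting value [Lin2009]
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    d_norm = np.linalg.norm(D, 'fro')
    for _ in range(max_iter):
        # Steps 5-6: singular value thresholding updates A.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = U @ np.diag(soft_threshold(s, 1.0 / mu)) @ Vt
        # Step 7: elementwise shrinkage updates E.
        E = soft_threshold(D - A + Y / mu, lam / mu)
        # Steps 8-9: multiplier and penalty updates.
        R = D - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * d_norm:
            break
    return A, E

# Recover a rank-2 matrix corrupted by sparse, large-amplitude noise.
rng = np.random.default_rng(3)
L = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 40))
S = np.zeros((40, 40))
mask = rng.random((40, 40)) < 0.05
S[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())
A_hat, E_hat = rpca_alm(L + S)
rel_err = np.linalg.norm(A_hat - L, 'fro') / np.linalg.norm(L, 'fro')
print(rel_err < 1e-2)
```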
Typical initialization [Lin2009]:
◮ Y = sgn(D)/J(sgn(D)).
◮ sgn(D) gives the sign of each matrix element of D.
◮ J(·) gives the scaling factor: J(X) = max(‖X‖2, λ⁻¹‖X‖∞).
◮ ‖X‖2 is the spectral norm, the largest singular value of X.
◮ ‖X‖∞ is the largest absolute value of the elements of X.
◮ µ = 1.25/‖D‖2.
◮ ρ = 1.5.
◮ λ = 1/√max(m, n) for an m×n matrix D.
Example: Recovery of video background.
(figures: video frames stacked as columns of the data matrix D)
Sample results:
(figures: data matrix with its low-rank and error components, two examples)
Example: Removal of specular reflection and shadow.
(figures: data matrix D, low-rank A, error E)
Fixed-Rank Robust PCA
In reflection removal, the reflection may be global.

(figures: ground truth, input)

Then E is not sparse: this violates the RPCA condition! But rank(A) = 1.

Fix the rank of A to deal with non-sparse E [Leow2013].
Leow Wee Kheng (NUS) Robust PCA 34 / 52
Fixed-Rank Robust PCA
Fixed-Rank Robust PCA via ALM

Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       If rank(T_{1/µ}(S)) < r, A = U T_{1/µ}(S) V⊤; else A = U Sr V⊤.
7.       E = T_{λ/µ}(D − A + Y/µ).
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.

Sr is S with all but the first r singular values set to 0.
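Step 6 is the only change from the matrix-recovery algorithm. An illustrative sketch of that step (the function name is ours, not from the slides):

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise shrinkage operator T_eps (Eq. 14)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def fixed_rank_step(M, mu, r):
    """Step 6 of FRPCA: soft-threshold the singular values; if the
    thresholded rank reaches r, truncate to S_r (top r values, unthresholded)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = soft_threshold(s, 1.0 / mu)
    if np.count_nonzero(s_thr) < r:
        s_use = s_thr                                    # rank already below r
    else:
        s_use = np.where(np.arange(len(s)) < r, s, 0.0)  # S_r: keep top r
    return U @ np.diag(s_use) @ Vt

# Build a matrix with singular values 5, 3, 1 from random orthogonal factors.
rng = np.random.default_rng(4)
Q1, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Q2, _ = np.linalg.qr(rng.normal(size=(5, 5)))
M = Q1 @ np.diag([5.0, 3.0, 1.0, 0.0, 0.0]) @ Q2

# With mu = 10 the threshold 1/mu = 0.1 leaves rank 3 >= r, so S_r caps it at 2.
print(np.round(np.linalg.svd(fixed_rank_step(M, 10.0, 2), compute_uv=False)[:3], 6))
# [5. 3. 0.]
```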
Example: Removal of local reflections.
(figures: ground truth, input, FRPCA result, RPCA result)
Example: Removal of global reflections.
(figures: ground truth, input, FRPCA result, RPCA result)
Example: Removal of global reflections.
(figures: ground truth, input, FRPCA result, RPCA result)
Example: Background recovery for traffic video: fast moving vehicles.
(figures: input, FRPCA result, RPCA result)
Example: Background recovery for traffic video: slow moving vehicles.
(figures: input, FRPCA result, RPCA result)
Example: Background recovery for traffic video: temporary stop.
(figures: input, FRPCA result, RPCA result)
Matrix Completion
Customers are asked to rate the movies from 1 (poor) to 5 (excellent).
A: 5 5 3 2
B: 3 4 3
C: 3 4 5 4
D: 4 5 5 3
...
Customers rate only some movies ⇒ some data are missing. How to estimate the missing data? Matrix completion.
Let D denote the data matrix with missing elements set to 0, and let M = {(i, j)} denote the indices of the missing elements in D.
Then, the matrix completion problem can be formulated as
Given D and M, find matrix A that solves the problem

min_A ‖A‖∗ subject to A + E = D, Eij = 0 ∀ (i, j) ∉ M.   (29)

◮ For (i, j) ∉ M, the constraint Eij = 0 forces Aij = Dij; no change.
◮ For (i, j) ∈ M, Dij = 0, so Eij = −Aij; Aij is the recovered value.
Reformulating Problem 29 using augmented Lagrange multipliers gives
min_A ‖A‖∗ + 〈Y, D − A − E〉 + (µ/2)‖D − A − E‖²F   (30)

such that Eij = 0 ∀ (i, j) ∉ M.
Robust PCA for Matrix Completion

Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.    Repeat until convergence:
5.       U, S, V = SVD(D − E + Y/µ).
6.       A = U T_{1/µ}(S) V⊤.
7.       E = Γ_M(D − A + Y/µ), where Γ_M(X)ij = Xij for (i, j) ∈ M, and 0 for (i, j) ∉ M.
8.    Update Y = Y + µ(D − A − E).
9.    Update µ = ρµ.
Outputs: A, E.
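The completion algorithm can be sketched in NumPy as follows. This is illustrative code, not the course's reference implementation; the initialization and stopping test are choices made here:

```python
import numpy as np

def complete_matrix(D, missing, rho=1.5, tol=1e-7, max_iter=300):
    """Matrix completion via ALM (sketch). `missing` is a boolean mask of M."""
    mu = 1.25 / np.linalg.norm(D, 2)
    Y = np.zeros_like(D)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    d_norm = np.linalg.norm(D, 'fro')
    for _ in range(max_iter):
        # Steps 5-6: singular value thresholding updates A.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Step 7: Gamma_M keeps E only on the missing entries.
        G = D - A + Y / mu
        E = np.where(missing, G, 0.0)
        # Steps 8-9: multiplier and penalty updates.
        R = D - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * d_norm:
            break
    return A

# Complete a rank-1 ratings-like matrix with a few hidden entries.
M_true = np.outer(np.arange(1.0, 6.0), np.arange(1.0, 7.0))   # 5x6, rank 1
missing = np.zeros_like(M_true, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5)]:
    missing[i, j] = True
D = np.where(missing, 0.0, M_true)                            # missing set to 0
A = complete_matrix(D, missing)
rel_err = np.linalg.norm(A - M_true, 'fro') / np.linalg.norm(M_true, 'fro')
print(rel_err < 0.05)
```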
Example: Recovery of occluded parts in face images.
(figures: input, ground truth, matrix completion, matrix recovery)
Summary
Robustification of PCA:

◮ PCA + trimming → trimmed PCA
◮ PCA + 0−1 covariance → trimmed PCA
◮ PCA + weighted SVD → weighted PCA
◮ PCA + robust error function → winsorized PCA
Robust PCA:

◮ PCA → robust PCA (matrix recovery): low-rank + sparse matrix decomposition.
◮ Fix the rank of A → fixed-rank robust PCA.
◮ Fix elements of E → robust PCA (matrix completion).
Probing Questions
◮ If the data matrix of a problem is composed of a low-rank matrix, a sparse matrix and something else, can you still use robust PCA methods? If yes, how? If no, why?
◮ In applying robust PCA to high-resolution colour image processing, the data matrix contains three times as many rows as the number of pixels in the images, which can lead to a very large data matrix that takes a long time to compute. Suggest a way to overcome this problem.
◮ In applying robust PCA to video processing, the data matrix contains as many columns as the number of video frames, which can lead to a very large data matrix that exceeds the available memory. Suggest a way to overcome this problem.
Homework
1. Show that tr(X⊤Y) = ∑_i ∑_j xij yij.

2. Show that the following two optimization problems are equivalent:

min_E λ‖E‖1 + tr(−Y⊤E) + (µ/2) tr((D − A − E)⊤(D − A − E))

min_E λ‖E‖1 + (µ/2) tr((E − (D − A + Y/µ))⊤(E − (D − A + Y/µ)))

3. Show that the minimal A of Problem 23 with other variables fixed is given by

min_A (1/µ)‖A‖∗ + (1/2)‖A − (D − E + Y/µ)‖²F.
References
1. D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.
2. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
3. J. F. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010.
4. E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.
5. F. De la Torre and M. J. Black. Robust principal component analysis for computer vision. In Proc. Int. Conf. on Computer Vision, 2001.
6. R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
7. K. Gabriel and S. Zamir. Lower rank approximation of matrices by least squares with any choice of weights. Technometrics, 21:489-498, 1979.
8. E. T. Hale, W. Yin, and Y. Zhang. A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing. Tech. Report TR07-07, Dept. of Computational and Applied Math., Rice University, 2007.
9. W. K. Leow, Y. Cheng, L. Zhang, T. Sim, and L. Foo. Background recovery by fixed-rank robust principal component analysis. In Proc. Int. Conf. on Computer Analysis of Images and Patterns, 2013.
10. Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Tech. Report UILU-ENG-09-2215, UIUC, 2009.
11. N. Z. Shor. Minimization Methods for Non-Differentiable Functions. Springer-Verlag, 1985.
12. J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In Proc. Conf. on Neural Information Processing Systems, 2009.
13. L. Xu and A. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans. on Neural Networks, 6(1):131-143, 1995.