# The low-rank basis problem for a matrix subspace

Post on 18-Aug-2015

- 1. The low-rank basis problem for a matrix subspace. Tasuku Soma (Univ. Tokyo). Joint work with Yuji Nakatsukasa (Univ. Tokyo) and André Uschmajew (Univ. Bonn).
- 2. Outline: (1) The low-rank basis problem, (2) Algorithm, (3) Convergence guarantee, (4) Experiments.
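- 5. The low-rank basis problem
  - Low-rank basis problem: for a matrix subspace $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ spanned by $M_1, \dots, M_d \in \mathbb{R}^{m \times n}$, minimize $\operatorname{rank}(X_1) + \cdots + \operatorname{rank}(X_d)$ subject to $\operatorname{span}\{X_1, \dots, X_d\} = \mathcal{M}$.
  - Generalizes the sparse basis problem: minimize $\|x_1\|_0 + \cdots + \|x_d\|_0$ subject to $\operatorname{span}\{x_1, \dots, x_d\} = S \subseteq \mathbb{R}^N$.
  - The singular values of a matrix play the role of the nonzero elements of a vector.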
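
The objective on the slide above can be checked numerically on a tiny case. The following is a minimal sketch, assuming NumPy; the helper names `total_rank` and `same_span` and the example matrices are illustrative and do not come from the talk. It compares two bases of the same two-dimensional matrix subspace: a "mixed" basis whose elements both have rank 2, and a basis of rank-one matrices.

```python
import numpy as np

def total_rank(mats):
    """Sum of the ranks of a list of matrices (the objective being minimized)."""
    return sum(int(np.linalg.matrix_rank(X)) for X in mats)

def same_span(mats_a, mats_b):
    """True if both lists span the same matrix subspace.

    Each matrix is flattened to a vector; the spans agree exactly when
    stacking both families does not increase the rank of either one.
    """
    A = np.array([X.ravel() for X in mats_a])
    B = np.array([X.ravel() for X in mats_b])
    r_a = np.linalg.matrix_rank(A)
    r_b = np.linalg.matrix_rank(B)
    r_ab = np.linalg.matrix_rank(np.vstack([A, B]))
    return bool(r_a == r_b == r_ab)

# Two rank-one matrices (illustrative data).
X1 = np.outer([1.0, 0.0, 1.0], [1.0, 2.0, 0.0])
X2 = np.outer([0.0, 1.0, 1.0], [0.0, 1.0, 1.0])

# A "mixed" basis of the same subspace: both elements have rank 2.
M1 = X1 + X2
M2 = X1 - X2

rank_sum_mixed = total_rank([M1, M2])   # objective value of the given basis
rank_sum_low = total_rank([X1, X2])     # objective value of the low-rank basis
spans_agree = same_span([M1, M2], [X1, X2])
```

Here the given basis has rank sum 4 while the rank-one basis of the same subspace has rank sum 2, which is the gap the low-rank basis problem asks an algorithm to close.
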
- 8. Scope
  - Basis problems: low-rank basis; sparse basis (Coleman-Pothen 86).
  - Single-element problems: low-rank matrix; sparse vector (Qu-Sun-Wright 14).
  - The sparse vector problem is NP-hard [Coleman-Pothen 1986].
  - Related studies: dictionary learning [Sun-Qu-Wright 14], sparse PCA [Spielman-Wang-Wright], [Demanet-Hand 14].
- 9. Applications
  - memory-efficient representation of a matrix subspace
  - matrix compression beyond SVD
  - dictionary learning
  - string theory: rank-deficient matrix in a rectangular subspace
  - image separation
  - accurate eigenvector computation
  - maximum-rank completion (discrete mathematics)
  - ...
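- 11. Abstract greedy algorithm
  - Algorithm 1: Greedy meta-algorithm for computing a low-rank basis.
  - Input: subspace $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ of dimension $d$.
  - Output: basis $\mathcal{B} = \{X_1^*, \dots, X_d^*\}$ of $\mathcal{M}$.
  - Initialize $\mathcal{B} = \emptyset$.
  - For $\ell = 1, \dots, d$: find $X_\ell^* \in \mathcal{M}$ of lowest possible rank such that $\mathcal{B} \cup \{X_\ell^*\}$ is linearly independent, then set $\mathcal{B} \leftarrow \mathcal{B} \cup \{X_\ell^*\}$.
  - If each step is successful, this finds the required basis!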
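- 13. Greedy algorithm: lemma
  - Lemma. Let $X_1^*, \dots, X_d^*$ be the output of the greedy algorithm. For any $\ell \in \{1, \dots, d\}$ and linearly independent $\{X_1, \dots, X_\ell\} \subseteq \mathcal{M}$ with $\operatorname{rank}(X_1) \le \cdots \le \operatorname{rank}(X_\ell)$, we have $\operatorname{rank}(X_i) \ge \operatorname{rank}(X_i^*)$ for $i = 1, \dots, \ell$.
  - Proof. If $\operatorname{rank}(X_\ell) < \operatorname{rank}(X_\ell^*)$, then $\operatorname{rank}(X_i) < \operatorname{rank}(X_\ell^*)$ for $i \le \ell$. But since one $X_i$ must be linearly independent from $X_1^*, \dots, X_{\ell-1}^*$, this contradicts the choice of $X_\ell^*$. (Adaptation of a standard argument from matroid theory.)
- 14. Greedy algorithm: justification
  - Theorem. Let $X_1^*, \dots, X_d^*$ be the linearly independent output of the greedy algorithm. Then $\{X_1, \dots, X_\ell\}$ (as in the lemma) is of minimal rank if and only if $\operatorname{rank}(X_i) = \operatorname{rank}(X_i^*)$ for $i = 1, \dots, \ell$. In particular, $\{X_1^*, \dots, X_\ell^*\}$ is of minimal rank.
  - Analogous result for the sparse basis problem in [Coleman, Pothen 1986].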
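- 16. The single matrix problem
  - Minimize $\operatorname{rank}(X)$ subject to $X \in \mathcal{M} \setminus \{0\}$.
  - NP-hard, of course (since the sparse vector problem is).
  - Nuclear norm heuristic, with $\|A\|_* := \sum_i \sigma_i(A)$: minimize $\|X\|_*$ subject to $X \in \mathcal{M}$, $\|X\|_F = 1$.
  - This is NOT a convex relaxation, because the constraint $\|X\|_F = 1$ is non-convex.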
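
Why the nuclear norm is a reasonable surrogate for rank on the Frobenius unit sphere can be seen from the bounds $1 \le \|X\|_* \le \sqrt{\operatorname{rank}(X)}$ for $\|X\|_F = 1$, with $\|X\|_* = 1$ exactly for rank-one matrices. The sketch below, assuming NumPy and with illustrative helper names (`nuclear_norm`, `normalize_fro`) that are not from the talk, evaluates both extremes.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

def normalize_fro(A):
    """Scale A to unit Frobenius norm."""
    return A / np.linalg.norm(A, "fro")

# Rank-one matrix with unit Frobenius norm: the nuclear norm equals 1.
rank_one = normalize_fro(np.outer([1.0, 2.0, 2.0], [3.0, 0.0, 4.0]))

# Full-rank matrix with equal singular values: the nuclear norm equals sqrt(3).
flat = normalize_fro(np.eye(3))

nn_rank_one = nuclear_norm(rank_one)
nn_flat = nuclear_norm(flat)
```
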
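- 17. Algorithm outline (for the single matrix problem)
  - Phase I (rank estimate): iterate $Y = S_\tau(X)$, $X = \frac{P_{\mathcal{M}}(Y)}{\|P_{\mathcal{M}}(Y)\|_F}$ until $\operatorname{rank}(Y)$ converges.
  - Phase II (alternating projection): iterate $Y = T_r(X)$, $X = \frac{P_{\mathcal{M}}(Y)}{\|P_{\mathcal{M}}(Y)\|_F}$, with the estimated $r = \operatorname{rank}(Y)$ from Phase I.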
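- 20. Shrinkage operator
  - Shrinkage operator (soft thresholding) for $X = U \Sigma V^T$: $S_\tau(X) = U S_\tau(\Sigma) V^T$, $S_\tau(\Sigma) = \operatorname{diag}(\sigma_1 - \tau, \dots, \sigma_{\operatorname{rank}(X)} - \tau)_+$.
  - Fixed-point iteration: $Y = S_\tau(X)$, $X = \frac{P_{\mathcal{M}}(Y)}{\|P_{\mathcal{M}}(Y)\|_F}$.
  - Interpretation [Cai, Candes, Shen 2010], [Qu, Sun, Wright @NIPS 2014]: block coordinate descent (a.k.a. alternating direction) for minimizing $\tau \|Y\|_* + \frac{1}{2} \|Y - X\|_F^2$ over $X, Y$ subject to $X \in \mathcal{M}$ and $\|X\|_F = 1$.
  - [Qu, Sun, Wright @NIPS 2014]: analogous method for the sparsest vector problem.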
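
The shrinkage step on its own is short enough to write out. This is a minimal sketch, assuming NumPy; the function names `shrink` and `objective` are illustrative. It also evaluates the block coordinate descent objective $\tau \|Y\|_* + \frac{1}{2}\|Y - X\|_F^2$ for fixed $X$, for which $S_\tau(X)$ is the minimizer over $Y$.

```python
import numpy as np

def shrink(X, tau):
    """Soft-threshold the singular values of X by tau (the operator S_tau)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # (sigma_i - tau)_+
    return (U * s_shrunk) @ Vt

def objective(Y, X, tau):
    """tau * ||Y||_* + 0.5 * ||Y - X||_F^2 for fixed X."""
    nuc = np.linalg.svd(Y, compute_uv=False).sum()
    return float(tau * nuc + 0.5 * np.linalg.norm(Y - X, "fro") ** 2)

X = np.diag([3.0, 1.0, 0.5])
Y = shrink(X, tau=1.0)                      # singular values 3, 1, 0.5 become 2, 0, 0
singular_values_Y = np.linalg.svd(Y, compute_uv=False)
value_at_Y = objective(Y, X, tau=1.0)

# Random perturbations of Y should never give a smaller objective value.
rng = np.random.default_rng(0)
perturbed_values = [
    objective(Y + 0.1 * rng.standard_normal(Y.shape), X, tau=1.0) for _ in range(20)
]
```
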
- 21. The use as a rank estimator
  - Iteration: $Y = S_\tau(X)$, $X = \frac{P_{\mathcal{M}}(Y)}{\|P_{\mathcal{M}}(Y)\|_F}$.
  - The fixed point of $Y$ would be a matrix of low rank $r$, which is close to, but not in, $\mathcal{M}$ if $r > 1$. Otherwise it would be a fixed point of $Y = \frac{S_\tau(Y)}{\|S_\tau(Y)\|_F}$, which can hold only for rank-one matrices.
  - The fixed point of $X$ usually has full rank, with singular values $\sigma_i \ll 1$ that are still too large.
  - Further improvement is needed, but we accept $r$ as the rank estimate.
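
The Phase I iteration used as a rank estimator can be sketched as follows, assuming NumPy. Several choices here are assumptions rather than details from the talk: the threshold `tau` is held fixed (the slides do not say how it is chosen), the loop stops once the rank of $Y$ has stayed constant for `patience` iterations, the input matrices are assumed linearly independent, and the function names are illustrative.

```python
import numpy as np

def orthonormal_basis(mats):
    """Orthonormal basis (as columns) of span{vec(M_1), ..., vec(M_d)}.

    Assumes the matrices in `mats` are linearly independent.
    """
    V = np.column_stack([M.ravel() for M in mats])
    Q, _ = np.linalg.qr(V)
    return Q

def project(Q, Y):
    """Orthogonal projection P_M(Y) with respect to the Frobenius inner product."""
    return (Q @ (Q.T @ Y.ravel())).reshape(Y.shape)

def shrink(X, tau):
    """Soft thresholding of the singular values (the operator S_tau)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def phase_one(mats, X0, tau=0.1, max_iter=500, patience=20):
    """Iterate Y = S_tau(X), X = P_M(Y) / ||P_M(Y)||_F and track rank(Y).

    The fixed tau must stay below the largest singular value of X, which
    holds here because ||X||_F = 1 and the matrices are small.
    """
    Q = orthonormal_basis(mats)
    X = project(Q, X0)
    X = X / np.linalg.norm(X)
    rank_history = []
    for _ in range(max_iter):
        Y = shrink(X, tau)
        rank_history.append(int(np.linalg.matrix_rank(Y)))
        PY = project(Q, Y)
        X = PY / np.linalg.norm(PY)
        recent = rank_history[-patience:]
        if len(recent) == patience and len(set(recent)) == 1:
            break   # rank(Y) has stabilized: accept it as the estimate
    return X, rank_history[-1]

rng = np.random.default_rng(0)
mats = [rng.standard_normal((4, 4)) for _ in range(3)]   # illustrative subspace
X_est, r_est = phase_one(mats, X0=sum(mats))
```

The returned `X_est` is only the starting point for Phase II; as the slide notes, it typically still has full rank, and only `r_est` is taken at face value.
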
- 24. Obtaining solution: truncation operator
  - Truncation operator (hard thresholding) for $X = U \Sigma V^T$: $T_r(X) = U T_r(\Sigma) V^T$, $T_r(\Sigma) = \operatorname{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0)$.
  - Fixed-point iteration: $Y = T_r(X)$, $X = \frac{P_{\mathcal{M}}(Y)}{\|P_{\mathcal{M}}(Y)\|_F}$.
  - Interpretation: alternating projection method for finding $X \in \{X \in \mathcal{M} : \|X\|_F = 1\} \cap \{Y : \operatorname{rank}(Y) \le r\}$.
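
The Phase II iteration can be sketched in the same way, assuming NumPy. The function names, the stopping tolerance, and the test subspace are illustrative assumptions. Because each half-step is a nearest-point projection (onto the matrices of rank at most $r$, then onto the unit sphere of $\mathcal{M}$), the distance from the iterate to the rank-$r$ set cannot increase, and the example records that distance before and after.

```python
import numpy as np

def orthonormal_basis(mats):
    """Orthonormal basis (as columns) of span{vec(M_i)}; assumes independence."""
    V = np.column_stack([M.ravel() for M in mats])
    Q, _ = np.linalg.qr(V)
    return Q

def project(Q, Y):
    """Orthogonal projection P_M(Y) in the Frobenius inner product."""
    return (Q @ (Q.T @ Y.ravel())).reshape(Y.shape)

def truncate(X, r):
    """Hard thresholding T_r: keep the r largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def phase_two(mats, X0, r, max_iter=1000, tol=1e-12):
    """Iterate Y = T_r(X), X = P_M(Y) / ||P_M(Y)||_F until X stops moving."""
    Q = orthonormal_basis(mats)
    X = X0 / np.linalg.norm(X0)
    for _ in range(max_iter):
        Y = truncate(X, r)
        PY = project(Q, Y)
        X_new = PY / np.linalg.norm(PY)
        step = np.linalg.norm(X_new - X)
        X = X_new
        if step < tol:
            break
    return X

# Illustrative subspace that contains the rank-one matrix R.
rng = np.random.default_rng(1)
R = np.outer(rng.standard_normal(5), rng.standard_normal(4))
F = rng.standard_normal((5, 4))
mats = [R + F, R - F]

# Start inside the subspace, close to R but not of rank one.
X_start = R + 0.05 * F
X_start = X_start / np.linalg.norm(X_start)
X_end = phase_two(mats, X_start, r=1)

gap_start = np.linalg.norm(X_start - truncate(X_start, 1))
gap_end = np.linalg.norm(X_end - truncate(X_end, 1))

# A rank-one element of the subspace is a fixed point of the iteration.
R_unit = R / np.linalg.norm(R)
X_fixed = phase_two(mats, R_unit, r=1)
```
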
- 25. Greedy algorithm: pseudocode
  - Algorithm 2: Greedy algorithm for computing a low-rank basis.
  - Input: basis $M_1, \dots, M_d \in \mathbb{R}^{m \times n}$ for $\mathcal{M}$.
  - Output: low-rank basis $X_1, \dots, X_d$ of $\mathcal{M}$.
  - For $\ell = 1, \dots, d$: run Phase I on $X_\ell$ to obtain the rank estimate $r$, then run Phase II on $X_\ell$ with rank $r$ to obtain $X_\ell \in \mathcal{M}$ of rank $r$.
  - To force linear independence, restarting is sometimes necessary: $X_\ell$ is always initialized and restarted in $\operatorname{span}\{X_1, \dots, X_{\ell-1}\}^\perp \cap \mathcal{M}$.
  - The Phase I output $X$ is the Phase II input.
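
The initialization and restart rule in Algorithm 2 amounts to drawing a point of $\mathcal{M}$ and removing its components along the basis elements already found. A minimal sketch, assuming NumPy: the name `random_start` is hypothetical, the matrices in `found` are assumed to lie in $\mathcal{M}$ and to be linearly independent, and fewer than $d$ of them are assumed to have been found so far.

```python
import numpy as np

def orthonormal_basis(mats):
    """Orthonormal basis (as columns) of span{vec(M_i)}; assumes independence."""
    V = np.column_stack([M.ravel() for M in mats])
    Q, _ = np.linalg.qr(V)
    return Q

def random_start(mats, found, rng):
    """Random unit-norm element of span{found}^perp intersected with M."""
    Q = orthonormal_basis(mats)                # columns span vec(M)
    x = Q @ rng.standard_normal(Q.shape[1])    # random element of M
    if found:
        # The found matrices lie in M, so removing their components
        # keeps x inside M while making it orthogonal to all of them.
        F = orthonormal_basis(found)
        x = x - F @ (F.T @ x)
    x = x / np.linalg.norm(x)
    return x.reshape(mats[0].shape)

rng = np.random.default_rng(2)
mats = [rng.standard_normal((4, 4)) for _ in range(3)]   # illustrative basis of M
found = [mats[0] + mats[1]]                              # stands in for X_1
X_init = random_start(mats, found, rng)
inner_with_found = float(np.sum(X_init * found[0]))      # Frobenius inner product
```
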
- 27. Observed convergence (single initial guess): $m = 20$, $n = 10$, $d = 5$, exact ranks $(1, 2, 3, 4, 5)$. Ranks recovered in the wrong order $(2, 1, 5, 3, 4)$.
- 28. Observed convergence (several initial guesses): ranks recovered in the correct order.
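- 30. Local convergence of Phase II
  - $\mathcal{R}_r := \{X : \operatorname{rank}(X) = r\}$, $\mathcal{B} := \{X \in \mathcal{M} : \|X\|_F = 1\}$.
  - $T_X(\mathcal{N})$: tangent space of the manifold $\mathcal{N}$ at $X$.
  - Theorem. Assume $X_* \in \mathcal{R}_r \cap \mathcal{B}$ (so $X_*$ has rank $r$, the rank given as input to Phase II) and $T_{X_*}(\mathcal{R}_r) \cap T_{X_*}(\mathcal{B}) = \{0\}$. Then Phase II is locally linearly convergent: $\frac{\|X_{\mathrm{new}} - X_*\|_F}{\|X - X_*\|_F} \lesssim \cos\theta$, where $\theta$ is the subspace angle between the two tangent spaces.
  - This follows from a meta-theorem on alternating projections in nonlinear optimization [Lewis, Luke, Malick 2009].
  - We provide a direct linear algebra proof.
  - The assumption holds if $X_*$ is an isolated rank-$r$ matrix in $\mathcal{M}$.
- 31. Local convergence: intuition
  - [Figure: two sketches of the tangent spaces $T_{X_*}(\mathcal{B})$ and $T_{X_*}(\mathcal{R}_r)$ at $X_*$, one with $\cos\theta \approx \frac{1}{\sqrt{2}}$ and one with $\cos\theta \approx 0.9$.]
  - $\frac{\|X_{\mathrm{new}} - X_*\|_F}{\|X - X_*\|_F} \le \cos\theta + O(\|X - X_*\|_F^2)$.
  - $\theta \in (0, \frac{\pi}{2}]$: subspace angle between $T_{X_*}(\mathcal{B})$ and $T_{X_*}(\mathcal{R}_r)$, defined by $\cos\theta = \max_{X \in T_{X_*}(\mathcal{B}),\ Y \in T_{X_*}(\mathcal{R}_r)} \frac{|\langle X, Y \rangle_F|}{\|X\|_F \|Y\|_F}$.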
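
The quantity $\cos\theta$ is the cosine of the smallest principal angle between two linear subspaces, which equals the largest singular value of $Q_A^T Q_B$ for orthonormal bases $Q_A$, $Q_B$. The sketch below, assuming NumPy, computes it for two generic subspaces given by spanning columns; it does not construct the actual tangent spaces from the theorem, and the name `cos_subspace_angle` is illustrative.

```python
import numpy as np

def cos_subspace_angle(A, B):
    """cos(theta) for the smallest principal angle between range(A) and range(B).

    A and B hold spanning vectors as columns and are assumed to have
    full column rank.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(min(s[0], 1.0))   # clip rounding above 1

# Two lines in R^3 at 45 degrees: cos(theta) = 1 / sqrt(2).
line_a = np.array([[1.0], [0.0], [0.0]])
line_b = np.array([[1.0], [1.0], [0.0]])
cos_lines = cos_subspace_angle(line_a, line_b)

# Two planes in R^3 sharing a line: cos(theta) = 1, so the intersection
# is nontrivial and the assumption of the theorem would fail.
plane_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
plane_b = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cos_planes = cos_subspace_angle(plane_a, plane_b)
```

A value of `cos_lines` well below 1 corresponds to fast linear convergence in the estimate above, while a value close to 1 corresponds to slow convergence.
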
- 33. Results for synthetic data

  | exact ranks | av. sum(ranks) | av. Phase I err (iter) | av. Phase II err (iter) |
  | --- | --- | --- | --- |
  | (1, 1, 1, 1, 1) | 5.05 | 2.59e-14 (55.7) | 7.03e-15 (0.4) |
  | (2, 2, 2, 2, 2) | 10.02 | 4.04e-03 (58.4) | 1.04e-14 (9.11) |
  | (1, 2, 3, 4, 5) | 15.05 | 6.20e-03 (60.3) | 1.38e-14 (15.8) |
  | (5, 5, 5, 10, 10) | 35.42 | 1.27e-02 (64.9) | 9.37e-14 (50.1) |
  | (5, 5, 10, 10, 15) | 44.59 | 2.14e-02 (66.6) | 3.96e-05 (107) |

  Table: $m = n = 20$, $d = 5$, random initial guess.

  | exact ranks | av. sum(ranks) | av. Phase I err (iter) | av. Phase II err (iter) |
  | --- | --- | --- | --- |
  | (1, 1, 1, 1, 1) | 5.00 | 6.77e-15 (709) | 6.75e-15 (0.4) |
  | (2, 2, 2, 2, 2) | 10.00 | 4.04e-03 (393) | 9.57e-15 (9.0) |
  | (1, 2, 3, 4, 5) | 15.00 | 5.82e-03 (390) | 1.37e-14 (18.5) |
  | (5, 5, 5, 10, 10) | 35.00 | 1.23e-02 (550) | 3.07e-14 (55.8) |
  | (5, 5, 10, 10, 15) | 44.20 | 2.06e-02 (829) | 8.96e-06 (227) |

  Table: Five random initial guesses.
- 34. Image separation. [Figure: three panels showing the original, mixed, and computed images.]
- 35. Link to tensor decomposition
  - Rank-one basis: $\mathcal{M} = \operatorname{span}\{a_1 b_1^T, \dots, a_d b_d^T\}$.
  - If $M_1, \dots, M_d$ is any basis, then $M_k = \sum_{\ell=1}^{d} c_{k,\ell}\, a_\ell b_\ell^T$ ($k = 1, \dots, d$) $\iff$ $\mathcal{T} = \sum_{\ell=1}^{d} a_\ell \circ b_\ell \circ c_\ell$, where $\mathcal{T}$ is the third-order tensor with slices $M_k$.
  - $\Rightarrow \operatorname{rank}(\mathcal{T}) = d$.
  - This suggests finding a rank-one basis using CP decomposition algorithms (ALS, generalized eigenvalue, ...), but CP is not enough for the higher-rank case.
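
The equivalence between the basis expansion and the CP form of the tensor can be checked directly. A minimal sketch, assuming NumPy and random illustrative factors: `A`, `B`, `C` hold the vectors $a_\ell$, $b_\ell$, $c_\ell$ as columns, with `C[k, l]` playing the role of $c_{k,\ell}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 3, 3

A = rng.standard_normal((m, d))   # columns a_1, ..., a_d
B = rng.standard_normal((n, d))   # columns b_1, ..., b_d
C = rng.standard_normal((d, d))   # C[k, l] = c_{k,l}; generic, hence invertible

# Basis matrices M_k = sum_l c_{k,l} * a_l b_l^T.
M = [
    sum(C[k, l] * np.outer(A[:, l], B[:, l]) for l in range(d))
    for k in range(d)
]

# Third-order tensor T = sum_l a_l o b_l o c_l, written entrywise.
T = np.einsum("il,jl,kl->ijk", A, B, C)

# The k-th frontal slice of T is exactly M_k.
slices_match = all(np.allclose(T[:, :, k], M[k]) for k in range(d))

# The mode-3 unfolding has vec(M_k) as its columns. Its rank d is a lower
# bound on the CP rank, and the d-term sum above is an upper bound.
unfold_rank = int(np.linalg.matrix_rank(T.reshape(m * n, d)))
```

Recovering `A` and `B` from `T` alone is the job of a CP decomposition routine, which is not implemented here.
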
