
Convex relaxations for estimating matrices with sparse factors

Guillaume Obozinski

École des Ponts ParisTech

Joint work with Emile Richard and Jean-Philippe Vert

Journées MAS, Toulouse, August 2014

Estimating low-rank matrices with sparse factors

X = ∑_{i=1}^{r} u_i v_i^⊤

factors not orthogonal a priori

≠ from assuming that the SVD of X is sparse


Dictionary Learning / Sparse PCA

min_{A ∈ R^{k×n}, D ∈ R^{p×k}} ∑_{i=1}^{n} ‖x_i − Dα_i‖₂² + λ ∑_{i=1}^{n} ‖α_i‖₁   s.t. ∀j, ‖d_j‖₂ ≤ 1.

Dictionary Learning

[diagram: X^⊤ = D α, with sparse decomposition α]

e.g. overcomplete dictionaries for natural images

(Elad and Aharon, 2006)

Sparse PCA

[diagram: X^⊤ = D α, with sparse dictionary D]

e.g. microarray data

(Witten et al., 2009; Bach et al., 2008)

Sparsity of the loadings vs sparsity of the dictionary elements
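To make the alternating structure of this objective concrete, here is a minimal numpy sketch of the classical scheme: sparse coding of the coefficients by ISTA (soft-thresholding), then a least-squares dictionary update with columns projected back onto the unit ℓ₂ ball. All names (X, D, A, lam), the 0.5 scaling of the quadratic loss and the iteration counts are illustrative choices, not code from the talk or from Elad and Aharon (2006).

```python
import numpy as np

def soft_threshold(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def dictionary_learning(X, k, lam=0.1, n_outer=20, n_ista=50):
    # X is p x n (columns are signals), D is p x k, A is k x n (columns are the alpha_i).
    p, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)           # start with unit-norm columns
    A = np.zeros((k, n))
    for _ in range(n_outer):
        # Sparse coding: ISTA steps on 0.5*||X - D A||_F^2 + lam*||A||_1
        L = np.linalg.norm(D, 2) ** 2                        # Lipschitz constant of the gradient
        for _ in range(n_ista):
            grad = D.T @ (D @ A - X)
            A = soft_threshold(A - grad / L, lam / L)
        # Dictionary update: least squares in D, then re-project onto ||d_j||_2 <= 1
        G = A @ A.T + 1e-8 * np.eye(k)
        D = X @ A.T @ np.linalg.inv(G)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1.0)
    return D, A
```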

Applications

Low rank factorization with “community structure”

Modeling clusters or community structure in social networks or recommendation systems (Richard et al., 2012).

Subspace clustering (Wang et al., 2013)

Up to an unknown permutation, X^⊤ = [X₁^⊤ … X_K^⊤] with each X_k low rank, so that there exists a low-rank matrix Z_k such that X_k = Z_k X_k. Finally,

X = ZX with Z = BlockDiag(Z₁, …, Z_K).

Sparse PCA from the empirical covariance Σ_n

Sparse bilinear regression

y = x^⊤ M x′ + ε

Existing approaches

Bi-convex formulations

min_{U,V} L(UV^⊤) + λ(‖U‖₁ + ‖V‖₁),   with U ∈ R^{n×r}, V ∈ R^{p×r}.

Convex formulation for sparse and low rank

min_Z L(Z) + λ‖Z‖₁ + µ‖Z‖∗

Doan and Vavasis (2013); Richard et al. (2012)

factors not necessarily sparse as r increases.
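As an illustration of how the convex ℓ₁ + trace-norm formulation can be solved numerically, the sketch below specializes the loss to L(Z) = ½‖Z − Y‖²_F and applies a standard ADMM splitting: a soft-thresholding step for the ℓ₁ term and a singular-value soft-thresholding step for the trace norm. The splitting, step size rho and all names are assumptions made for this example, not the algorithms of the cited papers.

```python
import numpy as np

def soft(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def svt(M, t):
    # Singular-value soft-thresholding = prox of t*||.||_*
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def sparse_plus_lowrank_denoise(Y, lam, mu, rho=1.0, n_iter=200):
    # ADMM on min_Z 0.5*||Z - Y||_F^2 + lam*||Z||_1 + mu*||W||_*  s.t.  Z = W
    Z = np.zeros_like(Y); W = np.zeros_like(Y); U = np.zeros_like(Y)
    for _ in range(n_iter):
        Z = soft((Y + rho * (W - U)) / (1.0 + rho), lam / (1.0 + rho))   # loss + l1 step
        W = svt(Z + U, mu / rho)                                         # trace-norm step
        U = U + Z - W                                                    # dual update
    return Z   # at convergence Z and W coincide
```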


Rank one case with square loss

min_{u,v,σ} ‖X − σ uv^⊤‖²   s.t.   σ ∈ R₊,  u ∈ U ⊂ R^n,  v ∈ V ⊂ R^p,  ‖u‖₂ = ‖v‖₂ = 1

is equivalent to solving

max_{u,v} u^⊤ X v   s.t.   u ∈ U ⊂ R^n,  v ∈ V ⊂ R^p,  ‖u‖₂ = ‖v‖₂ = 1.

Corresponds to sparse PCA when u = v.
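A common heuristic for this NP-hard rank-one problem is an alternating, hard-thresholded power iteration in the spirit of the truncated power method of Yuan and Zhang (2013). The sketch below assumes U and V are the sets of k-sparse and q-sparse unit vectors; names and the single random initialization are illustrative.

```python
import numpy as np

def truncate(w, s):
    # Keep the s largest entries in magnitude, zero out the rest, renormalize to unit l2 norm.
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    nrm = np.linalg.norm(out)
    return out / nrm if nrm > 0 else out

def rank1_sparse_svd(X, k, q, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    v = truncate(rng.standard_normal(X.shape[1]), q)
    for _ in range(n_iter):
        u = truncate(X @ v, k)          # best k-sparse unit direction for the current v
        v = truncate(X.T @ u, q)        # best q-sparse unit direction for the current u
    return u, v, u @ X @ v              # candidate sparse factors and objective value
```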

Convex relaxations for sparse PCA

Approaches differ according to the view taken:

analysis view → build sequences of rank-1 approximations,

synthesis view → find a set of common factors simultaneously.

Analysis SPCA focuses on solving rank-1 sparse PCA

convex formulations: d’Aspremont et al. (2007, 2008); Amini and Wainwright (2009)

modified power methods: Journee et al. (2010); Luss and Teboulle (2013); Yuan and Zhang (2013)

Synthesis SPCA focuses on finding several complementary sparse factors

Essentially based on nuclear norms (Jameson, 1987; Bach et al., 2008; Bach, 2013).


A new formulation for sparse matrix factorization and a new matrix norm

A new formulation for sparse matrix factorization

Assumptions:

X = ∑_{i=1}^{r} a_i b_i^⊤

All left factors a_i have support of size k.

All right factors b_i have support of size q.

Goals:

Propose a convex formulation for sparse matrix factorization that

is able to handle multiple sparse factors,

permits identifying the sparse factors themselves,

leads to better statistical performance than the ℓ₁/trace norm.

Propose algorithms based on this formulation.


(k, q)-sparse counterpart of the rank

For any j, define A^n_j = { a ∈ R^n : ‖a‖₀ ≤ j, ‖a‖₂ = 1 }.

Given a matrix Z ∈ R^{m₁×m₂}, consider

min_{(a_i, b_i, c_i)_{i∈N*}} ‖c‖₀   s.t.   Z = ∑_{i=1}^{∞} c_i a_i b_i^⊤,   (a_i, b_i, c_i) ∈ A^{m₁}_k × A^{m₂}_q × R₊.

Define

the (k, q)-rank of Z as the optimal value r := ‖c*‖₀,

a (k, q)-decomposition of Z as any optimal solution (a*_i, b*_i, c*_i)_{1≤i≤r}.
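The following toy construction illustrates the definition under the stated assumptions: draw r atoms with at most k (resp. q) nonzeros and unit ℓ₂ norm, combine them with positive weights, and check membership in A^{m₁}_k × A^{m₂}_q × R₊. Sizes and names are arbitrary; by construction the (k, q)-rank of the resulting Z is at most r.

```python
import numpy as np

def random_atom(m, s, rng):
    # Draw a vector with exactly s nonzero entries and unit l2 norm (so it lies in A^m_s).
    a = np.zeros(m)
    idx = rng.choice(m, size=s, replace=False)
    a[idx] = rng.standard_normal(s)
    return a / np.linalg.norm(a)

rng = np.random.default_rng(0)
m1, m2, k, q, r = 30, 40, 5, 7, 3
triples = [(random_atom(m1, k, rng), random_atom(m2, q, rng), rng.uniform(1, 2)) for _ in range(r)]
Z = sum(c * np.outer(a, b) for a, b, c in triples)

for a, b, c in triples:
    assert np.count_nonzero(a) <= k and np.count_nonzero(b) <= q
    assert np.isclose(np.linalg.norm(a), 1.0) and np.isclose(np.linalg.norm(b), 1.0)
    assert c > 0
# The (k, q)-rank of Z is at most r = 3 by construction.
```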


For a matrix Z ∈ R^{m₁×m₂}, we have

                          (1, 1)-rank    (k, q)-rank     (m₁, m₂)-rank
  combinatorial penalty   ‖Z‖₀           r*_{k,q}(Z)     rank(Z)
  convex relaxation       ‖Z‖₁           ?               ‖Z‖∗

Can we define a principled relaxation of the (k, q)-rank?


Atomic Norm (Chandrasekaran et al., 2012)

Definition (Atomic norm of the set of atoms A)

Given a set of atoms A, the associated atomic norm is defined as

‖x‖_A = inf { t > 0 | x ∈ t conv(A) }.

NB: this is really a norm if A is centrally symmetric and spans R^p.

Proposition (Primal and dual form of the norm)

‖x‖_A = inf { ∑_{a∈A} c_a | x = ∑_{a∈A} c_a a, c_a > 0, ∀a ∈ A }

‖x‖*_A = sup_{a∈A} ⟨a, x⟩
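For a finite atom set, the primal form above is a small linear program, which gives a quick numerical check of the definition. The sketch below uses the ℓ₁ atoms {±e_k} and scipy's HiGHS LP solver; the choice of solver and the names are assumptions for this illustration, and the optimal value should coincide with ‖x‖₁.

```python
import numpy as np
from scipy.optimize import linprog

p = 5
x = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
atoms = np.hstack([np.eye(p), -np.eye(p)])             # columns are the atoms ±e_k
# minimize sum of weights c subject to atoms @ c = x, c >= 0
res = linprog(c=np.ones(2 * p), A_eq=atoms, b_eq=x,
              bounds=[(0, None)] * (2 * p), method="highs")
print(res.fun, np.abs(x).sum())                        # both equal ||x||_1 = 6.5
```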


Examples of atomic norms

‖x‖_A = inf { ∑_{a∈A} c_a | x = ∑_{a∈A} c_a a, c_a > 0, ∀a ∈ A },   ‖x‖*_A = sup_{a∈A} ⟨a, x⟩

vector ℓ₁-norm: x ↦ ‖x‖₁

A = { ± e_k | 1 ≤ k ≤ p }

matrix trace norm: Z ↦ ‖Z‖∗ (sum of singular values)

A = { ab^⊤ | a ∈ S^{m₁−1}, b ∈ S^{m₂−1} }


A convex relaxation of the (k, q)-rank

With A^n_j = { a ∈ R^n : ‖a‖₀ ≤ j, ‖a‖₂ = 1 }, consider the set of atoms

A_{k,q} := { ab^⊤ | a ∈ A^{m₁}_k, b ∈ A^{m₂}_q }.

The atomic norm associated with A_{k,q} is

Ω_{k,q}(Z) = inf { ∑_{A∈A_{k,q}} c_A | Z = ∑_{A∈A_{k,q}} c_A A, c_A > 0, ∀A ∈ A_{k,q} }

so that

Ω_{k,q}(Z) = inf { ‖c‖₁ s.t. Z = ∑_{i=1}^{∞} c_i a_i b_i^⊤, (a_i, b_i, c_i) ∈ A^{m₁}_k × A^{m₂}_q × R₊ }.

Call Ω_{k,q} the (k, q)-trace norm, and call the optimal decompositions the (k, q)-sparse SVDs.


Properties of the (k, q)-trace norm

Nesting property

Ω_{m₁,m₂}(Z) = ‖Z‖∗ ≤ Ω_{k,q}(Z) ≤ ‖Z‖₁ = Ω_{1,1}(Z)

Dual norm and reformulation

Let ‖·‖_op denote the operator norm, and let

G_{k,q} = { (I, J) ⊂ [[1, m₁]] × [[1, m₂]], |I| = k, |J| = q }.

Given that ‖x‖*_A = sup_{a∈A} ⟨a, x⟩, we have

Ω*_{k,q}(Z) = max_{(I,J)∈G_{k,q}} ‖Z_{I,J}‖_op

and

Ω_{k,q}(Z) = inf { ∑_{(I,J)∈G_{k,q}} ‖A^{(IJ)}‖∗ : Z = ∑_{(I,J)∈G_{k,q}} A^{(IJ)}, supp(A^{(IJ)}) ⊂ I×J }.
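For small matrices the dual norm Ω*_{k,q} can be evaluated exactly from the formula above by enumerating all supports (I, J) ∈ G_{k,q}. The brute-force sketch below is only a sanity check (the cost grows combinatorially in m₁, m₂); function names are illustrative.

```python
import itertools
import numpy as np

def dual_kq_trace_norm(Z, k, q):
    # max over all k x q supports (I, J) of the operator norm of the submatrix Z_{I,J}
    m1, m2 = Z.shape
    best = 0.0
    for I in itertools.combinations(range(m1), k):
        for J in itertools.combinations(range(m2), q):
            best = max(best, np.linalg.norm(Z[np.ix_(I, J)], 2))
    return best

Z = np.random.default_rng(0).standard_normal((8, 8))
print(dual_kq_trace_norm(Z, 2, 2))                          # max spectral norm over 2x2 blocks
print(dual_kq_trace_norm(Z, 8, 8), np.linalg.norm(Z, 2))    # recovers ||Z||_op when k = m1, q = m2
```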


The (k, q)-CUT norm: an ℓ∞ counterpart

With the following subset of A^m_k:

Ā^m_k = { a ∈ R^m : ‖a‖₀ = k, ∀i ∈ supp(a), |a_i| = 1/√k },

consider the set of atoms

Ā_{k,q} = { ab^⊤ : a ∈ Ā^{m₁}_k, b ∈ Ā^{m₂}_q }.

Denote by Ω̄_{k,q} the “ℓ∞” counterpart of the (k, q)-trace norm.

When k = m₁ and q = m₂, Ω̄_{m₁,m₂} is the gauge function of the CUT polytope of a bipartite graph (Deza and Laurent, 1997).


Vector case

When q = m₂ = 1, we retrieve vector norms:

Ω_{k,1} = θ_k, the k-support norm of Argyriou et al. (2012).

Ω̄_{k,1} = κ_k, with κ_k the vector Ky Fan norm:

κ_j(w) = (1/√j) max( ‖w‖∞, ‖w‖₁ / j ).

[figure: unit balls of θ_k and κ_k]
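The dual norms of these two vector specializations follow directly from ‖x‖*_A = sup_{a∈A} ⟨a, x⟩ and are easy to evaluate, as in the sketch below: for the k-support atoms (unit ℓ₂ norm, at most k nonzeros) the supremum is the ℓ₂ norm of the k largest-magnitude entries, while for the ±1/√k atoms it is their rescaled sum. Function names are illustrative, not from the talk.

```python
import numpy as np

def dual_ksupport(w, k):
    top = np.sort(np.abs(w))[-k:]
    return np.linalg.norm(top)           # sup <a, w> over ||a||_0 <= k, ||a||_2 = 1

def dual_kcut(w, k):
    top = np.sort(np.abs(w))[-k:]
    return top.sum() / np.sqrt(k)        # sup <a, w> over atoms with k entries equal to ±1/sqrt(k)

w = np.array([3.0, -1.0, 0.5, 2.0, 0.0])
print(dual_ksupport(w, 2), dual_kcut(w, 2))   # sqrt(13) ≈ 3.606 and 5/sqrt(2) ≈ 3.536
```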


Relation between unit balls

[figure: unit balls of θ_k, κ_k and (1/√k)‖·‖₁ for k = 2 in R³]

Learning matrices with sparse factors

Sparse bilinear regression

min_Z ∑_{i=1}^{n} ℓ(x_i^⊤ Z x′_i, y_i) + λ Ω_{k,q}(Z)

Subspace clustering

min_Z Ω_{k,k}(Z)   s.t.   ZX = X.

Rank r sparse PCA

min_Z ½ ‖Σ_n − Z‖²_F + λ Ω_{k,k}(Z)   s.t.   Z ⪰ 0,   or

min_Z ½ ‖Σ_n − Z‖²_F + λ Ω_{k,⪰}(Z),

with Ω_{k,⪰} the atomic norm for the set A_{k,⪰} = { aa^⊤, a ∈ A_k }.


Statistical guarantees


Statistical dimension (Amelunxen et al., 2013)

[figure: denoising geometry — Y = Z* + ε is projected onto Z* + T_Ω(Z*), shown together with the ball Ω(·) ≤ 1; figure inspired by Amelunxen et al. (2013) and www.cmap.polytechnique.fr/~giraud/MSV/LectureNotes.pdf]

Statistical dimension (Amelunxen et al., 2013)

Tangent cone:

T_Ω(Z) := ⋃_{τ>0} { H ∈ R^{m₁×m₂} : Ω(Z + τH) ≤ Ω(Z) }.

The statistical dimension S(Z, Ω) of Ω at Z can then be formally defined as

S(Z, Ω) := S(T_Ω(Z)) = E[ ‖Π_{T_Ω(Z)}(G)‖²_F ],

where

G is a matrix with i.i.d. standard normal entries,

Π_{T_Ω(Z)}(G) is the orthogonal projection of G onto T_Ω(Z).

“The statistical dimension δ is the unique continuous, rotation-invariant localization valuation on the set of convex cones that satisfies δ(L) = dim(L) for any subspace L.”
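For the ℓ₁ norm at a sparse vector, the statistical dimension of the tangent cone can be estimated with the standard recipe of Amelunxen et al. (2013): upper-bound it by E[ dist(G, τ ∂‖x‖₁)² ] and minimize over τ ≥ 0. The Monte Carlo sketch below assumes that recipe together with the closed-form distance to the scaled ℓ₁ subdifferential; the τ grid, sample size and names are illustrative.

```python
import numpy as np

def l1_stat_dim_bound(x, n_samples=2000, taus=np.linspace(0.0, 5.0, 101), seed=0):
    # min over tau of E[ dist(g, tau * subdiff ||x||_1)^2 ], estimated by Monte Carlo.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_samples, x.size))
    S = x != 0
    best = np.inf
    for tau in taus:
        on = ((G[:, S] - tau * np.sign(x[S])) ** 2).sum(axis=1)            # support coordinates
        off = (np.maximum(np.abs(G[:, ~S]) - tau, 0.0) ** 2).sum(axis=1)   # off-support coordinates
        best = min(best, float(np.mean(on + off)))
    return best

p, k = 200, 10
x = np.zeros(p); x[:k] = 1.0
print(l1_stat_dim_bound(x))   # of the order of k*log(p/k), far below the ambient dimension p
```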


Relation between Gaussian width and statistical dimension

Gaussian width of the intersection of the cone C = T_Ω(Z) with a Euclidean ball:

w(C) = E[ max_{U ∈ C ∩ S^{d−1}} ⟨U, G⟩ ] = E[ ‖Π_C(G)‖_F ].

[figure: projection Π_C(G) of a Gaussian matrix G onto the cone C]

Amelunxen et al. (2013) show that

w(C)² ≤ S(C) ≤ w(C)² + 1.

General Null Space Property

Consider the optimization problem

min_Z Ω(Z)   s.t.   y = X(Z)     (1)

Theorem (NSP)

Z* is the unique optimal solution of (1) if and only if

Ker(X) ∩ T_Ω(Z*) = {0}.

Note: this motivates a posteriori the construction of atomic norms.


Nullspace property and S (Chandrasekaran et al., 2012)

[Figure 2.3 from Amelunxen et al. (2013): the regularized linear inverse problem min f(x) s.t. z₀ = Ax recovers x₀ if and only if the descent cone D(f, x₀) and the null space null(A) share only the origin; the left panel shows a success, the right panel a failure.]

Exact recovery from random measurements

With X : R^p → R^n a random linear map from the standard Gaussian ensemble,

Ẑ = arg min_Z Ω(Z)   s.t.   X(Z) = y

is equal to Z* with high probability as soon as n ≥ S(Z*, Ω).

Denoising with an atomic norm

If

Y = Z* + (σ/√n) ε   with ε standard Gaussian,

then

Ẑ = arg min_Z ‖Z − Y‖_F   s.t.   Ω(Z) ≤ Ω(Z*)

satisfies

E ‖Ẑ − Z*‖² ≤ (σ²/n) S(Z*, Ω).

[figure: Y = Z* + ε projected onto Z* + T_Ω(Z*), together with the ball Ω(·) ≤ 1; adapted from www.cmap.polytechnique.fr/~giraud/MSV/LectureNotes.pdf]
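With Ω = ℓ₁ the constrained denoiser is simply the Euclidean projection of Y onto the ℓ₁ ball of radius ‖Z*‖₁, so the bound can be checked numerically. The sketch below uses a standard sort-based projection and a simplified noise scaling Y = Z* + σG (the √n factor is absorbed into σ); the error divided by σ² should be of the order of the statistical dimension, far below the ambient dimension. Sizes and names are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection onto {x : ||x||_1 <= radius}, sort-based (Duchi et al. style).
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(0)
p, k, sigma = 10000, 50, 0.1
z = np.zeros(p); z[:k] = 1.0                      # sparse "Z*", flattened to a vector
y = z + sigma * rng.standard_normal(p)
z_hat = project_l1_ball(y, np.abs(z).sum())       # arg min ||Z - Y|| s.t. ||Z||_1 <= ||Z*||_1
print(np.linalg.norm(z_hat - z) ** 2 / sigma ** 2)   # of the order of k*log(p/k), far below p
```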

Statistical dimension at elements of A_{k,q}

Consider an element ab^⊤ ∈ A_{k,q}. We have ‖ab^⊤‖₀ = kq. So:

  Matrix norm        S
  ℓ₁                 Θ(kq log(m₁m₂ / kq))
  trace norm         Θ(m₁ + m₂)
  ℓ₁ + trace norm    ?
  (k, q)-trace       ?
  (k, q)-cut         ?

Theoretical results

Proposition (UB on S(A, Ω̄_{k,q}))

For any A ∈ Ā_{k,q}, we have

S(A, Ω̄_{k,q}) ≤ 16 (k + q) + 9 ( k log(m₁/k) + q log(m₂/q) ).

Proposition (UB on S(A, Ω_{k,q}))

Let A = ab^⊤ ∈ A_{k,q} with I₀ = supp(a) and J₀ = supp(b), and let

γ(a, b) := (k min_{i∈I₀} a_i²) ∧ (q min_{j∈J₀} b_j²).

Then

S(A, Ω_{k,q}) ≤ (32²/γ²) (k + q + 1) + (160/γ) (k ∨ q) log(m₁ ∨ m₂).


Summary of results for the statistical dimension

  Matrix norm        S                                   Vector norm    S
  ℓ₁                 Θ(kq log(m₁m₂/kq))                  ℓ₁             Θ(k log(p/k))
  trace norm         Θ(m₁ + m₂)                          ℓ₂             p
  ℓ₁ + trace norm    Ω(kq ∧ (m₁ + m₂))¹                  elastic net    Θ(k log(p/k))
  (k, q)-trace       O((k ∨ q) log(m₁ ∨ m₂))             k-support      Θ(k log(p/k))
  (k, q)-cut         O(k log(m₁/k) + q log(m₂/q))        κ_k            Θ(k log(p/k))
  “cut norm”         O(m₁ + m₂)                          ℓ∞             p

¹ Proof relying on a result of Oymak et al. (2012).

Algorithm


Working set algorithm

Given a working set S of blocks (I, J), solve the restricted problem

min_{Z, (A^{(IJ)})_{(I,J)∈S}} L(Z) + λ ∑_{(I,J)∈S} ‖A^{(IJ)}‖∗   s.t.   Z = ∑_{(I,J)∈S} A^{(IJ)}, supp(A^{(IJ)}) ⊂ I×J.

Proposition

The global problem is solved by a solution Z_S of the restricted problem if and only if

∀(I, J) ∈ G_{k,q},   ‖ [∇L(Z_S)]_{I,J} ‖_op ≤ λ.     (?)

Working set algorithm

Active set algorithm

Iterate:

1. Solve the restricted problem.
2. Look for a block (I, J) that violates (?).
   If none exists, terminate the algorithm!
   Else add the found (I, J) to S.

Problem: step 2 requires solving a rank-1 SPCA problem → NP-hard.

Idea: leverage the work on algorithms that attempt to solve rank-1 SPCA, such as

convex relaxations,

truncated power iteration methods,

to heuristically find blocks potentially violating the constraint, as sketched below.
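The sketch below illustrates one way such a heuristic block search could look under these assumptions: run the alternating hard-thresholded power updates sketched earlier (the illustrative rank1_sparse_svd function) on the current gradient, take the supports of the resulting factors as a candidate block, and test the optimality condition (?) on it. The restart scheme and names are illustrative, not the talk's implementation.

```python
import numpy as np

def find_violating_block(grad, k, q, lam, n_restarts=10):
    # grad is the gradient of L at the current restricted solution Z_S.
    best = None
    for seed in range(n_restarts):
        u, v, _ = rank1_sparse_svd(grad, k, q, seed=seed)   # heuristic sparse factors (earlier sketch)
        I, J = np.flatnonzero(u), np.flatnonzero(v)
        val = np.linalg.norm(grad[np.ix_(I, J)], 2)         # operator norm of the candidate block
        if val > lam and (best is None or val > best[0]):
            best = (val, I, J)
    return best   # None means no violated block was found (heuristically)
```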

Experiments


Denoising results

Z ∈ R^{1000×1000} with Z = ∑_{i=1}^{r} a_i b_i^⊤ + σG and a_i b_i^⊤ ∈ A_{k,q}, k = q.

σ² small ⇒ MSE ∝ S(ab^⊤, Ω_{k,q}) σ²

[figure: NMSE vs k at (k,k)-rank = 1, and NMSE vs (k,q)-rank, comparing ℓ₁, trace norm and Ω_{k,q}]


Denoising results [Z ∈ R^{300×300} and σ² small ⇒ MSE ∝ S(ab^⊤, Ω_{k,q}) σ²]

[figure: NMSE vs k with no overlap and with 90% overlap between the factor supports, comparing ℓ₁, trace norm and Ω_{k,q}]

Empirical results for sparse PCA


Conclusions

Summary

Gain in statistical performance at the expense of theoretical tractability.

Even though the problem is NP-hard, the structure of the convex problem can be exploited to devise efficient heuristics.

Not discussed

slow rate analysis

purely geometric results

Open questions and future work

Generalization to the case where (k, q) can be different for each pair of factors and is not known a priori.

References

Amelunxen, D., Lotz, M., McCoy, M. B., and Tropp, J. A. (2013). Living on the edge: Phase transitions in convex programs with random data. Technical Report 1303.6672, arXiv.

Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Stat., 37(5B):2877–2921.

Argyriou, A., Foygel, R., and Srebro, N. (2012). Sparse prediction with the k-support norm. In Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q., editors, Adv. Neural Inform. Process. Syst., volume 25, pages 1457–1465. Curran Associates, Inc.

Bach, F. (2013). Convex relaxations of structured matrix factorizations. Technical Report 1309.3117, arXiv.

Bach, F., Mairal, J., and Ponce, J. (2008). Convex sparse matrix factorizations. Technical Report 0812.1869, arXiv.

Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. (2012). The convex geometry of linear inverse problems. Found. Comput. Math., 12(6):805–849.

d’Aspremont, A., Bach, F., and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res., 9:1269–1294.

d’Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434–448.

Deza, M. M. and Laurent, M. (1997). Geometry of Cuts and Metrics, volume 15 of Algorithms and Combinatorics. Springer Berlin Heidelberg.

Doan, X. V. and Vavasis, S. A. (2013). Finding approximately rank-one submatrices with the nuclear norm and ℓ1 norms. SIAM J. Optimiz., 23(4):2502–2540.

Elad, M. and Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745.

Jameson, G. J. O. (1987). Summing and Nuclear Norms in Banach Space Theory. Number 8 in London Mathematical Society Student Texts. Cambridge University Press.

Journee, M., Nesterov, Y., Richtarik, P., and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res., 11:517–553.

Luss, R. and Teboulle, M. (2013). Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev., 55(1):65–98.

Oymak, S., Jalali, A., Fazel, M., Eldar, Y. C., and Hassibi, B. (2012). Simultaneously structured models with application to sparse and low-rank matrices. Technical Report 1212.3753, arXiv.

Richard, E., Savalle, P.-A., and Vayatis, N. (2012). Estimation of simultaneously sparse and low-rank matrices. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 – July 1, 2012. icml.cc / Omnipress.

Wang, Y.-X., Xu, H., and Leng, C. (2013). Provable subspace clustering: When LRR meets SSC. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. Q., editors, Adv. Neural Inform. Process. Syst., volume 26, pages 64–72. Curran Associates, Inc.

Witten, D. M., Tibshirani, R., and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534.

Yuan, X.-T. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res., 14:889–925.