Nonnegative Matrix Factorization: Complexity, Algorithms and Applications

Nicolas Gillis†

Advisor: François Glineur, Université catholique de Louvain

†Université de Mons, Department of Mathematics and Operational Research

Householder XIX Nonnegative Matrix Factorization: Complexity, Algorithms and Applications 1

Belgium

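Nonnegative Matrix Factorization (NMF)

Given a matrix $M \in \mathbb{R}^{p \times n}_+$ and a factorization rank $r \ll \min(p, n)$, find $U \in \mathbb{R}^{p \times r}$ and $V \in \mathbb{R}^{r \times n}$ such that

$$\min_{U \ge 0,\, V \ge 0} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \tag{NMF}$$

NMF is a linear dimensionality reduction technique for nonnegative data:

$$\underbrace{M(:, j)}_{\ge 0} \approx \sum_{k=1}^{r} \underbrace{U(:, k)}_{\ge 0}\, \underbrace{V(k, j)}_{\ge 0} \quad \text{for all } j.$$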
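As a minimal sketch of the definition above, the following NumPy snippet evaluates the (NMF) objective for a given pair (U, V) and checks that a column of UV is a nonnegative combination of the columns of U. The function name `nmf_objective` and the random test data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def nmf_objective(M, U, V):
    """Return ||M - UV||_F^2, the sum of squared entries of the residual."""
    residual = M - U @ V
    return float(np.sum(residual ** 2))

# Hypothetical data: a nonnegative matrix with an exact factorization (p=6, n=5, r=2).
rng = np.random.default_rng(0)
U = rng.random((6, 2))
V = rng.random((2, 5))
M = U @ V

# Column j of UV is the combination sum_k U[:, k] * V[k, j] with nonnegative weights.
j = 3
column_j = sum(U[:, k] * V[k, j] for k in range(U.shape[1]))

print(nmf_objective(M, U, V))            # vanishes for an exact factorization
print(np.allclose(column_j, M[:, j]))    # the column-wise identity holds
```

With noisy or real data the objective would be positive, and the algorithms discussed later in the talk try to make it as small as possible.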

Why nonnegativity?

→ Interpretability: Nonnegativity constraints lead to easily interpretable factors (and a sparse and part-based representation).
→ Many applications: image processing, text mining, recommender systems, community detection, clustering, etc.

Example: blind hyperspectral unmixing

Figure: Urban hyperspectral image with 162 spectral bands and 307-by-307 pixels.

• Basis elements make it possible to recover the different materials;

• Weights indicate which pixel contains which material.

Figure: Decomposition of the Urban dataset.

Outline

1. Algorithms and Applications

• General framework for NMF algorithms

• Solving NMF sequentially with underapproximations

2. Complexity and Bounds

• Nonnegative rank

• Rank-one subproblems, weights and missing data

How can we ‘solve’ NMF problems?

Given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$:

$$\min_{U \in \mathbb{R}^{m \times r}_+,\, V \in \mathbb{R}^{r \times n}_+} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \tag{NMF}$$

0. Initialize $(U, V)$. Then, alternately update $U$ and $V$:
1. Update $V \approx \operatorname{argmin}_{X \ge 0} \|M - UX\|_F^2$. (NNLS)
2. Update $U \approx \operatorname{argmin}_{Y \ge 0} \|M - YV\|_F^2$. (NNLS)

HALS algorithm: use block-coordinate descent on the NNLS subproblems → closed-form solutions for the columns of $U$ and rows of $V$:

$$U^*_{:k} = \operatorname{argmin}_{U_{:k} \ge 0} \|R_k - U_{:k} V_{k:}\|_F^2 = \max\left(0, \frac{R_k V_{k:}^T}{\|V_{k:}\|_2^2}\right) \quad \forall k,$$

where $R_k \doteq M - \sum_{j \ne k} U_{:j} V_{j:}$, and similarly for $V$.

HALS can be accelerated by ordering the updates efficiently.

G., Glineur, Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization, Neural Computation 2012.
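The closed-form column and row updates above can be sketched in a few lines of NumPy. This is a plain cyclic HALS loop under illustrative assumptions (random initialization, a fixed iteration count, a small `eps` guard against division by zero, and the hypothetical name `hals_nmf`); it does not include the accelerated update ordering of the cited paper.

```python
import numpy as np

def hals_nmf(M, r, n_iter=200, seed=0, eps=1e-12):
    """Basic HALS sketch: cyclic closed-form updates of the rows of V and columns of U."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(n_iter):
        # Rows of V with U fixed: V[k] = max(0, U[:, k]^T R_k / ||U[:, k]||^2).
        UtM = U.T @ M
        UtU = U.T @ U
        for k in range(r):
            # U[:, k]^T R_k, computed without forming the residual R_k explicitly.
            numer = UtM[k] - UtU[k] @ V + UtU[k, k] * V[k]
            V[k] = np.maximum(0.0, numer / max(UtU[k, k], eps))
        # Columns of U with V fixed: U[:, k] = max(0, R_k V[k]^T / ||V[k]||^2).
        MVt = M @ V.T
        VVt = V @ V.T
        for k in range(r):
            numer = MVt[:, k] - U @ VVt[:, k] + VVt[k, k] * U[:, k]
            U[:, k] = np.maximum(0.0, numer / max(VVt[k, k], eps))
    return U, V

# Usage on hypothetical data: a random nonnegative matrix of nonnegative rank at most 3.
rng = np.random.default_rng(1)
M = rng.random((20, 3)) @ rng.random((3, 15))
U, V = hals_nmf(M, r=3)
print(np.linalg.norm(M - U @ V) / np.linalg.norm(M))  # relative error of the fit
```

Each inner update exactly minimizes the objective over one column or row with everything else fixed, so the error cannot increase from one sweep to the next; like any local method, it offers no guarantee of reaching a global minimum.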

Drawbacks of standard NMF Algorithms

1. The optimal solution is, in most cases, non-unique and the problem is ill-posed. Many variants of NMF impose additional constraints (e.g., sparsity, smoothness, spatial information, etc.).

2. In practice, it is difficult to choose the factorization rank (in general, trial and error approach or estimation using the SVD).

A possible way to overcome these drawbacks is to use underapproximation constraints to solve NMF sequentially.

Nonnegative Matrix Underapproximation (NMU)

It is possible to solve NMF sequentially, solving at each step

$$\min_{u \ge 0,\, v \ge 0} \|M - uv^T\|_F^2 \quad \text{such that} \quad uv^T \le M \iff M - uv^T \ge 0.$$

NMU is yet another linear dimensionality reduction technique. However,

• Like PCA/SVD, it is sequential and well-posed.

• Like NMF, it leads to a separation by parts. Moreover, the additional underapproximation constraints enhance this property.

• In the presence of pure pixels, the NMU recursion is able to detect materials individually.

G., Glineur, Using Underapproximations for Sparse Nonnegative Matrix Factorization, Pattern Recognition, 2010.
G., Plemmons, Dimensionality Reduction, Classification, and Spectral Mixture Analysis using Nonnegative Underapproximation, Optical Engineering, 2011.
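To make the underapproximation constraint and the sequential recursion concrete, here is a small NumPy sketch. It is not the algorithm of the cited papers: it is a simple alternating heuristic, and the function names and the choice of starting point (the heaviest column of the residual) are assumptions made for illustration. It relies on one observation: for fixed $u \ge 0$, the unconstrained least-squares value of $v_j$ is a weighted average of the ratios $R_{ij}/u_i$, so under the constraint $uv^T \le R$ the best choice is the largest feasible one, $v_j = \min_i R_{ij}/u_i$.

```python
import numpy as np

def largest_feasible(R, u):
    """For fixed u >= 0, return the largest v >= 0 such that u v^T <= R entrywise."""
    support = u > 0
    if not support.any():
        return np.zeros(R.shape[1])
    # u_i * v_j <= R_ij for every i with u_i > 0  <=>  v_j <= min_i R_ij / u_i
    return np.min(R[support] / u[support, None], axis=0)

def rank_one_underapproximation(R, n_iter=20):
    """Alternate exact block updates of v and u, starting from the heaviest column of R."""
    u = R[:, np.argmax(np.linalg.norm(R, axis=0))].copy()
    v = largest_feasible(R, u)
    for _ in range(n_iter):
        u = largest_feasible(R.T, v)
        v = largest_feasible(R, u)
    return u, v

def sequential_nmu(M, r):
    """Peel off r rank-one underapproximations; the residual stays nonnegative."""
    R = np.asarray(M, dtype=float).copy()
    factors = []
    for _ in range(r):
        u, v = rank_one_underapproximation(R)
        factors.append((u, v))
        R = np.maximum(R - np.outer(u, v), 0.0)  # clipping only removes rounding noise
    return factors, R

# Usage on hypothetical positive data.
M = np.random.default_rng(0).random((8, 6)) + 0.1
factors, R = sequential_nmu(M, r=3)
print(R.min() >= 0, np.linalg.norm(R) < np.linalg.norm(M))
```

Because every step subtracts a factor that lies below the current residual, the next step can again be posed on a nonnegative matrix, which is what allows the factors to be computed one at a time.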

NMU on the Urban dataset

Identifying Lighting Orientations with NMU

A static scene is illuminated from many directions.

NMU is able to detect the different lighting orientations.

NMU has also been successfully used in text mining for anomaly detection, and in image processing for segmentation of medical images.

What is the nonnegative rank?

The nonnegative rank of a nonnegative matrix $M \in \mathbb{R}^{m \times n}_+$ is the minimum number $r$ of nonnegative rank-one factors needed to reconstruct $M$:

$$M = \sum_{i=1}^{r} u_i v_i^T, \quad u_i \in \mathbb{R}^m_+, \; v_i \in \mathbb{R}^n_+,$$

that is, the minimum $r$ such that an exact NMF exists:

$$M = UV = \sum_{i=1}^{r} U_{:i} V_{i:}, \quad U \ge 0, \; V \ge 0.$$

The nonnegative rank of $M$ is denoted $\operatorname{rank}_+(M)$. Clearly,

$$\operatorname{rank}(M) \le \operatorname{rank}_+(M) \le \min(m, n).$$

An amazing result

Let $P$ be a polytope

$$P = \{x \in \mathbb{R}^k \mid b_i - A(i,:)x \ge 0 \text{ for } 1 \le i \le m\},$$

and let the $v_j$'s ($1 \le j \le n$) be its vertices.

We define the $m$-by-$n$ slack matrix $S_P$ of $P$ as follows:

$$S_P(i, j) = b_i - A(i,:)v_j \ge 0, \quad 1 \le i \le m, \; 1 \le j \le n.$$

An extended formulation of $P$ is a higher-dimensional polyhedron $Q \subseteq \mathbb{R}^{k+p}$ that (linearly) projects onto $P$. The minimum number of facets of such a polytope is called the extension complexity $\operatorname{xp}(P)$ of $P$.

Theorem (Yannakakis, 1991).

$$\operatorname{rank}_+(S_P) = \operatorname{xp}(P).$$

Remark. Other closely related problems arise in communication complexity, probability, and graph theory.
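The slack matrix definition can be illustrated with a short NumPy sketch that builds $S_P$ for a regular hexagon. The particular description of the hexagon (unit-circle vertices, facet normals halfway between consecutive vertices) and the rescaling by $2/\sqrt{3}$ are assumptions chosen here so that the entries come out as 0, 1 and 2; the row and column order may differ from other presentations of the same matrix.

```python
import numpy as np

def slack_matrix(A, b, vertices):
    """S[i, j] = b[i] - A[i, :] @ vertices[j] for inequalities A x <= b."""
    return b[:, None] - A @ vertices.T

# Regular hexagon: vertices on the unit circle, one facet between consecutive vertices.
vertex_angles = np.arange(6) * np.pi / 3
vertices = np.column_stack([np.cos(vertex_angles), np.sin(vertex_angles)])
facet_angles = (np.arange(6) + 0.5) * np.pi / 3
A = np.column_stack([np.cos(facet_angles), np.sin(facet_angles)])
b = np.full(6, np.cos(np.pi / 6))

# Rescale each inequality (this does not change the polytope) and remove rounding noise.
S = np.round(slack_matrix(A, b, vertices) * 2 / np.sqrt(3), 10) + 0.0

print(S)
print(np.linalg.matrix_rank(S))  # usual rank of the slack matrix
```

Each row contains two zeros because every edge of the hexagon touches exactly two vertices.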

The Hexagon

$$S_P = \begin{pmatrix}
0 & 1 & 2 & 2 & 1 & 0 \\
0 & 0 & 1 & 2 & 2 & 1 \\
1 & 0 & 0 & 1 & 2 & 2 \\
2 & 1 & 0 & 0 & 1 & 2 \\
2 & 2 & 1 & 0 & 0 & 1 \\
1 & 2 & 2 & 1 & 0 & 0
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 1/2 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1/2 & 0 \\
0 & 0 & 1 & 0 & 1/2 \\
0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1/2
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 2 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 2 \\
0 & 0 & 0 & 2 & 2 & 0 \\
2 & 2 & 0 & 0 & 0 & 0
\end{pmatrix},$$

with

$$\operatorname{rank}(S_P) = 3 \le \operatorname{rank}_+(S_P) = 5 \le \min(m, n) = 6.$$
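The factorization on this slide is easy to check numerically. The sketch below, with variable names chosen here for illustration, multiplies the two nonnegative factors and compares the product with $S_P$. It certifies only that $\operatorname{rank}_+(S_P) \le 5$; showing that no nonnegative factorization with four factors exists requires a separate argument.

```python
import numpy as np

S = np.array([
    [0, 1, 2, 2, 1, 0],
    [0, 0, 1, 2, 2, 1],
    [1, 0, 0, 1, 2, 2],
    [2, 1, 0, 0, 1, 2],
    [2, 2, 1, 0, 0, 1],
    [1, 2, 2, 1, 0, 0],
], dtype=float)

U = np.array([
    [1, 0, 0, 0.5, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0.5, 0],
    [0, 0, 1, 0, 0.5],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0.5],
])

V = np.array([
    [0, 1, 2, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 2],
    [0, 0, 0, 2, 2, 0],
    [2, 2, 0, 0, 0, 0],
], dtype=float)

print(np.array_equal(U @ V, S))          # exact nonnegative factorization with 5 factors
print(np.linalg.matrix_rank(S))          # usual rank
print((U >= 0).all() and (V >= 0).all())
```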

Lower Bounds for the Nonnegative Rank

It is difficult to compute the nonnegative rank (NP-hard). However, it is possible to compute lower bounds efficiently. For example,

$$n = \#\operatorname{vertices}(P) \le 2^{r_+}.$$

Goemans, Smallest compact formulation for the permutahedron, ISMP 2009.
Beasley and Laffey, Real rank versus nonnegative rank, LAA 2009.
G., Glineur, On the geometric interpretation of the nonnegative rank, LAA 2012.
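As a worked instance of this bound (an illustration added here, reading $r_+$ as $\operatorname{rank}_+(S_P)$ and using the hexagon from the previous slide, which has $n = 6$ vertices):

```latex
6 = \#\operatorname{vertices}(P) \le 2^{r_+}
\quad\Longrightarrow\quad
r_+ \ge \lceil \log_2 6 \rceil = 3 .
```

The bound is valid but not tight in this case, since the hexagon slide states $\operatorname{rank}_+(S_P) = 5$.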

What is the complexity of NMF?

Given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$:

$$\min_{U \in \mathbb{R}^{m \times r}_+,\, V \in \mathbb{R}^{r \times n}_+} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \tag{NMF}$$

• For $r = 1$: ‘easy’ [Perron-Frobenius and Eckart-Young theorems].

• $\operatorname{rank}(M) = 2$: $\operatorname{rank}_+(M) = 2$, ‘easy’.

• $r$ part of the input: NP-hard (Vavasis, 2009).

• $\operatorname{rank}_+(M) = r$ not part of the input: polynomial, $O((mn)^{r^2})$ (Arora et al. 2012, Moitra 2013).

• $\operatorname{rank}(M) = k$ not part of the input: open problem.
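The ‘easy’ case $r = 1$ can be sketched with a truncated SVD. The snippet assumes, for simplicity, that $M$ is entrywise positive, so that the leading singular vectors are unique up to sign; the function name is hypothetical.

```python
import numpy as np

def best_rank_one_nonnegative(M):
    """Best rank-one approximation u v^T of a positive matrix M, with u, v >= 0."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Eckart-Young: the leading singular triplet is optimal among all rank-one matrices.
    # Perron-Frobenius: for positive M the leading singular vectors have constant sign,
    # so the absolute value only undoes the sign chosen by the SVD routine.
    u = s[0] * np.abs(U[:, 0])
    v = np.abs(Vt[0])
    return u, v

# Usage on hypothetical positive data.
M = np.random.default_rng(0).random((7, 5)) + 0.1
u, v = best_rank_one_nonnegative(M)
error = np.linalg.norm(M - np.outer(u, v)) ** 2
tail = np.sum(np.linalg.svd(M, compute_uv=False)[1:] ** 2)
print(np.isclose(error, tail))  # the error equals the sum of the discarded squared singular values
```

Since the unconstrained optimum is already nonnegative, the nonnegativity constraints cost nothing when $r = 1$ and $M \ge 0$.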

What about rank-one problems in NMF?

Solving NMF requires finding $r$ nonnegative rank-one factors $U_{:k}V_{k:}$, each having to satisfy the following equality as well as possible:

$$U_{:k} V_{k:} \approx M - \sum_{j \ne k} U_{:j} V_{j:} \doteq R_k \ngeq 0 \quad \forall k.$$

These subproblems have the following form

$$\min_{u \in \mathbb{R}^m,\, v \in \mathbb{R}^n} \|R - uv^T\|_F^2 \quad \text{such that} \quad u \ge 0, \; v \ge 0, \tag{R1NF}$$

called rank-one nonnegative factorization.

Is this problem difficult?
If $R \ge 0$: No.
If $R \ngeq 0$: Surprisingly, yes.

G., Glineur, A Continuous Characterization of the Maximum-Edge Biclique Problem, JOGO ’12.

What about weights and missing data?

In some cases, some entries are missing/unknown.

For example, we would like to predict how much someone is going to like a movie based on their movie preferences (e.g., 1 to 5 stars):

$$\text{Movies} \; \overset{\displaystyle \text{Users}}{\begin{pmatrix}
2 & 3 & 2 & ? & ? \\
? & 1 & ? & 3 & 2 \\
1 & ? & 4 & 1 & ? \\
5 & 4 & ? & 3 & 2 \\
? & 1 & 2 & ? & 4 \\
1 & ? & 3 & 4 & 3
\end{pmatrix}}$$

Huge potential in electronic commerce sites (movies, books, music, ...). Good recommendations will increase the propensity of a purchase.

Low-rank model for recommendation systems

The behavior of users is modeled using linear combinations of ‘feature’ users (related to age, sex, culture, etc.):

$$\underbrace{M(:, j)}_{\text{user } j} \approx \sum_{k=1}^{r} \underbrace{U(:, k)}_{\text{feature user } k}\, \underbrace{V(k, j)}_{\text{weights}}$$

Or equivalently, movie ratings are modeled as linear combinations of ‘feature’ movies (related to the genres: child oriented, serious vs. escapist, thriller, romantic, actors, etc.):

$$\underbrace{M(i, :)}_{\text{movie } i} \approx \sum_{k=1}^{r} \underbrace{U(i, k)}_{\text{weights}}\, \underbrace{V(k, :)}_{\text{genre } k}$$

Example

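$$M = \begin{pmatrix}
2 & 3 & 2 & ? & ? \\
? & 1 & ? & 3 & 2 \\
1 & ? & 4 & 1 & ? \\
5 & 4 & ? & 3 & 2 \\
? & 1 & 2 & ? & 4 \\
1 & ? & 3 & 4 & 3
\end{pmatrix}
\approx
\begin{pmatrix}
0.5 & 0.6 & -0.1 \\
0.8 & -0.2 & -0.3 \\
0.8 & -0.7 & 0.6 \\
-2 & 2.3 & 1.8 \\
-0.2 & 0.3 & 0.9 \\
1 & -0.2 & -0.2
\end{pmatrix}
\begin{pmatrix}
1.7 & 2.1 & 3.7 & 5 & 4.1 \\
2.2 & 3.2 & 0.8 & 5 & 0.5 \\
2 & 0.6 & 2.6 & 0.9 & 5
\end{pmatrix}
= UV$$

$$= \begin{pmatrix}
2 & 2.9 & 2.1 & 5.4 & 1.9 \\
0.3 & 0.9 & 2 & 2.7 & 1.7 \\
1 & -0.2 & 4 & 1 & 5.9 \\
5.3 & 4.2 & -0.9 & 3.1 & 2 \\
2.1 & 1.1 & 1.8 & 1.3 & 2.8 \\
0.9 & 1.3 & 3 & 3.8 & 3
\end{pmatrix}$$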

Complexity

Weighted low-rank approximation is the following problem:

$$\inf_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} \|M - UV^T\|_W^2 = \sum_{ij} W_{ij} (M - UV^T)_{ij}^2, \tag{WLRA}$$

where $W \ge 0$ is the weighting matrix. A zero in $W$ represents a missing/unknown entry in $M$.
For $r = 1$ and $M \ge 0$, this is equivalent to weighted NMF.

Theorem. It is NP-hard to find an approximate solution of WLRA, for any $r \ge 1$.

Other applications: computer vision, microarray data analysis, 2-D digital filter, etc.

G., Glineur, Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard, SIMAX ’11.
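To show how the weights encode missing data, the sketch below evaluates the (WLRA) objective on the movie-rating example from the earlier slides, with $W_{ij} = 1$ for observed ratings and $W_{ij} = 0$ for the entries marked ‘?’. The variable and function names are illustrative assumptions; `Vt` holds the 3-by-5 factor printed on the example slide and plays the role of $V^T$.

```python
import numpy as np

nan = np.nan
# Ratings from the example slide (rows: movies, columns: users); nan marks a '?' entry.
M = np.array([
    [2, 3, 2, nan, nan],
    [nan, 1, nan, 3, 2],
    [1, nan, 4, 1, nan],
    [5, 4, nan, 3, 2],
    [nan, 1, 2, nan, 4],
    [1, nan, 3, 4, 3],
])
U = np.array([
    [0.5, 0.6, -0.1],
    [0.8, -0.2, -0.3],
    [0.8, -0.7, 0.6],
    [-2.0, 2.3, 1.8],
    [-0.2, 0.3, 0.9],
    [1.0, -0.2, -0.2],
])
Vt = np.array([
    [1.7, 2.1, 3.7, 5.0, 4.1],
    [2.2, 3.2, 0.8, 5.0, 0.5],
    [2.0, 0.6, 2.6, 0.9, 5.0],
])

def weighted_error(M, W, U, Vt):
    """Sum of W_ij * (M - U Vt)_ij^2; entries with zero weight are ignored."""
    residual = np.where(W > 0, M - U @ Vt, 0.0)
    return float(np.sum(W * residual ** 2))

W = (~np.isnan(M)).astype(float)           # 1 where a rating is observed, 0 where missing
print(weighted_error(M, W, U, Vt))         # fit on the observed ratings only
predictions = np.where(W > 0, M, U @ Vt)   # fill the missing ratings with model values
print(np.round(predictions, 1))
```

Only the observed entries contribute to the error, so the factors are free to take any value at the missing positions; those values are the predicted ratings.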

Maximum-edge biclique problem

The reductions for NMU, R1NF and WLRA are from the maximum-edge biclique problem: Given two sets of objects interacting together (a bipartite graph), find highly connected subgroups.

This is the maximum-edge complete bipartite subgraph.

Applications in clustering: text mining, web community discovery, ...

Example

Let $M$ be the biadjacency matrix of the bipartite graph representing the interactions:

$$M = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

Each rank-one factor corresponds to a community.

[W11] Wang et al., Community discovery using nonnegative matrix factorization, Data Min. Knowl. Disc. 22:493-521, 2011.
[PRE11] Psorakis, Roberts, and Ebden, Overlapping community detection using Bayesian non-negative matrix factorization, Physical Review E 83, 066114 (2011).
[YL13] Yang and Leskovec, Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach, Proc. of the sixth ACM int. conf. on web search and data mining, 2013.
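The decomposition above can be reproduced directly. In this sketch (variable names chosen here for illustration), each rank-one term is the outer product of a column of the left factor with a row of the right factor, and the supports of that column and row give the two sides of the corresponding community.

```python
import numpy as np

M = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]])
U = np.array([[1, 0],
              [0, 1],
              [1, 0]])
V = np.array([[1, 0, 1],
              [0, 1, 1]])

# One rank-one term per community: column k of U times row k of V.
rank_one_terms = [np.outer(U[:, k], V[k]) for k in range(U.shape[1])]

# A community is a biclique: the row indices and column indices where factor k is nonzero.
communities = [
    (np.flatnonzero(U[:, k]).tolist(), np.flatnonzero(V[k]).tolist())
    for k in range(U.shape[1])
]

print(np.array_equal(U @ V, M))
print(communities)
```

Indices are zero-based, so the first community links rows 0 and 2 with columns 0 and 2, and the second links row 1 with columns 1 and 2.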

Conclusion

Low-rank matrix approximation can be used for linear dimensionality reduction. It is a powerful tool for data analysis.

Nonnegativity renders the problem difficult.

However, it significantly enhances its applicability in many areas (by improving interpretability), e.g., image processing, text mining, hyperspectral unmixing, clustering, community detection, and recommender systems.

There are still many open questions for NMF and related problems, e.g., subclasses of matrices for which NMF can be computed efficiently (separable NMF), bounding and computing nonnegative ranks, non-uniqueness characterization, complexity when using other norms, etc.

Thank you for your attention!

Code and papers available on https://sites.google.com/site/nicolasgillis/

Recent Survey: ‘The Why and How of Nonnegative Matrix Factorization’, arXiv:1401.5226.
