efficient initialization for nonnegative matrix factorization based on nonnegative independent...

Daichi Kitamura (SOKENDAI, Japan) Nobutaka Ono (NII/SOKENDAI, Japan) Efficient initialization for NMF based on nonnegative ICA IWAENC 2016, Sept. 16, 08:30 - 10:30, Session SPS-II - Student paper competition 2 SPC-II- 04

Upload: daichi-kitamura

Post on 11-Jan-2017

TRANSCRIPT

Page 1: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Daichi Kitamura (SOKENDAI, Japan), Nobutaka Ono (NII/SOKENDAI, Japan)

Efficient initialization for NMF based on nonnegative ICA

IWAENC 2016, Sept. 16, 08:30 - 10:30, Session SPS-II - Student paper competition 2

SPC-II-04

Page 2: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Research background: what is NMF?

• Nonnegative matrix factorization (NMF) [Lee, 1999]
– Dimensionality reduction with nonnegative constraint
– Unsupervised learning extracting meaningful features
– Sparse decomposition (implicitly)

[Figure: an input data matrix (power spectrogram; frequency vs. time) is approximated by the product of a basis matrix (spectral patterns; frequency vs. amplitude) and an activation matrix (time-varying gains; amplitude vs. time). The matrix sizes are annotated with the number of rows, the number of columns, and the number of bases.]

Page 3: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Research background: how to optimize?

• Optimization in NMF
– Define a cost function (data fidelity) and minimize it.
– There is no closed-form solution for the basis and activation matrices.
– Efficient iterative optimization
• Multiplicative update rules (auxiliary function technique) [Lee, 2001] (when the cost function is a squared Euclidean distance)
– Initial values for all the variables are required.
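The update equations themselves did not survive in this transcript. As a minimal sketch, assuming the standard multiplicative updates of [Lee, 2001] for the squared Euclidean distance, they can be written as follows. The names `X`, `F`, `G`, and `K` (input data, basis matrix, activation matrix, number of bases) are illustrative assumptions, and the random initial values stand in for whatever initialization is chosen.

```python
import numpy as np

def nmf_euclidean(X, K, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix X (rows x cols) by F @ G with K bases."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    F = rng.random((n_rows, K)) + eps  # basis matrix, random initial values
    G = rng.random((K, n_cols)) + eps  # activation matrix, random initial values
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (X @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ X) / (F.T @ F @ G + eps)
        costs.append(np.sum((X - F @ G) ** 2))  # squared Euclidean distance
    return F, G, costs
```

Because each update only rescales the current entries, the result depends on where `F` and `G` start, which is why initial values for all the variables are required.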

Page 4: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Problem and motivation

• Results of all applications using NMF always depend on the initialization of the basis and activation matrices.
– Ex. source separation via full-supervised NMF [Smaragdis, 2007]
• Motivation: an initialization method that always gives good performance is desired.

[Figure: bar chart of SDR improvement [dB] (axis from 0 to 12) for Rand1 to Rand10, i.e., random initializations with different random seeds. The gap between poor and good seeds is more than 1 dB.]

Page 5: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Conventional NMF initialization techniques

• With random values (not focused here)
– Directly use random values
– Search good values via genetic algorithm [Stadlthanner, 2006], [Janecek, 2011]
– Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011]
  • Cluster the input data into as many clusters as there are bases, and set the centroid vectors to the initial basis vectors.
• Without random values
– PCA-based initialization [Zhao, 2014]
  • Apply PCA to the input data, extract orthogonal bases and coefficients, and set their absolute values to the initial bases and activations.
– SVD-based initialization [Boutsidis, 2008]
  • Apply a special SVD (nonnegative double SVD) to the input data and set nonnegative left and right singular vectors to the initial values.
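The PCA-based idea described above can be sketched as follows. This is only an illustration of "take PCA bases and coefficients, then use their absolute values"; the function name `pca_based_init`, the centering step, and the use of an SVD to compute the PCA are assumptions, not details taken from [Zhao, 2014].

```python
import numpy as np

def pca_based_init(X, K):
    """Return nonnegative initial bases and activations from a K-component PCA of X."""
    # Assumption: PCA is computed on row-centered data via a thin SVD.
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    bases = U[:, :K]                    # orthonormal principal directions
    coeffs = s[:K, None] * Vt[:K, :]    # coefficients, equal to bases.T @ Xc
    # Absolute values make both factors usable as NMF initial values.
    return np.abs(bases), np.abs(coeffs)
```

The result is deterministic, so unlike random initialization it does not vary with a seed.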

Page 6: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Bases orthogonality?

• Are orthogonal bases really better for NMF?
– PCA and SVD are orthogonal decompositions.
– A geometric interpretation of NMF [Donoho, 2003]
  • The optimal bases in NMF are “along the edges of a convex cone” that includes all the observed data points.
– Orthogonal bases might not be good initial values for NMF.

[Figure: three panels showing data points inside a convex cone and its edges. Optimal bases: satisfactory for representing all the data points. Orthogonal bases: have a risk of representing meaningless areas. Tight bases: cannot represent all the data points.]

Page 7: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Proposed method: utilization of ICA

• What can we do from only the input data?
– Independent component analysis (ICA) [Comon, 1994]
– ICA extracts non-orthogonal bases
  • that maximize the statistical independence between sources.
– ICA estimates sparse sources
  • when we assume a super-Gaussian prior.
• We propose to use the ICA bases and estimated sources as initial NMF values.
– Objectives:
  • 1. Deeper minimization
  • 2. Faster convergence
  • 3. Better performance

[Figure: value of the cost function in NMF vs. number of update iterations in NMF, illustrating deeper minimization and faster convergence.]

Page 8: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Proposed method: concept

• The input data matrix is a mixture of some sources.
– The sources in the source matrix are mixed via the mixing matrix and then observed as the input data matrix.
– ICA can estimate a demixing matrix and the independent sources.
• PCA for only the dimensionality reduction in NMF
• Nonnegative ICA for taking nonnegativity into account
• Nonnegativization for ensuring complete nonnegativity

[Figure: the input data matrix equals the mixing matrix (ICA bases) times the source matrix, whose rows are mutually independent. Processing flow: input data matrix → PCA (PCA matrix for dimensionality reduction) → NICA → nonnegativization → initial values → NMF.]

Page 9: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Nonnegative constrained ICA

• Nonnegative ICA (NICA) [Plumbley, 2003]
– estimates the demixing matrix so that all of the separated sources become nonnegative.
– finds a rotation matrix for the pre-whitened mixtures.
– uses steepest gradient descent for estimating the rotation matrix.
– The cost function penalizes negative values in the separated sources.

[Figure: observed mixtures → whitening without centering → pre-whitened mixtures → rotation (demixing) → separated sources.]
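The cost function and update rule are not legible in this transcript. The following is a rough sketch of the idea only: whiten without centering, then search for a rotation that removes negative values from the separated sources. The cost used here (mean squared negative part), the plain gradient step followed by an SVD projection back onto orthogonal matrices, and all names (`whiten_without_centering`, `nica_rotation`, `Z`, `W`) are assumptions for illustration and are not claimed to match the algorithm of [Plumbley, 2003] or the slide.

```python
import numpy as np

def whiten_without_centering(X):
    """Whiten the rows of X using their covariance, without subtracting the mean."""
    C = np.cov(X)                            # np.cov removes the mean internally
    eigval, eigvec = np.linalg.eigh(C)
    eigval = np.maximum(eigval, 1e-12)       # guard against tiny negative values
    V = (eigvec / np.sqrt(eigval)).T         # whitening matrix
    return V @ X, V                          # X itself is not centered

def nica_rotation(Z, n_iter=500, step=0.1):
    """Estimate an orthogonal W so that W @ Z has as few negative values as possible."""
    n, T = Z.shape
    W = np.eye(n)
    costs = []
    for _ in range(n_iter):
        Y = W @ Z
        Y_neg = np.minimum(Y, 0.0)           # negative part of the separated sources
        costs.append(0.5 * np.mean(np.sum(Y_neg ** 2, axis=0)))
        grad = (Y_neg @ Z.T) / T             # gradient of the cost w.r.t. W
        # Gradient step, then project back to the nearest orthogonal matrix.
        U, _, Vt = np.linalg.svd(W - step * grad)
        W = U @ Vt
    return W, costs
```

The projection step is a simple way to keep `W` orthogonal; other choices (for example geodesic updates) are possible and may behave differently.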

Page 10: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Combine PCA for dimensionality reduction

• Dimensionality reduction via PCA
– The rows of the PCA matrix are the top eigenvectors, i.e., those with the highest eigenvalues.
• NMF variables obtained from the estimates of NICA
– The ICA bases give the basis matrix, and the sources give the activation matrix.

[Figure: eigenvalues ordered from high to low; the decomposition is annotated with the rotation matrix estimated by NICA, the ICA bases, the sources, and a zero matrix.]
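The equations on this slide are lost, so the following is only one self-consistent way to combine the pieces, under stated assumptions: `P` holds the top-K eigenvectors of the covariance as rows, `lam` holds the matching eigenvalues, and `W` is an orthogonal rotation such as one estimated by NICA. With these definitions the product of the bases and sources equals the projection of the data onto the top-K subspace. All names are hypothetical.

```python
import numpy as np

def pca_top_k(X, K):
    """Return the top-K eigenvectors (as rows) and eigenvalues of the covariance of X."""
    eigval, eigvec = np.linalg.eigh(np.cov(X))   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:K]         # indices of the K largest
    return eigvec[:, order].T, eigval[order]

def ica_bases_and_sources(X, P, lam, W):
    """Build (possibly signed) bases B and sources Y with B @ Y == P.T @ P @ X."""
    Z = (P @ X) / np.sqrt(lam)[:, None]          # reduced, whitened data (K x cols)
    Y = W @ Z                                    # separated sources
    B = (P.T * np.sqrt(lam)) @ W.T               # bases mapping sources back to data
    return B, Y
```

Neither `B` nor `Y` is guaranteed to be nonnegative at this point, so they cannot yet be used directly as NMF initial values.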

Page 11: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Nonnegativization

• Even with NICA, complete nonnegativity is not guaranteed.
– The obtained sources may not be completely nonnegative because of the dimensionality reduction by PCA.
– For the obtained ICA bases, nonnegativity is not assumed in NICA.
• Apply a “nonnegativization” to the obtained ICA bases and sources:
– Method 1:
– Method 2:
– Method 3:
  • where the scale-fitting coefficients depend on the divergence of the following NMF.

Correlation between and

Correlation between and
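The three methods and their coefficients are not recoverable from this transcript. Purely as an illustration of what a nonnegativization with scale fitting could look like, the sketch below shows two generic options (absolute value and clipping at zero) and a least-squares scale for the squared Euclidean distance. These are assumptions chosen for clarity and should not be read as Method 1, 2, or 3 from the slide; the names `nonnegativize` and `euclidean_scale` are hypothetical.

```python
import numpy as np

def nonnegativize(B, Y, mode="abs"):
    """Turn signed bases B and sources Y into nonnegative matrices."""
    if mode == "abs":
        return np.abs(B), np.abs(Y)                      # flip negative entries
    if mode == "clip":
        return np.maximum(B, 0.0), np.maximum(Y, 0.0)    # discard negative entries
    raise ValueError("mode must be 'abs' or 'clip'")

def euclidean_scale(X, F, G, eps=1e-12):
    """Scalar a minimizing the squared Euclidean distance between X and a * F @ G."""
    model = F @ G
    return np.sum(X * model) / (np.sum(model ** 2) + eps)
```

A different divergence in the following NMF would lead to a different scale, which matches the remark that the coefficients depend on the divergence.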

Page 12: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: conditions

• Power spectrogram of a mixture of vocals (Vo.) and guitar (Gt.)
– Song: “Actions – One Minute Smile” from SiSEC2015
– Size of power spectrogram: 2049 x 1290 (60 sec.)
– Number of bases:

[Figure: power spectrogram of the mixture; frequency [kHz] vs. time [s].]

Page 13: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: results of NICA

• Convergence of cost function in NICA

[Figure: value of the cost function in NICA (axis from 0.0 to 0.6) vs. number of iterations (0 to 2000) for steepest gradient descent.]

Page 14: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: results of Euclidean NMF

• Convergence of EU-NMF

[Figure: cost function in EU-NMF vs. number of iterations (0 to 1000) for the proposed methods (NICA1, NICA2, NICA3), PCA-based initialization, NNDSVD, and Rand1~Rand10. Rand1~10 are based on random initialization with different seeds.]

Processing time for initialization
NICA: 4.36 s
PCA: 0.98 s
SVD: 2.40 s

EU-NMF: 12.78 s (for 1000 iter.)

Page 15: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: results of Kullback-Leibler NMF

• Convergence of KL-NMF

[Figure: cost function in KL-NMF vs. number of iterations (0 to 1000) for the proposed methods (NICA1, NICA2, NICA3), PCA-based initialization, NNDSVD, and Rand1~Rand10. Rand1~10 are based on random initialization with different seeds.]

Processing time for initialization
NICA: 4.36 s
PCA: 0.98 s
SVD: 2.40 s

KL-NMF: 48.07 s (for 1000 iter.)

Page 16: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: results of Itakura-Saito NMF

• Convergence of IS-NMF

[Figure: cost function in IS-NMF (axis from 1.45 to 1.70, x10^6) vs. number of iterations (0 to 1000) for the proposed methods (NICA1, NICA2, NICA3), PCA-based initialization, NNDSVD, and Rand1~Rand10. Rand1~10 are based on random initialization with different seeds.]

Processing time for initialization
NICA: 4.36 s
PCA: 0.98 s
SVD: 2.40 s

IS-NMF: 214.26 s (for 1000 iter.)
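For reference, the three cost functions compared in the EU-NMF, KL-NMF, and IS-NMF experiments can be sketched in their commonly used forms as below. The exact scaling used in the experiments is not stated in the transcript, so constant factors here are an assumption; `X` is the data and `M` is the model (the product of the basis and activation matrices).

```python
import numpy as np

def eu_cost(X, M):
    """Squared Euclidean distance."""
    return np.sum((X - M) ** 2)

def kl_cost(X, M, eps=1e-12):
    """Generalized Kullback-Leibler divergence."""
    return np.sum(X * np.log((X + eps) / (M + eps)) - X + M)

def is_cost(X, M, eps=1e-12):
    """Itakura-Saito divergence."""
    R = (X + eps) / (M + eps)
    return np.sum(R - np.log(R) - 1.0)
```

All three are zero when the model equals the data and positive otherwise, but they weight large and small entries differently, so their absolute values are not comparable across the three plots.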

Page 17: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: full-supervised source separation

• Full-supervised NMF [Smaragdis, 2007]
– Simply use pre-trained source-wise bases for separation

[Figure: Training stage: the source-wise NMFs (cost functions) are initialized by the conventional or proposed method. Separation stage: the pre-trained bases are fixed (cost function), with initialization based on correlations.]
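The separation stage can be sketched as follows, assuming the squared Euclidean distance and two sources. The pre-trained bases `F1` and `F2` stay fixed and only the activations are updated. The random initial activations are a simplification made for this sketch; the slide indicates a correlation-based initialization whose details are not legible here. All names are hypothetical.

```python
import numpy as np

def separate_with_fixed_bases(X, F1, F2, n_iter=200, seed=0, eps=1e-12):
    """Estimate two source spectrograms from X using fixed pre-trained bases."""
    F = np.hstack([F1, F2])                       # concatenated bases, kept fixed
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], X.shape[1])) + eps  # assumption: random initial activations
    for _ in range(n_iter):
        # Only the activations are updated (Euclidean multiplicative rule).
        G *= (F.T @ X) / (F.T @ F @ G + eps)
    K1 = F1.shape[1]
    return F1 @ G[:K1], F2 @ G[K1:]               # per-source model spectrograms
```

The quality of `F1` and `F2` obtained in the training stage therefore carries over directly to the separation result.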

Page 18: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Experiment: results of separation

• Two-source separation using full-supervised NMF
– SiSEC2015 MUS dataset (professionally recorded music)
– Averaged SDR improvements of 15 songs

[Figure: two bar charts, "Separation performance for source 1" (SDR improvement [dB], axis from 0 to 12) and "Separation performance for source 2" (SDR improvement [dB], axis from 0 to 5), comparing the proposed methods (NICA1, NICA2, NICA3) with the conventional methods (PCA, SVD, Rand1 to Rand10).]

Page 19: Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Conclusion

• Proposed an efficient initialization method for NMF
• Utilize statistical independence for obtaining non-orthogonal bases and sources
– Orthogonality may not be preferable for NMF.
• The proposed initialization gives
– deeper minimization
– faster convergence
– better performance for full-supervised source separation

Thank you for your attention!