robust music signal separation based on supervised nonnegative matrix factorization with prevention...

Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing. Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano (Nara Institute of Science and Technology, Japan); Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan). IEEE International Symposium on Signal Processing and Information Technology, December 12-15, 2013, Athens, Greece. Session T.B3: Speech, Audio, Music.

Upload: daichi-kitamura

Post on 07-Aug-2015


Category:

Engineering



TRANSCRIPT

  1. Title slide: Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing. Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano (Nara Institute of Science and Technology, Japan); Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan). IEEE International Symposium on Signal Processing and Information Technology, December 12-15, 2013, Athens, Greece. Session T.B3: Speech, Audio, Music.
  2. Outline: (1) Research background. (2) Conventional method: nonnegative matrix factorization; supervised nonnegative matrix factorization; problem of the conventional method (basis sharing). (3) Proposed method: penalized supervised nonnegative matrix factorization; orthogonality penalty; maximum-divergence penalty. (4) Experiments: two-source case; four-source case. (5) Conclusions.
  4. Background: Sound signal separation extracts a target source from an observed mixed signal, for example speech from noise or a specific instrumental sound from a mixture. Separation is typically performed in the time-frequency domain. [Figure: a spectrogram (time vs. frequency) of a mixture is separated into a first tone and a second tone.]
  6. Nonnegative matrix factorization (NMF) [Lee, et al., 2012]: NMF is a sparse representation algorithm that can extract significant features from the observed matrix. It approximates the observed matrix (spectrogram) by the product of a basis matrix (spectral patterns) and an activation matrix (time-varying gains); the matrix sizes are set by the number of frequency bins, the number of time frames, and the number of bases. However, it is difficult to cluster the bases into specific sources. [Figure: observed matrix (time vs. frequency), basis matrix (amplitude vs. frequency for each basis), and activation matrix (amplitude vs. time for each basis).]
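
The factorization on slide 6 can be illustrated with a minimal sketch. It assumes the squared Euclidean distance as the cost and uses the classic multiplicative update rules; the names `Y`, `F`, `G`, and `nmf` are illustrative choices, not notation recovered from the slides, and the slides themselves consider a more general divergence.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (freq x time) as Y ~= F @ G.

    F (freq x n_bases) holds spectral patterns, G (n_bases x time) holds
    time-varying gains. Squared Euclidean cost is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps
    G = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
    return F, G
```

Because the updates are purely multiplicative, entries initialized as positive never become negative, which is why no explicit projection step is needed.
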
  7. Supervised NMF (SNMF) [Smaragdis, et al., 2007]: SNMF uses sample sounds of the target. Training process: construct the supervised basis matrix (spectral dictionary) from sample sounds of the target signal, e.g., a musical scale. Separation process: with the supervised basis matrix fixed, optimize the remaining matrices to decompose the mixed signal into the target signal and the other signal.
  8. Problem of SNMF: the basis sharing problem. There is no constraint between the supervised basis matrix and the other basis matrix, because the cost function is defined only as the distance between the observed matrix and its decomposition model. The other bases may therefore also learn the target spectral patterns; if they do, the estimated target signal loses some of the target signal, which is absorbed into the estimated other signals.
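
The separation process on slide 7 can be sketched as follows, assuming a squared Euclidean cost and multiplicative updates. The function name `snmf_separate` and the symbols `F` (fixed supervised bases), `G`, `H` (other bases), and `U` are hypothetical labels for this example; the trained `F` is assumed to come from an earlier NMF of the sample sounds.

```python
import numpy as np

def snmf_separate(Y, F, n_other, n_iter=200, eps=1e-12, seed=0):
    """Decompose Y ~= F @ G + H @ U with the supervised bases F kept fixed.

    Returns (target_estimate, other_estimate). Squared Euclidean cost is an
    assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)      # activations of the fixed bases
        X = F @ G + H @ U
        H *= (Y @ U.T) / (X @ U.T + eps)      # free bases for the other sources
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)      # activations of the free bases
    return F @ G, H @ U
```

Nothing in these updates stops `H` from learning patterns that already exist in `F`, which is the basis sharing problem described on the following slide.
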
  9. Basis sharing problem: example of SNMF. [Audio/figure example: mixed signal; only the target signal (oracle); signal separated by SNMF.]
  10. Basis sharing problem: example of SNMF. [Audio/figure example: only the target signal (oracle); signal separated by SNMF; mixed signal.]
  11. Basis sharing problem: example of SNMF. In the separated (estimated) signal produced by SNMF, some of the target components are lost because of the basis sharing problem.
  13. Proposed method: penalized SNMF (PSNMF). In SNMF, the other basis matrix may contain the same spectral patterns as the supervised basis matrix (the basis sharing problem). We propose to make the other basis matrix as different as possible from the supervised basis matrix by introducing a penalty term into the cost function. As in SNMF, the supervised basis matrix is fixed and the remaining matrices are optimized to decompose the mixed signal into the target signal and the other signal.
  14. Decomposition model and cost function: the decomposition model, with the supervised basis matrix fixed; the cost function in SNMF; and the generalized divergence function used in it, the β-divergence [Eguchi, et al., 2001].
  15. Decomposition model and cost function: the cost function in PSNMF adds a penalty term to the cost function in SNMF; the decomposition model and the fixed supervised basis matrix are unchanged. We propose two types of penalty terms.
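
For reference, the β-divergence named on slide 14 can be written out in a few lines. This is a sketch of the standard element-wise definition summed over all entries, not a formula recovered from the slide; the function name `beta_divergence` and the small constant `eps` (added to avoid division by zero) are choices made for this example.

```python
import numpy as np

def beta_divergence(Y, X, beta, eps=1e-12):
    """Sum of element-wise beta-divergences between nonnegative arrays Y and X.

    beta = 0: Itakura-Saito divergence
    beta = 1: generalized Kullback-Leibler divergence
    beta = 2: half the squared Euclidean distance
    """
    Y = np.asarray(Y, dtype=float) + eps
    X = np.asarray(X, dtype=float) + eps
    if beta == 0:
        ratio = Y / X
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:
        return float(np.sum(Y * np.log(Y / X) - Y + X))
    num = Y**beta + (beta - 1.0) * X**beta - beta * Y * X**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))
```

The divergence is zero when the two arrays are equal and positive otherwise, for every value of `beta`.
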
  16. Orthogonality penalty: the other basis matrix is optimized to minimize the inner product between the supervised basis matrix and the other basis matrix. If the other basis matrix includes bases similar to those in the supervised basis matrix, the penalty becomes larger. All the bases are normalized to one. A weighting parameter is introduced.
  17. Maximum-divergence penalty: the other basis matrix is optimized to maximize the divergence between the supervised basis matrix and the other basis matrix. If the other basis matrix includes bases similar to those in the supervised basis matrix, the divergence becomes smaller. All the bases are normalized to one. A weighting parameter and a sensitivity parameter are introduced.
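
A rough sketch of how an orthogonality penalty can enter the separation is shown below. It assumes the cost 0.5 * ||Y - FG - HU||^2 + mu * ||F^T H||^2 (squared Euclidean distance plus a squared Frobenius norm) and a heuristic multiplicative update in which the nonnegative penalty gradient joins the denominator. These are not the update rules from the slides, whose equations did not survive extraction; all names (`psnmf_ortho`, `orthogonality_penalty`, `mu`) are hypothetical, and the basis normalization mentioned on the slide is omitted for brevity.

```python
import numpy as np

def orthogonality_penalty(F, H):
    """Squared Frobenius norm of F^T H; large when bases in H resemble F."""
    return float(np.sum((F.T @ H) ** 2))

def psnmf_ortho(Y, F, n_other, mu=1.0, n_iter=200, eps=1e-12, seed=0):
    """Fit Y ~= F @ G + H @ U with F fixed and H pushed away from F."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)
        X = F @ G + H @ U
        # Gradient of mu * ||F^T H||^2 is 2 * mu * F F^T H (nonnegative),
        # so it is added to the denominator and shrinks shared patterns.
        H *= (Y @ U.T) / (X @ U.T + 2.0 * mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)
    return G, H, U
```

With `mu=0` this reduces to plain supervised separation. Without normalizing the columns of `H`, the penalty can also be lowered simply by shrinking `H` and enlarging `U`, which is presumably why the slide normalizes all bases.
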
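
The exact form of the maximum-divergence penalty is not preserved in this transcript. One hypothetical way to turn "maximize the divergence" into a term that can be minimized is to pass the divergence through a decaying exponential. The sketch below assumes that form, a generalized Kullback-Leibler divergence summed over all pairs of bases, and columns normalized to unit sum; `mu` stands in for the weighting parameter and `lam` for the sensitivity parameter, and every name is an assumption of this example.

```python
import numpy as np

def normalize_columns(B, eps=1e-12):
    """Scale each basis (column) so that its entries sum to one."""
    return B / (B.sum(axis=0, keepdims=True) + eps)

def pairwise_kl(F, H, eps=1e-12):
    """Generalized KL divergence summed over every pair (column of F, column of H)."""
    Fk = F[:, :, None] + eps   # shape (freq, K, 1)
    Hl = H[:, None, :] + eps   # shape (freq, 1, L)
    return float(np.sum(Fk * np.log(Fk / Hl) - Fk + Hl))

def max_divergence_penalty(F, H, mu=1.0, lam=1.0):
    """Hypothetical penalty mu * exp(-lam * D(F, H)); small when H differs from F."""
    d = pairwise_kl(normalize_columns(F), normalize_columns(H))
    return mu * float(np.exp(-lam * d))
```

Under this assumed form, a larger `lam` makes the penalty fall off faster as the bases move apart, which matches the role of a sensitivity parameter.
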
  18. Derivation of optimal variables in PSNMF: the optimal variables are derived with the auxiliary function method, an optimization scheme that uses an upper-bound function. An auxiliary function is designed for each of the two cost functions, and the original cost functions are minimized indirectly by minimizing the auxiliary functions.
  19. Derivation of optimal variables in PSNMF: the second and third terms become convex or concave functions depending on the value of β. Convex terms are bounded with Jensen's inequality; concave terms are bounded with the tangent line inequality.
  20. Derivation of optimal variables in PSNMF: one term always becomes a convex function, so it is bounded with Jensen's inequality using an auxiliary variable.
  21. Derivation of optimal variables in PSNMF: the auxiliary functions are designed from these bounds, and the update rules for optimization are obtained from them.
  22. Update rules for optimization of PSNMF: update rules with the orthogonality penalty.
  23. Update rules for optimization of PSNMF: update rules with the maximum-divergence penalty.
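
The auxiliary function method used on slides 18 to 21 follows a general pattern that can be stated compactly. The block below is a generic summary under assumed notation: $\theta$ stands for the variables being optimized, $\tilde{\theta}$ for the auxiliary variables, $J$ for a cost function, and $J^{+}$ for its auxiliary (upper-bound) function. None of these symbols are recovered from the slides.

```latex
\begin{align}
  % Auxiliary function: an upper bound that is tight for the best auxiliary variable
  J(\theta) &= \min_{\tilde{\theta}} J^{+}(\theta, \tilde{\theta}), \\
  % Alternating minimization
  \tilde{\theta}^{(t+1)} &= \arg\min_{\tilde{\theta}} J^{+}(\theta^{(t)}, \tilde{\theta}), \qquad
  \theta^{(t+1)} = \arg\min_{\theta} J^{+}(\theta, \tilde{\theta}^{(t+1)}), \\
  % Resulting monotonic decrease of the original cost
  J(\theta^{(t+1)}) &\le J^{+}(\theta^{(t+1)}, \tilde{\theta}^{(t+1)})
    \le J^{+}(\theta^{(t)}, \tilde{\theta}^{(t+1)}) = J(\theta^{(t)}), \\
  % Jensen's inequality for a convex f, with \alpha_k > 0 and \sum_k \alpha_k = 1
  f\Bigl(\sum_k x_k\Bigr) &\le \sum_k \alpha_k \, f\Bigl(\frac{x_k}{\alpha_k}\Bigr), \\
  % Tangent line inequality for a concave, differentiable g
  g(x) &\le g(\tilde{x}) + g'(\tilde{x})\,(x - \tilde{x}).
\end{align}
```

The two inequalities at the end are the tools named on slide 19: each replaces a term that is hard to minimize directly with a bound that separates over the individual variables.
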
  25. Experimental conditions: We produced four melodies using a MIDI synthesizer. As the supervision (training) sound, we used the same MIDI sounds of the target instruments, containing two octaves of notes that cover all the notes of the target signal. We evaluated a two-source case and a four-source case: there are 12 combinations in the two-source case and 4 patterns in the four-source case.
  26. Experimental conditions: Observed signal: a mixture of 2 or 4 signals with equal power. Training signal: the same MIDI sounds of the target signal, containing two octaves of notes. Divergence criteria: all combinations of the divergence criteria considered. Number of bases: supervised bases, 100; other bases, 50. Parameters: experimentally determined. Methods: conventional SNMF and proposed PSNMF. Evaluation score [Vincent, 2006]: source-to-distortion ratio (SDR), which indicates the total quality of the separated signal.
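
To give a feel for what an SDR value measures, here is a simplified sketch. It is not the full evaluation procedure of [Vincent, 2006], which allows more general distortions of the reference; this version only projects the estimate onto the reference signal with a single gain and treats everything else as distortion. The name `simple_sdr` and the `eps` guard are choices made for this example.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified source-to-distortion ratio in dB for 1-D signals."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Part of the estimate explained by the reference (least-squares gain).
    gain = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = gain * reference
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return float(10.0 * np.log10(ratio))
```

A higher value means that less of the estimate's energy is unexplained by the true target, so losing target components through basis sharing lowers the score.
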
  27. Experimental results: two-source case. Average scores of the 12 combinations. Conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem. The proposed method outperforms conventional SNMF. [Figure: three bar-chart panels; vertical axis SDR [dB], 0 to 16; horizontal axis ticks 0, 1, 2; methods compared: Conv. SNMF, PSNMF (Ortho.), PSNMF (Max.).]
  28. Experimental results: four-source case. Average scores of the 4 combinations. PSNMF outperforms the conventional method. [Figure: three bar-chart panels; vertical axis SDR [dB], 0 to 14; horizontal axis ticks 0, 1, 2; methods compared: Conv. SNMF, PSNMF (Ortho.), PSNMF (Max.).]
  29. Example of separation (cello and oboe). [Audio/figure example: mixed signal; cello signal; signal separated by SNMF; signal separated by PSNMF (Ortho.).]
  30. Conclusions: Conventional supervised NMF has a basis sharing problem that degrades the separation performance. We propose adding a penalty term to the cost function that forces the other bases to become uncorrelated with the supervised bases. Penalized supervised NMF can achieve high separation accuracy. Thank you for your attention!