hybrid nmf apsipa2014 invited
Hybrid Multichannel Signal Separation Using Supervised Nonnegative Matrix Factorization
Daichi Kitamura, (The University of Tokyo, Japan)
Hiroshi Saruwatari, (The University of Tokyo, Japan)
Satoshi Nakamura, (Nara Institute of Science and Technology, Japan)
Yu Takahashi, (Yamaha Corporation, Japan)
Kazunobu Kondo, (Yamaha Corporation, Japan)
Hirokazu Kameoka, (The University of Tokyo, Japan)
Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Multichannel NMF
• 3. Proposed method
  – SNMF with spectrogram restoration and its hybrid method
• 4. Experiments
  – Closed data experiment
  – Open data experiment
• 5. Conclusions
Research background
• Signal separation has received much attention.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance further, an SNMF-based multichannel signal separation method is required.
Applications
• Automatic music transcription
• 3D audio systems, etc.
Goal: separate the target signal from multichannel signals with high accuracy.
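NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
  – The basis matrix holds the spectral patterns that appear frequently in the observed matrix.
• The observed matrix (spectrogram; frequency bins × time frames) is approximated by the product of the basis matrix (spectral patterns; frequency bins × bases) and the activation matrix (time-varying gains; bases × time frames).
[Figure: amplitude spectrogram (frequency versus time) decomposed into the basis matrix (frequency versus basis) and the activation matrix (basis versus time).]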
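A minimal sketch of the factorization described on this slide, assuming the KL-divergence multiplicative updates of Lee and Seung. The names `Y`, `F`, `G`, `nmf_kl`, and `kl_div`, and the toy data, are illustrative assumptions and not notation taken from the slides.

```python
import numpy as np

def nmf_kl(Y, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix Y (freq x time) by F @ G (KL multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.random((n_freq, n_bases)) + eps   # basis matrix (spectral patterns)
    G = rng.random((n_bases, n_time)) + eps   # activation matrix (time-varying gains)
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        F *= ((Y / (F @ G + eps)) @ G.T) / (ones @ G.T + eps)
        G *= (F.T @ (Y / (F @ G + eps))) / (F.T @ ones + eps)
    return F, G

def kl_div(Y, X, eps=1e-12):
    """Generalized KL divergence between two nonnegative matrices."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

# Toy "spectrogram" built from two spectral patterns.
rng = np.random.default_rng(1)
Y = rng.random((16, 2)) @ rng.random((2, 40))
F, G = nmf_kl(Y, n_bases=2)
print(kl_div(Y, F @ G))
```

Both factors stay nonnegative because each update only multiplies by a nonnegative ratio.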
Supervised NMF [Smaragdis, et al., 2007]
• SNMF: supervised spectral separation method
  – Training process: a supervised basis matrix (spectral dictionary) is learned from sample sounds of the target signal.
  – Separation process: the supervised basis matrix is fixed and the remaining matrices are optimized, so that the mixed signal is decomposed into the target signal and the other signals.
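The two processes can be sketched as follows, assuming KL-divergence multiplicative updates and a mixture model of the form `F @ G + H @ U`. The function names, the matrix names `G`, `H`, `U`, and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

EPS = 1e-12

def train_bases(Y_train, n_bases, n_iter=200, seed=0):
    """Training: learn the spectral dictionary F from sample sounds of the target."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    ones = np.ones_like(Y_train)
    for _ in range(n_iter):
        F *= ((Y_train / (F @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
        G *= (F.T @ (Y_train / (F @ G + EPS))) / (F.T @ ones + EPS)
    return F

def separate(Y_mix, F, n_other, n_iter=200, seed=0):
    """Separation: F stays fixed; only G (target gains), H and U (other signals) are updated."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y_mix.shape[1])) + EPS
    H = rng.random((Y_mix.shape[0], n_other)) + EPS
    U = rng.random((n_other, Y_mix.shape[1])) + EPS
    ones = np.ones_like(Y_mix)
    for _ in range(n_iter):
        R = Y_mix / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.T @ ones + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (ones @ U.T + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ ones + EPS)
    return F @ G, H @ U   # estimated target and estimated other signals

# Toy data: the target uses 3 spectral patterns, the interference uses 2.
rng = np.random.default_rng(0)
F_target = rng.random((20, 3))
Y_train = F_target @ rng.random((3, 30))
Y_mix = F_target @ rng.random((3, 50)) + rng.random((20, 2)) @ rng.random((2, 50))
F = train_bases(Y_train, n_bases=3)
target_est, other_est = separate(Y_mix, F, n_other=2)
```

Keeping `F` fixed during separation is what ties the first term to the target: the free bases `H` are left to explain whatever the dictionary cannot.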
Problems of SNMF
• SNMF handles only single-channel signals.
  – For a multichannel signal, SNMF cannot use the information between channels.
• When many interference sources exist, the separation performance of SNMF degrades markedly.
[Figure: separation result in which residual components remain.]
Multichannel NMF [Sawada, et al., 2013]
• Multichannel NMF
  – is a natural extension of NMF to multichannel signals.
  – uses spatial information to cluster the bases and thereby achieves unsupervised separation.
• Problems: multichannel NMF depends strongly on initial values and lacks robustness.
[Figure: microphone array.]
Motivation and strategy
• Sawada's multichannel NMF
  – is a unified method that solves the spatial and spectral separations together.
  – maximizes a likelihood.
    [Equation: likelihood; annotated terms: observed spectrograms, spatial direction of the target signal, source components of all signals (target and other).]
  – In the supervised situation, the target spectral patterns are given.
  – Too difficult to solve (lacks robustness)
  – Computationally inefficient (long computation time)
Motivation and strategy
• Proposed hybrid method
  – approximates the unified problem by dividing it into two parts: unsupervised spatial separation (classical D.O.A. estimation) and supervised spectral separation (SNMF-based method).
  – The spatial separation should be carried out with classical D.O.A. estimation methods.
    • These methods are very efficient and stable.
  – Divide-and-conquer approach
Directional clustering [Araki, et al., 2007]
• Directional clustering
  – Unsupervised spatial separation method
  – k-means clustering (fast and stable)
• Problems
  – Artificial distortion arises owing to the binary masking.
[Figure: a stereo input signal (L, R) with left, center, and right sources. Each time-frequency bin of the spectrogram is labeled L, C, or R; the separated signal is the entry-wise product of the spectrogram and a binary mask (1 for bins of the selected direction, 0 otherwise).]
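A minimal sketch of direction-based clustering followed by binary masking. It assumes, for illustration only, that the direction feature is the angle of the (left, right) amplitude pair in each time-frequency bin and that a plain 1-D k-means is used; the feature actually used by Araki et al. may differ, and every name below is hypothetical.

```python
import numpy as np

def directional_clustering(L_amp, R_amp, n_clusters=3, n_iter=50):
    """Label each time-frequency bin by direction: 0 = leftmost cluster, last = rightmost."""
    # Assumed direction feature: 0 means left only, pi/2 means right only.
    theta = np.arctan2(R_amp, L_amp)
    # Deterministic initial centroids spread over (0, pi/2).
    centroids = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    for _ in range(n_iter):
        labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = theta[labels == k].mean()
    return labels, centroids

# Toy stereo spectrogram: every bin is dominated by one direction.
rng = np.random.default_rng(0)
shape = (8, 10)                                   # (frequency, time)
true_dir = rng.integers(0, 3, size=shape)         # 0 = left, 1 = center, 2 = right
pan = np.array([[1.0, 0.1], [0.7, 0.7], [0.1, 1.0]])  # (L gain, R gain) per direction
mag = rng.random(shape) + 0.5
L_amp = mag * pan[true_dir, 0]
R_amp = mag * pan[true_dir, 1]

labels, centroids = directional_clustering(L_amp, R_amp)
mask = (labels == 1).astype(float)                # binary mask of the center cluster
center_L = mask * L_amp                           # entry-wise product
center_R = mask * R_amp
```

Bins assigned to other directions become exact zeros in `center_L` and `center_R`; these are the lost components that binary masking introduces.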
Proposed method: hybrid separation
• Hybrid separation method
  – Input stereo signal (L, R)
    → Spatial separation method (directional clustering)
    → SNMF-based separation method (SNMF with spectrogram restoration)
    → Separated signal
SNMF with spectrogram restoration
• The separated cluster contains spectral holes (lost components).
• The proposed SNMF treats these holes as unseen observations.
• The fittest bases of the supervised basis (dictionary of the target signal) are extrapolated to fix up the holes.
[Figure: spectrogram (frequency versus time) of the separated cluster with the holes marked.]
SNMF with spectrogram restoration
[Figure: frequency of source components versus direction (left, center, right) for (a) the input signal, (b) after directional clustering (binary masking), and (c) after super-resolution-based SNMF, which restores the target through extrapolated components.]
[Figure: processing flow. Observed spectrogram (target and interference) → directional clustering → separated cluster → SNMF with spectrogram restoration (extrapolation with the supervised spectral bases) → reconstructed data.]
Decomposition model and cost function
• The divergence is defined at all grids except for the holes by using the binary masking matrix.
• Decomposition model: [Equation: decomposition model, with the supervised bases fixed.]
• Cost function: [Equation: cost function.] Its annotated parts are
  – a divergence term with a binary index that excludes the holes,
  – a regularization term,
  – a penalty term [Kitamura, et al. 2014].
• Notation defined on the slide: entries of the matrices, weighting parameters, the binary complement, and the Frobenius norm.
• The binary masking matrix is obtained from directional clustering.
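A minimal sketch of a cost with the three annotated parts. The exact formula is not reproduced in this text, so the form below is an assumption: a KL divergence masked by the binary index, a regularization term that penalizes the target model inside the holes, and a squared Frobenius norm penalty between the supervised and free bases. All names (`masked_cost`, `F`, `G`, `H`, `U`, `I`, `lam`, `mu`) are hypothetical.

```python
import numpy as np

def masked_cost(Y, I, F, G, H, U, lam=0.1, mu=0.1, eps=1e-12):
    """Hypothetical cost: masked KL divergence + regularization in the holes + penalty."""
    target = F @ G                         # target model (F: supervised bases, fixed)
    X = target + H @ U + eps               # full model of the separated cluster
    kl = Y * np.log((Y + eps) / X) - Y + X
    main = np.sum(I * kl)                  # bins with I == 0 (holes) are excluded
    reg = lam * np.sum((1 - I) * target)   # assumed: KL(0 || target) inside the holes
    pen = mu * np.linalg.norm(F.T @ H, "fro") ** 2  # assumed: keeps H apart from F
    return float(main + reg + pen)

rng = np.random.default_rng(0)
F, G = rng.random((12, 3)), rng.random((3, 20))
H, U = rng.random((12, 2)), rng.random((2, 20))
Y = F @ G + H @ U
I = (rng.random(Y.shape) > 0.3).astype(float)   # 1 = observed bin, 0 = hole
J = masked_cost(I * Y, I, F, G, H, U)
```

Because the divergence is multiplied by `I`, whatever value the spectrogram takes inside a hole has no influence on the main term.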
Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
  – β = 2: EUC-distance
  – β = 1: KL-divergence, the best criterion for signal separation [Kitamura, et al., 2014]
  – β = 0: IS-divergence
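A minimal sketch of the β-divergence in its commonly used parameterization, where β = 2, 1, and 0 give the Euclidean, KL, and IS cases. The normalization (for example the factor 1/2 in the Euclidean case) is the usual textbook one and is assumed here; the function name is illustrative.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of element-wise beta-divergences D_beta(y || x) for strictly positive inputs."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 1:    # KL-divergence (limit beta -> 1)
        return float(np.sum(y * np.log(y / x) - y + x))
    if beta == 0:    # IS-divergence (limit beta -> 0)
        return float(np.sum(y / x - np.log(y / x) - 1))
    return float(np.sum(y**beta / (beta * (beta - 1))
                        + x**beta / beta
                        - y * x**(beta - 1) / (beta - 1)))

y = np.array([1.0, 2.0, 3.0])
x = np.array([1.5, 1.0, 3.5])
d_euc = beta_divergence(y, x, 2)   # equals 0.5 * sum((y - x)**2)
d_kl = beta_divergence(y, x, 1)
d_is = beta_divergence(y, x, 0)
```

A single parameter therefore sweeps continuously between the three classical criteria.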
Decomposition model and cost function
• We used two β-divergences: one for the main cost and one for the regularization cost.
[Equation: decomposition model, with the supervised bases fixed.]
[Equation: cost function.]
Update rules
• We can obtain update rules that optimize the three variable matrices.
[Equation: update rules.]
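The update rules themselves are not reproduced in this text. As a stand-in, here is a minimal sketch of multiplicative updates for a simplified version of the problem: a KL divergence masked by a binary matrix `I`, a linear regularization of the target model inside the holes, and no penalty term. These are standard weighted-NMF updates and are not claimed to be the authors' rules; all names are hypothetical.

```python
import numpy as np

EPS = 1e-12

def cost(Y, I, F, G, H, U, lam):
    """Masked KL divergence plus lam * (target model inside the holes)."""
    X = F @ G + H @ U + EPS
    kl = Y * np.log((Y + EPS) / X) - Y + X
    return float(np.sum(I * kl) + lam * np.sum((1 - I) * (F @ G)))

def restore(Y, I, F, n_other=1, lam=0.01, n_iter=200, seed=0):
    """Optimize G, H, U with F fixed; F @ G is then defined in the holes (I == 0) too."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + EPS
    H = rng.random((Y.shape[0], n_other)) + EPS
    U = rng.random((n_other, Y.shape[1])) + EPS
    for _ in range(n_iter):
        R = I * Y / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.T @ I + lam * (F.T @ (1 - I)) + EPS)
        R = I * Y / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (I @ U.T + EPS)
        R = I * Y / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ I + EPS)
    return G, H, U

# Toy example: a target-only spectrogram with about 30 % of its bins removed.
rng = np.random.default_rng(0)
F = rng.random((24, 4)) + 0.1                      # supervised bases (assumed known)
Y_full = F @ rng.random((4, 30))
I = (rng.random(Y_full.shape) > 0.3).astype(float)  # 1 = observed, 0 = hole
Y = I * Y_full
G, H, U = restore(Y, I, F)
restored = F @ G                                    # extrapolated into the holes
```

Since each column of `G` is shared by all frequency bins of a frame, fitting it on the observed bins also determines `F @ G` in the holes, which is the extrapolation idea of the preceding slides.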
Experimental condition
• The mixed signal includes four melodies (sources).
• Three compositions of instruments
  – We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody
[Figure: spatial arrangement of sources 1 to 4 between left, center, and right, with the target source marked.]

Dataset  Melody 1  Melody 2  Midrange     Bass
No. 1    Oboe      Flute     Piano        Trombone
No. 2    Trumpet   Violin    Harpsichord  Fagotto
No. 3    Horn      Clarinet  Piano        Cello
Experimental result: closed data
• Signal-to-distortion ratio (SDR)
  – total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB] (axis from 0 to 14; higher is good, lower is bad) versus β_NMF (axis from 0 to 4, with KL-divergence and EUC-distance marked) for conventional SNMF (single-channel SNMF), directional clustering, supervised multichannel NMF [Sawada], and the proposed hybrid method.]
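A simplified illustration of an SDR-style score, assuming the plain ratio of reference energy to error energy in dB. The SDR commonly used in separation evaluation first decomposes the estimate into target, interference, and artifact components, so this sketch is not the exact measure behind the reported results; the function name and test signals are hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Simplified SDR in dB: energy of the reference over energy of the error."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - reference
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(error**2))

t = np.arange(1000) / 8000.0
s = np.sin(2 * np.pi * 440.0 * t)     # reference (true target)
good = 1.1 * s                        # estimate with a 10 % amplitude error
bad = 1.5 * s                         # estimate with a 50 % amplitude error
print(simple_sdr(s, good), simple_sdr(s, bad))   # about 20 dB and about 6 dB
```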
SNMF with spectrogram restoration
• SNMF with spectrogram restoration has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation is the KL-divergence (β = 1).
• In contrast, a divergence with a higher β value is suitable for the basis extrapolation.
Trade-off: separation and restoration
• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is determined by the trade-off between separation and restoration abilities.
[Figure: two spectra, amplitude [dB] (-10 to 0) versus frequency [kHz] (0 to 5), labeled "Sparseness: strong" and "Sparseness: weak".]
[Figure: performance over a horizontal axis from 0 to 4, with curves for separation, restoration, and the total performance of the hybrid method.]
Experimental condition
• Open data experiment
  – used different tone generators for the training and test signals
  – Supervision signal: 24 notes that cover all the notes in the target melody, provided by tone generator A
  – Test signal: provided by tone generator B (more realistic sound) + background noise (SNR = 10 dB)
[Figure: spatial arrangement of sources 1 to 4 between left, center, and right, with the target source marked.]
Experimental result: open data
• Signal-to-distortion ratio (SDR)
  – total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB] (axis from -4 to 10; higher is good, lower is bad) versus β_NMF (axis from 0 to 4, with KL-divergence and EUC-distance marked) for conventional SNMF (single-channel SNMF), directional clustering, supervised multichannel NMF [Sawada], and the proposed hybrid method.]
Conclusions
• We proposed a hybrid multichannel signal separation method combining directional clustering and SNMF with spectrogram restoration.
• There is a trade-off between separation and restoration abilities.
Thank you for your attention!
Demonstration is available!