![Page 1: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/1.jpg)
Regularized Superresolution-Based
Binaural Signal Separation
with Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari,
Yusuke Iwao, Kiyohiro Shikano
(Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi
(Yamaha Corporation Research & Development Center, Shizuoka, Japan)
![Page 2: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/2.jpg)
Outline
• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Penalized supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions
![Page 4: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/4.jpg)
Background
• Music signal separation technologies have received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF degrades markedly when the mixture contains many sources.
We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues included in mixtures of multiple instruments.
![Page 6: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/6.jpg)
NMF
• NMF is a type of sparse representation algorithm that
decomposes a nonnegative matrix into two nonnegative
matrices. [D. D. Lee, et al., 2001]
[Figure: the observed matrix 𝒀 (spectrogram, frequency × time) is approximated by the product of the basis matrix 𝑭 (spectral bases, amplitude over frequency) and the activation matrix 𝑮 (time-varying gains, amplitude over time): 𝒀 ≈ 𝑭𝑮.]
Ω: Number of frequency bins
𝑇: Number of frames
𝐾: Number of bases
𝒀: Observed matrix
𝑭: Basis matrix
𝑮: Activation matrix
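The decomposition on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the squared Euclidean distance and the multiplicative updates of Lee and Seung; the function name `nmf`, the iteration count, and the random test matrix are illustrative choices that do not come from the slides.

```python
import numpy as np

def nmf(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative Y (Omega x T) by F (Omega x K) @ G (K x T)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps     # basis matrix (spectral bases)
    G = rng.random((K, n_frames)) + eps   # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
    return F, G

# Toy "spectrogram" built from three nonnegative components.
rng = np.random.default_rng(1)
Y = rng.random((64, 3)) @ rng.random((3, 100))
F, G = nmf(Y, K=3)
rel_err = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Other divergences (for example the generalized Kullback-Leibler divergence) lead to different update rules; the Euclidean case is used here only because it is the shortest to write down.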
![Page 7: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/7.jpg)
Penalized Supervised NMF (PSNMF)
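• In PSNMF, the following decomposition is addressed under the condition that the supervised basis matrix 𝑭 is known in advance. [Yagi, et al., 2012]
[Equation: PSNMF decomposition of the observed matrix 𝒀 into target and non-target parts]
– Training process: the supervised bases 𝑭 of the target sound are trained from a supervision sound.
– Separation process: the trained bases 𝑭 are fixed and the other matrices are updated. The bases of the non-target sounds are forced to become uncorrelated with 𝑭.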
Problem of PSNMF: When the signal includes many sources,
the extraction performance markedly degrades.
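The separation process of PSNMF can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the model 𝒀 ≈ 𝑭𝑮 + 𝑯𝑼 (the names `H` and `U` for the non-target bases and activations are introduced here), a squared Euclidean cost, and a penalty of the form `mu * ||F^T H||^2`. The column normalization of `H` is an extra assumption added so that the penalty cannot be avoided by rescaling.

```python
import numpy as np

def psnmf_separate(Y, F, n_other, mu=1.0, n_iter=200, seed=0, eps=1e-12):
    """Fix the supervised bases F and fit Y ~ F @ G + H @ U (assumed model)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps   # activations of the target bases
    H = rng.random((n_freq, n_other)) + eps        # bases for all other sounds
    U = rng.random((n_other, n_frames)) + eps      # activations of the other bases
    for _ in range(n_iter):
        G *= (F.T @ Y) / (F.T @ (F @ G + H @ U) + eps)
        # The penalty adds mu * F F^T H to the denominator, which shrinks
        # the parts of H that correlate with the supervised bases F.
        H *= (Y @ U.T) / ((F @ G + H @ U) @ U.T + mu * (F @ (F.T @ H)) + eps)
        # Unit-norm columns of H; U absorbs the scale so H @ U is unchanged.
        norms = np.linalg.norm(H, axis=0, keepdims=True) + eps
        H /= norms
        U *= norms.T
        U *= (H.T @ Y) / (H.T @ (F @ G + H @ U) + eps)
    return G, H, U

rng = np.random.default_rng(1)
F = rng.random((32, 4))   # stands in for bases trained on a supervision sound
Y = F @ rng.random((4, 60)) + rng.random((32, 3)) @ rng.random((3, 60))
G, H, U = psnmf_separate(Y, F, n_other=3)
target_estimate = F @ G   # spectrogram attributed to the target sound
rel_err = np.linalg.norm(Y - F @ G - H @ U) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In practice `F` would come from a training run on the supervision sound, for example a plain NMF of its spectrogram.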
![Page 9: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/9.jpg)
Directional Clustering
• Directional clustering estimates the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
• This method separates sources using the spatial information in the observed signal.
[Figure: scatter plot of source components, with the R-ch input signal on the horizontal axis and the L-ch input signal on the vertical axis; the components are grouped around centroid vectors.]
Problem of directional clustering:
This method cannot separate sources in the same direction.
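A rough sketch of the idea, assuming a k-means-style clustering of normalized (L-ch, R-ch) magnitude vectors with cosine similarity. The cited methods differ in their details, and the function name, initialization, and toy data below are illustrative assumptions.

```python
import numpy as np

def directional_clustering(L_mag, R_mag, n_clusters=3, n_iter=50, eps=1e-12):
    """Cluster time-frequency bins by the direction of their (L, R) magnitude vector."""
    A = np.stack([L_mag.ravel(), R_mag.ravel()], axis=1)       # one row per bin
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)   # keep direction only
    # Initial centroids at evenly spaced angles; a small angle means "mostly right".
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    C = np.stack([np.sin(angles), np.cos(angles)], axis=1)     # columns: (L, R)
    for _ in range(n_iter):
        labels = np.argmax(A @ C.T, axis=1)                    # nearest centroid by cosine
        for c in range(n_clusters):
            members = A[labels == c]
            if len(members):
                m = members.mean(axis=0)
                C[c] = m / (np.linalg.norm(m) + eps)
    labels = np.argmax(A @ C.T, axis=1)
    return labels.reshape(L_mag.shape), C

# Toy data: exactly one source is active in each bin (0 = right, 1 = center, 2 = left).
rng = np.random.default_rng(0)
src = rng.integers(0, 3, size=(16, 20))
theta = np.deg2rad([20.0, 45.0, 70.0])[src]
mag = rng.uniform(0.1, 1.0, size=src.shape)
L_mag, R_mag = mag * np.sin(theta), mag * np.cos(theta)

labels, C = directional_clustering(L_mag, R_mag)
center_mask = (labels == 1).astype(int)   # binary mask that selects the center cluster
print("bins assigned to the center:", int(center_mask.sum()))
```

Two sources panned to the same direction produce identical normalized vectors, so no clustering of this kind can tell them apart, which is the limitation stated on the slide.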
![Page 11: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/11.jpg)
Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
– Directional clustering (spatial separation)
– PSNMF (source separation)
[Figure: conventional hybrid method. The stereo input (L, R) passes through directional clustering and then PSNMF.]
![Page 12: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/12.jpg)
Problem of hybrid method
• The signal extracted by the hybrid method suffers from considerable distortion caused by the binary masking in directional clustering.
• The signal in the target direction, which is obtained by directional clustering, has many spectral chasms.
• The resolution of the spectrogram is degraded.
[Figure: directional clustering as binary masking. Input spectrogram ∘ binary mask = separated cluster, where ∘ is the Hadamard product (product of each element). The input spectrogram contains components from the target direction and from other directions. Binary mask (frequency × time):]
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
![Page 14: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/14.jpg)
Proposed hybrid method
Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF on each channel → ISTFT → mixing → extracted signal
Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF on each channel → ISTFT → mixing → extracted signal
We employ a new supervised NMF algorithm as an alternative to the conventional PSNMF in the hybrid method.
![Page 15: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/15.jpg)
Regularized superresolution-based NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.
[Figure: the separated cluster (frequency × time) contains chasms, which are treated as unseen observations. Index matrix (frequency × time):]
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
![Page 16: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/16.jpg)
Regularized superresolution-based NMF
• The spectrogram of the target sound is reconstructed using better-matched bases because the chasms are treated as unseen.
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.
[Figure: superresolution using supervised bases. The separated cluster with chasms (frequency × time) is turned into a reconstructed spectrogram in which the chasms are filled in.]
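The extrapolation can be illustrated with a masked NMF update. This is a simplified sketch under stated assumptions: only the supervised bases are used (no bases for other sounds), the cost is the squared Euclidean distance restricted to the seen bins, and the function name and the toy data are illustrative.

```python
import numpy as np

def superresolution_snmf(Y, I, F, n_iter=300, seed=0, eps=1e-12):
    """Estimate activations G from the seen bins only (I == 1), with F fixed."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        # Bins with I == 0 (chasms) drop out of both numerator and denominator.
        G *= (F.T @ (I * Y)) / (F.T @ (I * (F @ G)) + eps)
    return G

rng = np.random.default_rng(0)
F = rng.random((64, 4))                               # stands in for supervised bases
target = F @ rng.random((4, 50))                      # complete target spectrogram
I = (rng.random(target.shape) < 0.6).astype(float)    # index matrix, about 40% chasms
Y = I * target                                        # separated cluster with chasms

G = superresolution_snmf(Y, I, F)
recon = F @ G                                         # the chasms are filled in by F @ G
chasm_err = (np.linalg.norm((1 - I) * (target - recon))
             / np.linalg.norm((1 - I) * target))
print(f"relative error inside the chasms: {chasm_err:.3f}")
```

Because the masked cluster is exactly zero inside the chasms, its relative error there is 1; any value well below 1 means that the supervised bases have recovered part of the lost components.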
![Page 17: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/17.jpg)
Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure (a): observed spectra. Frequency of source components plotted against direction (left, center, right); the target source lies in the center.]
![Page 18: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/18.jpg)
Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure (a) → (b): directional clustering is applied to the observed spectra (a) and extracts the target direction. In (b), after directional clustering, the center sources lose some of their components.]
![Page 20: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/20.jpg)
Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure (b) → (c): superresolution-based NMF is applied to the spectra after directional clustering (b), in which the center sources have lost some of their components. In (c), after superresolution-based SNMF, the target source is extrapolated.]
![Page 21: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/21.jpg)
Regularized superresolution-based NMF
• The basis extrapolation includes an underlying problem.
• If the time-frequency spectra of a frame are almost entirely unseen in the spectrogram, which means that almost all of its indexes are zero, a large extrapolation error may occur.
• It is necessary to regularize the extrapolation.
[Figure: a separated cluster (frequency × time) with an almost unseen frame, and the resulting spectrogram (frequency 0 to 4 kHz, time 0 to 4 s) showing an extrapolation error that incorrectly modifies the activation.]
![Page 22: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/22.jpg)
Regularized superresolution-based NMF
• We propose two types of regularization.
– Regularization of the temporal continuity (with respect to the previous frame)
– Regularization of the norm minimization
[Equations: the two regularization terms]
𝑰: Index matrix
∙: Binary complement
𝑖𝜔,𝑡: Entry of index matrix 𝑰
𝑔𝑘,𝑡: Entry of matrix 𝑮
𝑓𝜔,𝑘: Entry of matrix 𝑭
The intensity of these regularizations is proportional to the number of chasms in each frame.
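The exact regularization terms are not recoverable from this transcript, so the following is only a plausible sketch. It assumes that the temporal-continuity term penalizes squared differences between consecutive activation frames, that the norm term penalizes the energy of the model components inside the chasms, and that both are weighted by the binary complement of the index matrix. The function names are hypothetical; the index matrix is the one shown on the earlier slide.

```python
import numpy as np

def chasm_weights(I):
    """Number of chasms (unseen bins) in each frame."""
    return (1 - I).sum(axis=0)                  # shape (T,)

def temporal_continuity(I, G):
    """Assumed form: sum_t (#chasms in frame t) * sum_k (g[k,t] - g[k,t-1])**2."""
    w = chasm_weights(I)[1:]                    # frames 1 .. T-1
    diff = G[:, 1:] - G[:, :-1]                 # difference from the previous frame
    return float((w * (diff ** 2).sum(axis=0)).sum())

def norm_minimization(I, F, G):
    """Assumed form: sum over chasm bins of sum_k f[w,k]**2 * g[k,t]**2."""
    return float((((1 - I).T @ F ** 2) * (G ** 2).T).sum())

I = np.array([[1, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 0, 0, 0, 0, 0],
              [0, 1, 0, 1, 1, 0, 1],
              [1, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 0, 1, 1, 0]])
F = np.ones((6, 2))
G_flat = np.ones((2, 7))                        # constant activations
G_step = np.zeros((2, 7))
G_step[:, 3:] = 1.0                             # jump at the frame with the most chasms

print(chasm_weights(I).tolist())                # [2, 3, 4, 5, 4, 4, 4]
print(temporal_continuity(I, G_flat), temporal_continuity(I, G_step))
print(norm_minimization(I, F, G_flat))
```

With these forms, a frame that is almost entirely unseen receives the largest weight, which matches the statement that the intensity grows with the number of chasms in each frame.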
![Page 23: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/23.jpg)
Regularized superresolution-based NMF
• The cost function in regularized superresolution-based NMF is defined using the index matrix.
[Equation: cost function]
The cost function contains a regularization term, a penalty term that forces the supervised bases and the other bases to become uncorrelated with each other, and a weighting parameter.
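As a hedged sketch of how such a cost could be assembled, the function below assumes a squared Euclidean fit restricted to the seen bins, a pluggable regularizer, and a penalty of the form `||F^T H||^2`. The names `H`, `U`, `lam`, and `mu` are introduced here for illustration, and the divergence used by the authors may differ.

```python
import numpy as np

def cost(Y, I, F, G, H, U, lam=0.1, mu=1.0, regularizer=None):
    """Assumed cost: masked fit + lam * regularizer + mu * ||F^T H||^2."""
    X = F @ G + H @ U                          # model spectrogram
    fit = (I * (Y - X) ** 2).sum()             # chasms (I == 0) are ignored
    reg = regularizer(I, F, G) if regularizer is not None else 0.0
    penalty = ((F.T @ H) ** 2).sum()           # correlation between F and H
    return float(fit + lam * reg + mu * penalty)

rng = np.random.default_rng(0)
F, G = rng.random((8, 2)), rng.random((2, 5))
H, U = np.zeros((8, 1)), np.zeros((1, 5))
I = (rng.random((8, 5)) < 0.5).astype(float)
Y = F @ G
Y_changed = Y + 10.0 * (1 - I)                 # alter only the unseen bins

# Both costs are identical: values inside the chasms never enter the fit term.
print(cost(Y, I, F, G, H, U), cost(Y_changed, I, F, G, H, U))
```

The usage lines show the key property of the index matrix: whatever happens to sit in the chasms has no influence on the cost.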
![Page 24: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/24.jpg)
Regularized superresolution-based NMF
• The update rules that minimize the cost function are obtained as follows:
[Equations: update rules]
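The update rules depend on the exact cost function, which is not recoverable here. As an illustration only, the sketch below derives multiplicative updates for an assumed cost: a squared Euclidean fit on the seen bins, a norm-minimization regularizer weighted by `lam`, and a penalty `mu * ||F^T H||^2`. The names `H`, `U`, `lam`, and `mu` are assumptions, and these are not the rules from the slide.

```python
import numpy as np

def cost(Y, I, F, G, H, U, lam, mu):
    X = F @ G + H @ U
    fit = (I * (Y - X) ** 2).sum()                        # seen bins only
    reg = (((1 - I).T @ F ** 2) * (G ** 2).T).sum()       # model energy in the chasms
    pen = ((F.T @ H) ** 2).sum()                          # correlation of F and H
    return float(fit + lam * reg + mu * pen)

def update(Y, I, F, G, H, U, lam, mu, eps=1e-12):
    """One round of multiplicative updates; F stays fixed."""
    IY = I * Y
    chasm_energy = (F ** 2).T @ (1 - I)                   # (K, T)
    # Positive gradient terms go to the denominator, negative ones to the numerator.
    X = F @ G + H @ U
    G = G * (F.T @ IY) / (F.T @ (I * X) + lam * G * chasm_energy + eps)
    X = F @ G + H @ U
    H = H * (IY @ U.T) / ((I * X) @ U.T + mu * F @ (F.T @ H) + eps)
    X = F @ G + H @ U
    U = U * (H.T @ IY) / (H.T @ (I * X) + eps)
    return G, H, U

rng = np.random.default_rng(0)
n_freq, n_frames, K, n_other = 32, 40, 4, 3
F = rng.random((n_freq, K))
full = F @ rng.random((K, n_frames)) + rng.random((n_freq, n_other)) @ rng.random((n_other, n_frames))
I = (rng.random((n_freq, n_frames)) < 0.6).astype(float)
Y = I * full
G = rng.random((K, n_frames))
H = rng.random((n_freq, n_other))
U = rng.random((n_other, n_frames))

before = cost(Y, I, F, G, H, U, lam=0.1, mu=0.1)
for _ in range(100):
    G, H, U = update(Y, I, F, G, H, U, lam=0.1, mu=0.1)
after = cost(Y, I, F, G, H, U, lam=0.1, mu=0.1)
print(f"cost before: {before:.2f}, after: {after:.2f}")
```

A temporal-continuity regularizer would add terms that couple neighbouring frames of `G`, so its update differs from the one shown for the norm regularizer.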
![Page 26: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/26.jpg)
Evaluation experiment
• We compared four methods.
– Conventional hybrid method using PSNMF (Conventional method)
– Proposed hybrid method using superresolution-based NMF without
regularization (Proposed method 1)
– Proposed hybrid method using superresolution-based NMF with
regularization of the temporal continuity (Proposed method 2)
– Proposed hybrid method using superresolution-based NMF with
regularization of the norm minimization (Proposed method 3)
[Figure: block diagrams of the conventional hybrid method (PSNMF after directional clustering) and the proposed hybrid method (superresolution-based SNMF using the index of the center cluster), as on the "Proposed hybrid method" slide.]
![Page 27: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/27.jpg)
Evaluation experiment
• We used stereo-panning signals and binaurally recorded signals containing four instruments, oboe (Ob.), flute (Fl.), trombone (Tb.), and piano (Pf.), generated by a MIDI synthesizer.
• The sources are mixed with equal power.
• The target source is always located in the center direction (no. 1).
• We used the same type of MIDI sounds of the target instruments as the supervision sound for the training process.
– Supervision sound: two-octave notes that cover all notes of the target signal
[Figure: arrangement of sources no. 1 to 4 across the left, center, and right directions; the target source is no. 1.]
![Page 28: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/28.jpg)
Experimental results (panning signal)
• Average SDR, SIR, and SAR scores for each method, where the four instruments are shuffled with 12 combinations.
[Figure: bar charts of SDR [dB] (axis 0 to 12), SIR [dB] (axis 0 to 24), and SAR [dB] (axis 0 to 10) for each method; higher is better.]
SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion
Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
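The three scores can be illustrated with a simplified calculation. This sketch assumes the commonly used BSS Eval style decomposition of an estimate into a target part, an interference part, and an artifact part, restricted to gain-only projections. The slides do not state which toolbox or settings were used, and the function names are hypothetical.

```python
import numpy as np

def db_ratio(num, den):
    """Energy ratio of two signals in decibels."""
    return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

def separation_scores(estimate, sources, target_index=0):
    """Return (SDR, SIR, SAR) in dB for one estimated signal."""
    S = np.asarray(sources, dtype=float)            # (n_sources, n_samples)
    s = S[target_index]
    s_target = (estimate @ s) / (s @ s) * s         # part explained by the true target
    coef, *_ = np.linalg.lstsq(S.T, estimate, rcond=None)
    in_span = S.T @ coef                            # part explained by any true source
    e_interf = in_span - s_target                   # leakage from the other sources
    e_artif = estimate - in_span                    # remainder: artificial distortion
    sdr = db_ratio(s_target, e_interf + e_artif)
    sir = db_ratio(s_target, e_interf)
    sar = db_ratio(s_target + e_interf, e_artif)
    return sdr, sir, sar

rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 4000))
# Estimate of source 0 with some leakage from source 1 and a little unrelated noise.
estimate = sources[0] + 0.1 * sources[1] + 0.05 * rng.standard_normal(4000)
sdr, sir, sar = separation_scores(estimate, sources)
print(f"SDR {sdr:.1f} dB, SIR {sir:.1f} dB, SAR {sar:.1f} dB")
```

Under this decomposition SDR can never exceed SIR or SAR, since it counts interference and artifacts together, which is why it serves as the overall quality score.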
![Page 29: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/29.jpg)
Experimental results (binaural signal)
• Average SDR, SIR, and SAR scores for each method, where the four instruments are shuffled with 12 combinations.
[Figure: bar charts of SDR [dB] (axis 0 to 10), SIR [dB] (axis 0 to 20), and SAR [dB] (axis 0 to 6) for each method; higher is better.]
SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion
Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
![Page 30: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization](https://reader034.vdocuments.net/reader034/viewer/2022042716/55a692331a28ab514d8b46ac/html5/thumbnails/30.jpg)
Conclusions
• We proposed a new superresolution-based supervised NMF algorithm for the hybrid method to separate stereo or binaural signals.
• The proposed hybrid method separates the target signal with higher performance than the conventional method.
• The regularization of norm minimization is effective for the proposed supervised NMF algorithm.
Thank you for your attention!