improvednoisepowerspectraldensitytracking byamap … · 2016. 11. 8. ·...
TRANSCRIPT
Improved noise power spectral density trackingby a MAP-based postprocessor
Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach
University of Paderborn, Germany
March 28th, 2012
Computer Science, ElectricalEngineering and Mathematics
Communications EngineeringProf. Dr.-Ing. Reinhold Hab-Umbach
��
Table of Contents
1 Introduction and problem formulation
2 MAP-based noise PSD estimation
3 Experimental framework
4 Performance evaluation
5 Summary and outlook
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 1 / 13
��
Introduction
yt
Motivation
• Noise PSD estimation is a key component
◮ to speech enhancement
◮ to robust automatic speechrecognition
Basic assumptions
• Many sophisticated algorithms rely on two assumptions:
◮ Noise ’more stationary’ than speech
◮ Noise-only time-frequency bins at regular intervals
Here
• MAP-based (MAP-B) postprocessor
◮ Estimate of noise power even if speech is dominant in time-frequency bin
◮ Initial estimate of the current speech power required
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 2 / 13
��
MAP-B as a postprocessor
σ2N,k,l
σ2X,k,l
x(I)t
σ2N,k,l
STDFT
yt
nt - noise signal
xt - clean speech signal
frequency bin index
frame index
Noiseestimation
Yk,l
a prioriSNR
a prioriSNR
X(I)k,l
X(II)k,l
x(II)t
Gain Gain
MAP-B
ISTDFT ISTDFT
first speech enhancement stage
second speech enhancement stage
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 3 / 13
��
Problem formulation
Observed
Yl = Nl + Xl
• We consider Nl as target process ’corrupted’ by clean speech Xl of knownvariance σ2
X,l
Our goal
• Estimation of noise variance σ2N,l+1 at frame l+ 1, from the noisy observation
yl+1 given
◮ the current speech power σ2X,l+1
◮ a priori PDF pσ2N(σ2) of variance σ2
N,l
Approach
• Maximum A Posteriori (MAP) estimation
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 4 / 13
��
MAP-based noise PSD estimation
Bayesian variance estimation: Yl = Nl
If σ2X,l+1 = 0 (uncorrupted observ.) and σ2
N,l+1 = σ2N
(stationary) then the textbook
problem:
• Scaled inverse chi-square distribution
pσ2N(σ2) ∝ (σ2)−
νl+2
2 · e−
νlλ2l
2σ2
with the degrees of freedom νl and the scale factor λ2lis conjugate prior to normal
observation PDF
pYl+1|σ
2N(yl+1|σ
2) =1
πσ2· e−
|yl+1|2
σ2
• Parameter update νl+1 = νl + 2 and λ2l+1 = 2
νl+2|yl+1|
2 + νlνl+2
λ2l
• MAP-estimate of variance
σ2N,l+1 = argmax
σ2
[
pσ2N|Yl+1
(σ2|yl+1)]
=νl+1
νl+1 + 2· λ2
l+1
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 5 / 13
��
Extension to non-stationary noise
Still ’uncorrupted’ noise: Yl = Nl
If σ2N,l+1 is time-variant then:
• The parameter νl is kept at a constant value νl+1 = νl = ν0
• This results in recursive smoothing of variance estimate
σ2N,l+1 = (1 − α) · σ2
N,l + α · |yl+1|2, where α =
2
ν0 + 4
Choice of ν0
• Trade-off between tracking ability and estimation error in stationary noise
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 6 / 13
��
General case
Observation ’corrupted’ by speech: Yl = Nl + Xl
• If σ2X,l+1 6= 0 and is known then the posterior PDF
pσ2N|Yl+1
(σ2|yl+1) ∝ (σ2X,l+1 + σ2)−1 · (σ2)−
νl+2
2 · e−
(
|yl+1|2
σ2X,l+1
+σ2+
νlλ2l
2σ2
)
,
is no longer a conjugate prior for the observation PDF.
In order to maintain an efficient MAP estimation procedure we
• approximate the posterior PDF by a scaled inverse chi-squared distribution,
• and match its maximum (σ2N,l+1) with the maximum of the posterior PDF,
• which we calculate efficiently using a bisection and Newton approach.
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 7 / 13
��
Experimental framework
MAP-B as postprocessor
• First noise PSD estimator: Improved Minima Controlled Recursive Averaging(IMCRA) algorithm [Cohen, 2003]
• Gain function: Optimally-Modified Log-Spectral Amplitude (OM-LSA) estimator[Cohen, Berdugo, 2001]
Yk,l+1
IMCRAσ2N,k,l+1 a priori
a priori
SNR
SNR
σ2X,k,l+1
MAP-Bσ2N,k,l+1
OM-LSAOM-LSA
X(I)k,l+1 X
(II)k,l+1
first speech enhancement stage
second speech enhancement stage
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 8 / 13
��
Performance evaluation
Setup
• Clean speech: TIMIT database, sentences concatenated to 3 minutes length
• Artificially added noise from Noisex92 database:
◮ Noise types: ’Stationary WGN’, ’Triangular WGN’, ’Babble’ and ’Factory-1’
◮ SNR values: −5 , 0 , 5 , 10 , 15 dB
• MAP-B estimator: we set ν0 = 40 corresponding to a time constant of ≃ 0.164 s
Reference noise PSD
• Recursive temporal smoothing
σ2N,k,l = 0.95 · σ2
N,k,l−1 + 0.05 · |Nk,l|2, with known noise periodogram |Nk,l|
2
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 9 / 13
��
Sample trajectories of noise variance estimates
• ’Babbble’ noise at frequency bink = 97 (3 kHz)
Time [s]
[dB]
4 5 6 7 8
50
56
62
• MAP-B: continious update ofnoise variance estimate
• ’Triangular WGN’ (averaged overall frequency bins)
Time [s][dB]
4 5 6 7 8
64
66
68
reference
IMCRA
MAP-B
• MAP-B: faster response to risingnoise power
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 10 / 13
��
Quantitative evaluation
• Performance measures adopted from [Taghia et al., 2011]
• Fig.(a): minimum averaged logdistance LEm between thereference and estimated PSD
• MAP-B obtains lower errorLEm for all noise types andSNRs less than or equal to5 dB or 10 dB
• Fig.(b): variance of thelogarithmic difference LEv
• MAP-B yields lower varianceLEv for all noise types andSNRs than the IMCRA
IMCR
A
IMCR
A
IMCR
A
IMCR
A
MAP-B
MAP-B
MAP-B
MAP-B
(a)
(b)
0
1
2
3
0
1
2
3
4
Stationary Triangular Babble Factory-1WGNWGN
LE
mLE
v
-5 dB 0 dB 5 dB 10 dB 15 dB
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 11 / 13
��
Speech enhancement
• Gain in perceptual speech quality (PESQ) scores
PESQGain = PESQMAP-B − PESQIMCRA
Fema
le
Fema
le
Fema
le
Fema
leMaleMaleMaleMale
0.0
0.1Stationary Triangular Babble Factory-1
WGNWGN
PESQ
Gain
-5 dB 0 dB 5 dB 10 dB 15 dB
• MAP-B has a favourable effect on speech quality for non-stationary noise types
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 12 / 13
��
Summary and outlook
Summary
• Proposed MAP-B estimator is able to track the noise statistics even if the speechis dominant
• Low computational complexity
• Single parameter ν0
• Experimental evaluation: MAP-B obtains
◮ lower estimation error under low SNR conditions
◮ lower fluctuation of the estimated values under all tested environments
◮ slightly improved speech quality for non-stationary noise types
Outlook
• Investigations about dependence of MAP-B algorithm performance:
◮ on the first noise PSD estimator
◮ on the degrees of freedom ν0
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13
��
Thank you for your attention!
Questions?
Computer Science, ElectricalEngineering and Mathematics
Communications EngineeringProf. Dr.-Ing. Reinhold Hab-Umbach
��
Periodograms of cleen, noisy and enhanced signals
• Periodogramms of the spokensentence ’Biblical scholarsargue history’ for ’factory-1’noise at an SNR of 5 dB(a) cleen speech signal
(b) noisy speech signal
(c) enhanced speech signal basedon IMCRA estimates
(d) enhanced speech signal basedon MAP-B estimates
• MAP-B has a positive influenceon periodogramm of enhancedsignal particularly clearly seenin the highlighted window
File: /home/chinaev/Desktop/ICASSP2012/MatlabCode/CodeCohen/DatasForPesq/Fem/Factory1and5dB/s_mapb57.wav Page: 1 of 1 Printed: Mo Mär 19 18:47:36
1
2
3
4
5
6
7
kHz
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
time
File: /home/chinaev/Desktop/ICASSP2012/MatlabCode/CodeCohen/DatasForPesq/Fem/Factory1and5dB/s_noisy57.wav Page: 1 of 2 Printed: Mo Mär 19 18:46:06
1
2
3
4
5
6
7
kHz
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
time
File: /home/chinaev/Desktop/ICASSP2012/MatlabCode/CodeCohen/DatasForPesq/Fem/Factory1and5dB/s_or57.wav Page: 1 of 1 Printed: Mo Mär 19 18:44:19
1
2
3
4
5
6
7
kHz
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
time
File: /home/chinaev/Desktop/ICASSP2012/MatlabCode/CodeCohen/DatasForPesq/Fem/Factory1and5dB/s_imcra57.wav Page: 1 of 2 Printed: Mo Mär 19 18:47:00
1
2
3
4
5
6
7
kHz
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
time
(a)
(b)
(c)
(d)
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13
��
Minimum Statistics (MS) instead of IMCRA
• Performance measures LEm and LEv for female speaker signals
MSMSMSMSMAP
-B
MAP-B
MAP-B
MAP-B
(a)
(b)
0
3
6
9
0
2
4
Stationary Triangular Babble Factory-1WGNWGN
LE
mLE
v
-5 dB 0 dB 5 dB 10 dB 15 dB
• Fig.(a): MAP-Bobtains lowerestimation error LEm
than the MS for allnoise types and SNRs
• Fig.(b): MAP-B yieldslower variance LEv
than the MS for alltested setups
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13
��
Minimum Statistics (MS) instead of IMCRA
• PESQGain = PESQMAP-B − PESQNoiseEst
with ’NoiseEst’ ∈ [’IMCRA’; ’MS’] for female speaker signals
IMCR
A
IMCR
A
IMCR
A
IMCR
AMSMSMSMS
0.0
0.1Stationary Triangular Babble Factory-1
WGNWGN
PESQ
Gain
-5 dB 0 dB 5 dB 10 dB 15 dB
• A favourable effect of MAP-B estimator on speech quality for non-stationarynoise types is for MS smaller than for IMCRA
Improved noise power spectral density tracking by a MAP-based postprocessor
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13
��