spectral noise tracking for improved nonstationary noise ...tableofcontents 1 introduction 2...
TRANSCRIPT
11. ITG Fachtagung Sprachkommunikation
Spectral Noise Trackingfor Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach
Department of Communications EngineeringUniversity of Paderborn, Germany
September 24th, 2014
Computer Science, ElectricalEngineering and Mathematics
Communications EngineeringProf. Dr.-Ing. Reinhold Hab-Umbach
NT
Table of Contents
1 Introduction
2 A-priori model for nonstationary noise features in a Bayesianfeature enhancement
3 Maximum a-posteriori based spectral noise tracking
4 Noise model transfer approach
5 Experimental results on the Aurora IV database
6 Summary and outlook
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 1 / 10
NT
Introduction
yt
FachgebietNachrichten-
technik
Noise Robust ASR
• Type of distortion - an additivenonstationary noise nt from Aurora IVdatabase
• ASR method - a causal Bayesian FeatureEnhancement, [Leutnant et al., 2011]
Nonstationary Noise Robust Automatic Speech Recognition
xt
nt
yt MFCC
Feature
Extraction
Bayesian Feature Enhancement Decoder
A-priori A-priori Clean
Noise Model Speech Model
Observation
Model
yt xt w
• Until now: a time-invariant a-priori noise model p(nt) = N (nt;µn,Σn)
◮ Estimate µn,Σn on the speech-free frames in the beginning of an utterance
• New: p(nt) = N (nt;µn,t,Σn) with a time-variant mean vector µn,t
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 2 / 10
NT
A-priori model of nonstationary noise for a Bayesian framework
Feature Extraction
A-priori Noise Model
yt
Power Spectrum LMPSC MFCC
Noise
Short-timePower
Spectrum
Log-Mel
Filterbank
ModelTransfer
DCT
Bayesian
Feature
Enhancement
|Yt|2 yt yt
xt
σ2
N,t nt µn,t µn,t
N (µ,Σ)
p(nt)
NoisePSD
Estimator
Stepwise calculation of the a-priori noise model p(nt) = N (nt;µn,t,Σn)
1. Estimate a time-variant power spectral density (PSD) σ2N,t
= E[|Nt|2] of the
nonstationary noise signal from a noisy power spectrum |Yt|2
2. Calculate a time-variant mean vector µn,t from estimates nt in the LMPSCdomain by using the Noise Model Transfer approach
3. Estimate the time-invariant covariance matrix Σn for the Gaussian noise model
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 3 / 10
NT
Maximum a-posteriori (MAP) based spectral noise tracking
Signal processing in the first stage
• Time-invariant noise PSDestimator (CONST PSD) based onthe speech-free frames in thebeginning of an utterance
• Decision-directed approach forestimation of the a-priori SNR ξt,[Ephraim et al., 1984]
First stage|Yt|
2
σ2
N,tCONST
PSD
Decision-directedApproach
ξt
Speech PSD calculationA-priori SNR smoothing
σ2
X,t ζtσ2
N,tMAP-based Postprocessor
Noise PSD Estimator for one frequency bin
MAP-based (MAP-B) noise PSD tracker as a postprocessor
• Given the noisy power |Yt|2, the clean speech PSD σ2X,t
and the a-priori SNR ζt
calculate a MAP-B noise PSD estimate σ2N,t
, [Chinaev et al., 2012]
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 4 / 10
NT
Maximum A-Posteriori Based (MAP-B) noise PSD tracker
Noise PSD estimation even in presence of speech signal
• For calculation of the MAP-B estimate σ2N,t
model observations Yt by the
Gaussian distribution p(Yt|σ2N,t
) and the noise PSD σ2N,t
by the scaled inverse
chi-squared (SICS) distribution p(σ2N,t
)
|Yt|2
σ2
X,t
Observation model
p(Yt|σ2
N,t)
N (Yt; σ2
X,t+σ2
N,t)
Posteriori model
p(σ2
N,t|Yt)
A-priori model
p(σ2
N,t)
SICS(σ2
N,t-1;λ2
t-1)
Bisection/Newton
Mode(σ2
N,t|Yt)
Approxim. posterio by
SICS(σ2
N,t; λ2
t )
t→t-1
ζt
Compensation of
SNR dependent bias
MAP-B noise PSD tracker is a core component of the Noise PSD Estimator for one frequency bin
σ2
N,t
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 5 / 10
NT
An example for improved noise tracking in power spectral domain
Frame number
ln(P
SD)
Reference
MAP-B
Noisy PSD
CONST-PSD
Noise PSD tracking averaged over all frequency bins for one utterance with babble noise
0 200 400 600
• Noise PSD estimates of the MAP-B postprocessor aim to follow the Reference
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 6 / 10
NT
Noise Model Transfer (NMT) approach
Correction of the estimates nt in the LMPSC domain
• Assumed a bias model µn,t = nt + b estimate a time-invariant bias vector b foreach utterance by using the EM approach, [Yoshioka et al., 2013]
σ2
N,t nt
yt
µn,t
Log-Mel
Filterbank
Noise
Model
Transfer
DCT
LMPSC
A-priori Noise Model
nt
yt
µn,t
One EM iteration of the Noise Model Transfer
Bias modelµn,t = nt + bi-1
Nonlinearity
Clean speech
GMMπk ; µx,k
Σx,k
Calculate
Calculate
usingp(yt)
componentaffiliations
E-step γtknoise posteriop(nt|yt, k)
M-step
update
bias vectorbi
i=i+1
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 7 / 10
NT
Noise tracking in the LMPSC domain on the Aurora IV database
MSE
0
1
air bab car res str tra
CONST Online-NMT Offline-NMT Opt-NMT
0.624 0.509 0.405 0.236
Figure: Averaged mean squared error (MSE) values of different noise LMPSC estimates µn,t
• Online-NMT: vector b is calculated based on the previous data yt for t ∈ [1; t]
• Offline-NMT: conventional NMT approach based on data of the hole utterance
• Opt-NMT: using the true bias vector btrue
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 8 / 10
NT
Recognition results on the Aurora IV database
Baseline
CONST
Online-NMT
Offline-NMT
Opt-NMT
Oracel
clean 12.7 13.0 12.0 12.1 12.9 12.2airport 61.5 51.9 47.3 47.4 44.5 29.3babble 60.6 47.0 42.6 43.0 42.0 32.0car 39.0 19.5 17.1 16.9 17.6 15.9
restaurant 58.8 52.7 51.9 50.5 46.9 34.8street 58.2 43.5 43.9 42.1 41.4 30.9train 60.6 43.0 43.0 42.6 42.0 33.7
AVG 50.2 38.7 36.8 36.4 35.3 27.0
Table: Resulting word error rates on the Aurora IV database
• Improved nonstationary noise tracking leads to a consistent decrease ofthe averaged (AVG) word error rates
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 9 / 10
NT
Summary and outlook
Summary
• A-priori model for nonstationary noise
◮ Spectral noise tracking by using the MAP-B postprocessor
◮ Transformation of the noise PSD estimates into the MFCC domain by usingthe Noise Model Transfer approach
◮ Bayesian feature enhancement with the time-variant a-priori noise model
• Experimental results on the Aurora IV database
◮ Improved tracking of nonstationary noise features
◮ Consistent decrease of word error rates ⇒ nonstationary noise robust ASR
Outlook
• Better start values for the EM approach by considering the mismatch function
• Additional usage of a time-variant covariance matrix Σn → Σn,t
Spectral Noise Tracking for Improved Nonstationary Noise Robust ASR
Aleksej Chinaev, Marc Puels, Reinhold Haeb-Umbach 10 / 10
NT
Thank you for your attention!
Questions?
Dipl.-Ing. Aleksej Chinaev
University of PaderbornDepartment of CommunicationsEngineering
Computer Science, ElectricalEngineering and Mathematics
Communications EngineeringProf. Dr.-Ing. Reinhold Hab-Umbach
NT