robust speech recognition
TRANSCRIPT
-
8/2/2019 Robust Speech Recognition
1/17
Robust Speech Recognition
-
Mismatch Between Training and Testing
mismatch influences scores
causes of mismatch
Speech Variation
Inter-Speaker Variation
-
Robust Approaches
three categories
noise resistant features (Speech var.)
speech enhancement (Speech var. + Inter-speaker var.)
model adaptation for noise (Speech var. + Inter-speaker var.)
[Figure: recognition system with training and testing paths: features, encoding, models, word sequence; speakers Spk. A and Spk. B]
-
Contents
Overview
Noise resistant features
Speech enhancement
Model adaptation
Stochastic Matching
Our current work
-
Noise resistant features
Acoustic representation: emphasis on less affected evidence
Auditory-system-inspired models: filter banks, loudness curve, lateral inhibition
Slow variation removal: Cepstral Mean Normalization, time derivatives
Linear Discriminant Analysis: searches for the best parameterization
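As an illustration of slow variation removal, here is a minimal sketch of Cepstral Mean Normalization and time derivatives. It assumes an utterance stored as a list of cepstral frames; the function names and the two-frame difference used for the derivative are choices made for this sketch, not taken from the slides.

```python
from typing import List

Frames = List[List[float]]  # one list of cepstral coefficients per frame


def cepstral_mean_normalization(frames: Frames) -> Frames:
    """Subtract the per-coefficient mean computed over the whole utterance."""
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / n_frames for i in range(dim)]
    return [[f[i] - means[i] for i in range(dim)] for f in frames]


def delta_features(frames: Frames) -> Frames:
    """First-order time derivatives from a simple two-frame difference.

    Edge frames reuse their nearest neighbour (an assumption of this sketch).
    """
    n_frames = len(frames)
    deltas = []
    for t in range(n_frames):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n_frames - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return deltas
```

Both operations cancel a constant offset in the cepstral domain, which is why they help against slowly varying channel effects: adding the same vector to every frame leaves the normalized frames and the derivatives unchanged.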
-
Speech enhancement
Parameter mapping: stereo data, observation subspace
Bayesian estimation: stochastic modeling of speech and noise
Template-based estimation: restriction to a subspace, output is noise free, various templates and combination methods
Spectral subtraction: noise and speech uncorrelated, slowly varying noise
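The two assumptions listed for spectral subtraction can be made concrete with a short sketch. If noise and speech are uncorrelated, their power spectra approximately add, and if the noise varies slowly, an average taken over non-speech frames remains usable later on. The function names and the flooring factor below are assumptions of this sketch.

```python
from typing import List


def estimate_noise_power(noise_frames: List[List[float]]) -> List[float]:
    """Average the power spectrum over frames assumed to contain noise only."""
    n = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / n for k in range(bins)]


def spectral_subtraction(
    noisy_power: List[float],
    noise_power: List[float],
    floor: float = 0.01,
) -> List[float]:
    """Subtract the noise power estimate bin by bin.

    Each bin is floored at a fraction of the noisy power so that the
    estimated speech power never becomes negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]
```

The floor is one common way to handle bins where the noise estimate exceeds the observed power; other choices are possible.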
-
Model Adaptation for noise
Decomposition of HMM or PMC: Viterbi algorithm searches in an N×M state HMM; noise and speech simultaneously recognized; complex noises recognized
State-dependent Wiener filtering: Wiener filtering in the spectral domain faces non-stationarity; HMMs divide speech into quasi-stationary segments; Wiener filters specific to the state
Discriminative training: classical technique trains models independently; error corrective training; minimum classification error training
Training data contamination: training set corrupted with noisy speech; depends on the test environment; lower discriminative scores
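To illustrate state-dependent Wiener filtering, the sketch below applies the usual Wiener gain H = S / (S + N) per frequency bin, with the speech power S looked up from the HMM state aligned to the frame. The state names, the per-state power table and the way the alignment is obtained are all hypothetical here; the slides do not specify them.

```python
from typing import Dict, List


def wiener_gain(speech_power: List[float], noise_power: List[float]) -> List[float]:
    """Per-bin Wiener gain H = S / (S + N) in the power spectral domain."""
    return [
        s / (s + n) if (s + n) > 0.0 else 0.0
        for s, n in zip(speech_power, noise_power)
    ]


def apply_state_wiener_filter(
    noisy_magnitude: List[float],
    state: str,
    state_speech_power: Dict[str, List[float]],
    noise_power: List[float],
) -> List[float]:
    """Filter one frame with the gain of the state aligned to that frame."""
    gain = wiener_gain(state_speech_power[state], noise_power)
    return [g * y for g, y in zip(gain, noisy_magnitude)]
```

Because each state covers a quasi-stationary segment, a single gain per state is a reasonable approximation within that segment even though speech as a whole is non-stationary.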
-
Stochastic Matching : Introduction
General framework
in feature space
in model space
-
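Stochastic Matching : General framework
HMM models $\Lambda_X$, trained in the training space $X$
$Y = \{y_1, \dots, y_t\}$: observation in the testing space
$X = F_\nu(Y)$ and $\Lambda_Y = G_\eta(\Lambda_X)$
$(\nu', W') = \arg\max_{(\nu, W)} p(Y, W \mid \nu, \Lambda_X)$
$p(x \mid i, \Lambda_X) = \sum_{j=1}^{M} w_{i,j}\, N(x; \mu_{i,j}, C_{i,j})$
$\nu' = \arg\max_{\nu} \sum_{S} \sum_{C} p(Y, S, C \mid \nu, \Lambda_X)$
[Figure: iterative loop alternating $\max_{W} p(Y, W \mid \nu, \Lambda_X)$ and $\max_{\nu} p(Y, W \mid \nu, \Lambda_X)$, with input $Y$ and output $W$]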
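The state observation density used in this framework is a Gaussian mixture. A minimal sketch of its evaluation follows, assuming diagonal covariances (the slide writes a general covariance, so this is a simplification) and hypothetical function names.

```python
import math
from typing import List


def diag_gaussian_pdf(x: List[float], mean: List[float], var: List[float]) -> float:
    """Density of a Gaussian with diagonal covariance, computed via its log."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def state_likelihood(
    x: List[float],
    weights: List[float],
    means: List[List[float]],
    variances: List[List[float]],
) -> float:
    """Mixture density of one state: weighted sum of its Gaussian components."""
    return sum(
        w * diag_gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```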
-
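Stochastic Matching : In Feature Space
Estimation step : Auxiliary function
$Q(\nu' \mid \nu) = E\left[\log p(Y, S, C \mid \nu', \Lambda_X) \mid Y, \nu, \Lambda_X\right]$
$Q(\nu' \mid \nu) \propto \sum_{S} \sum_{C} p(Y, S, C \mid \nu, \Lambda_X)\, \log p(Y, S, C \mid \nu', \Lambda_X)$
Maximization step
$\nu' = \arg\max_{\nu'} Q(\nu' \mid \nu)$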
-
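Stochastic Matching : In Feature Space (2)
Simple distortion function
$x_{t,i} = F_\nu(y_{t,i}) = y_{t,i} - b_i$
Computation of the simple bias
$b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m)\, \dfrac{y_{t,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_t(n,m)}{\sigma^2_{n,m,i}}}$
with $\gamma_t(n,m)$ the occupation probability of state $n$ and Gaussian $m$ at frame $t$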
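The bias computation on this slide can be sketched as a weighted average of the deviations between observations and Gaussian means, each weighted by the occupation probability and the inverse variance. The sketch assumes the occupation probabilities are already available (for example from a forward-backward pass, not shown) and that the (state, Gaussian) pairs are flattened into a single index; both are assumptions of this example.

```python
from typing import List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (means, variances), diagonal covariance


def estimate_bias(
    observations: List[List[float]],
    posteriors: List[List[float]],
    gaussians: List[Gaussian],
) -> List[float]:
    """Estimate one bias value per feature dimension.

    observations[t][i] is coefficient i of frame t.
    posteriors[t][k] is the occupation probability of Gaussian k at frame t,
    where k runs over all flattened (state, Gaussian) pairs.
    """
    dim = len(observations[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for y_t, gamma_t in zip(observations, posteriors):
        for gamma, (mean, var) in zip(gamma_t, gaussians):
            for i in range(dim):
                num[i] += gamma * (y_t[i] - mean[i]) / var[i]
                den[i] += gamma / var[i]
    return [n / d for n, d in zip(num, den)]


def remove_bias(observations: List[List[float]], bias: List[float]) -> List[List[float]]:
    """Apply the simple distortion function x = y - b to every frame."""
    return [[y - b for y, b in zip(y_t, bias)] for y_t in observations]
```

With a single Gaussian of zero mean and all weights equal to one, the estimate reduces to the average of the observations, which is a quick sanity check on the formula.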
-
Stochastic Matching : In Model Space
Random additive bias sequence $B = \{b_1, \dots, b_t\}$, independent of speech
Stochastic process of mean $\mu_b$ and diagonal covariance $\Sigma_b$
$\mu_{Y,n,m} = \mu_{X,n,m} + \mu_b$
$\sigma^2_{Y,n,m} = \sigma^2_{X,n,m} + \sigma^2_b$
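Because the bias is additive and independent of speech, the mean and variance of each Gaussian simply add those of the bias. A minimal sketch of this model-space update for one Gaussian with diagonal covariance follows; the function name is hypothetical.

```python
from typing import List, Tuple


def adapt_gaussian(
    mean_x: List[float],
    var_x: List[float],
    mean_b: List[float],
    var_b: List[float],
) -> Tuple[List[float], List[float]]:
    """Shift the mean by the bias mean and widen the variance by the bias variance."""
    mean_y = [m + b for m, b in zip(mean_x, mean_b)]
    var_y = [v + vb for v, vb in zip(var_x, var_b)]
    return mean_y, var_y
```

The same update would be applied to every Gaussian of every state, since the bias statistics are shared across the model.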
-
On-Line Frame-Synchronous Noise Compensation
Relies on the stochastic matching method
Transformation parameter estimated along with the optimal path
Uses forward probabilities
[Figure: sequence of observations y2, y3, y4; bias computation giving b1, b2, b3, b4; transformed observations z2, z3, z4, z5, each passed to recognition (reco)]
-
Theoretical framework and issue
On line frame synchronous
cascade of errors
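$b_{t+1,i} = \dfrac{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_\tau(n,m)\, \dfrac{y_{\tau,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_\tau(n,m)}{\sigma^2_{n,m,i}}}$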
1. Initialize the bias for the first frame: b0 = 0
2. Compute γ and then b
3. Transform the next frame with b
4. Go to the next frame
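Classical Stochastic Matching
$b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m)\, \dfrac{y_{t,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_t(n,m)}{\sigma^2_{n,m,i}}}$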
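The four steps of the on-line procedure can be sketched with running accumulators, so that the bias applied to a frame depends only on the frames seen before it. In this sketch the per-frame weights are passed in by the caller; how the recognizer derives them from forward probabilities is not shown, and the class and method names are hypothetical.

```python
from typing import List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (means, variances), diagonal covariance


class OnlineBiasEstimator:
    """Frame-synchronous bias estimate built from running sums."""

    def __init__(self, dim: int) -> None:
        self.num = [0.0] * dim
        self.den = [0.0] * dim
        self.bias = [0.0] * dim  # step 1: the first frame uses a zero bias

    def transform(self, y_t: List[float]) -> List[float]:
        """Step 3: compensate a frame with the bias estimated so far."""
        return [y - b for y, b in zip(y_t, self.bias)]

    def update(
        self,
        y_t: List[float],
        weights: List[float],
        gaussians: List[Gaussian],
    ) -> List[float]:
        """Step 2: add the current frame to the sums and refresh the bias."""
        for gamma, (mean, var) in zip(weights, gaussians):
            for i in range(len(y_t)):
                self.num[i] += gamma * (y_t[i] - mean[i]) / var[i]
                self.den[i] += gamma / var[i]
        self.bias = [n / d if d > 0.0 else 0.0 for n, d in zip(self.num, self.den)]
        return self.bias
```

A decoding loop would call `transform` on each incoming frame, run recognition on the result, then call `update` before moving to the next frame (step 4). Since the sums are never reset, an early wrong weight keeps influencing later biases, which is the cascade of errors noted on this slide.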
-
Viterbi Hypothesis vs Linear Combination
The Viterbi hypothesis takes into account only the most probable state and Gaussian component.
Linear combination
[Figure: states at frames t and t+1]
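The difference between the two options can be sketched as two ways of turning per-frame scores into weights for the bias computation. This sketch assumes that "linear combination" means weighting every (state, Gaussian) pair by its normalized forward probability, which the slide does not spell out; the function names are hypothetical.

```python
from typing import List


def viterbi_weights(scores: List[float]) -> List[float]:
    """Put all the weight on the single most probable (state, Gaussian) pair."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return [1.0 if k == best else 0.0 for k in range(len(scores))]


def linear_combination_weights(scores: List[float]) -> List[float]:
    """Spread the weight over all pairs in proportion to their scores.

    Scores are assumed non-negative with a positive sum.
    """
    total = sum(scores)
    return [s / total for s in scores]
```

The hard decision is cheaper but commits fully to one hypothesis per frame, while the soft weighting lets competing hypotheses contribute to the bias estimate.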
-
Experiments
Phone numbers in a running car
Forced Align: transcription + optimum path
Free Align: optimum path
Wild Align: no data
Viterbi LC Viterbi LC Viterbi LC
Word Accuracy 84.47 87.53 87.61 86.04 88.41 85.03 87.16 87.81 84.95
Forced Align Free Align Wild Align baseline MCR PMC
-
Perspectives
Error recovery problem
a forgetting process
a model of the distortion function
environmental clues
More elaborate transform