Communications & Multimedia Signal Processing
Refinement in FTLP-HNM system for Speech Enhancement
Qin Yan
Communication & Multimedia Signal Processing Group
School of Engineering and Design, Brunel University
23 Nov, 2005
Outline
• Review of the FTLP-HNM system
• Parameter estimation for HNM (incl. pitch/harmonic tracking in noise)
• Objective results for pitch tracking, harmonic tracking and the FTLP-HNM system
• Demo of enhanced speech from old archive recordings
Overview of FTLP-HNM Speech Enhancement System
[Block diagram. Blocks as labelled: Noisy Speech; Pre-cleaning; LP Model Decomposition; Formant Estimation; Kalman Filters; Synthesized LP Model; HNM of Residual; Pitch Estimation; Voiced/Unvoiced Classification; Kalman Filters; LP Model Re-composition; Enhanced Speech. Stage labels: Formant estimation; HNM estimation.]
Harmonic plus Noise Model
In HNM, speech is decomposed into two parts: a harmonic part and a noise part.
Harmonic:
$$S_h(t) = \sum_{k=-L(t)}^{L(t)} A_k(t)\, e^{jk\omega_0(t)t}$$
where L(t) denotes the number of harmonics included in the harmonic part and ω0 denotes the pitch frequency.
Noise:
$$S_n(t) = e(t)\,[h(\tau,t) \star b(t)]$$
where h is a time-varying autoregressive (AR) model and b is white Gaussian noise.
Synthesized speech:
$$\hat{S}(t) = S_h(t) + S_n(t)$$
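A minimal sketch of the harmonic-plus-noise decomposition for a single frame, under simplifying assumptions that are not from the slides: the pitch and amplitudes are held constant within the frame, the harmonic sum is written in its real-valued form (A_{-k} taken as the conjugate of A_k, no DC term), and the AR model is fixed over the frame with the convention y[i] = b[i] - sum_j a_j y[i-j]. The function names `harmonic_part`, `noise_part` and `hnm_synthesize` are hypothetical.

```python
import numpy as np

def harmonic_part(amps, f0_hz, fs, n):
    """Real-valued harmonic part: sum_k 2|A_k| cos(k*w0*t + arg A_k), k = 1..L.

    Assumes a constant pitch f0_hz and constant complex amplitudes within the frame.
    """
    t = np.arange(n) / fs
    s = np.zeros(n)
    for k, a in enumerate(amps, start=1):
        s += 2.0 * np.abs(a) * np.cos(2.0 * np.pi * k * f0_hz * t + np.angle(a))
    return s

def noise_part(ar_coeffs, envelope, rng):
    """Noise part: white Gaussian noise b shaped by an all-pole (AR) filter,
    then multiplied by a time-domain envelope e(t)."""
    n = len(envelope)
    b = rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(n):
        acc = b[i]
        for j, a in enumerate(ar_coeffs, start=1):
            if i - j >= 0:
                acc -= a * y[i - j]  # AR recursion (fixed coefficients: an assumption)
        y[i] = acc
    return np.asarray(envelope, dtype=float) * y

def hnm_synthesize(amps, f0_hz, fs, ar_coeffs, envelope, rng):
    """Synthesized frame = harmonic part + noise part."""
    n = len(envelope)
    return harmonic_part(amps, f0_hz, fs, n) + noise_part(ar_coeffs, envelope, rng)

# Example: a 20 ms frame at 8 kHz with three harmonics of 120 Hz.
rng = np.random.default_rng(0)
frame = hnm_synthesize([0.5, 0.25, 0.1], 120.0, 8000.0, [-0.9], 0.05 * np.ones(160), rng)
```

In a full system the amplitudes, pitch and AR coefficients would be updated frame by frame and the frames overlap-added; that part is omitted here.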
HNM - Pitch Tracking
• Error function in the frequency domain:
$$E(F_0) = E - F_0 \sum_{k=1}^{MaxF/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} \log X(l)$$
• In noisy conditions the error function is modified to include SNR-dependent weights:
$$E(F_0) = E - F_0 \sum_{k=1}^{MaxF/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} W(l)\,\log X(l)$$
The weighting function W(l) is SNR-dependent and is given by
$$W(l) = \frac{SNR(l)}{1 + SNR(l)}$$
NOTE:
• The input speech frame is band-pass filtered to eliminate the parts that do not contain explicit harmonics.
• For each speech frame, the method outputs several pitch candidates (N=3), and the Viterbi algorithm then generates the final pitch track.
• It might be useful to have candidates from both this method and the traditional autocorrelation method.
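The candidate-plus-Viterbi procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the presented implementation: the error is taken as E(F0) = E - F0 ΣΣ W(l) log X(l) with E assumed to be the total log-spectral energy, F0 and M are expressed in DFT bins, and the Viterbi transition cost (absolute log-ratio of consecutive pitch values scaled by a hypothetical `jump_penalty`) is chosen here only for illustration. All function names are hypothetical.

```python
import numpy as np

def wiener_weights(snr):
    """W(l) = SNR(l) / (1 + SNR(l)), with SNR given as a linear power ratio per bin."""
    snr = np.asarray(snr, dtype=float)
    return snr / (1.0 + snr)

def pitch_error(log_mag, f0_bin, max_bin, half_width, weights=None):
    """Error for one pitch hypothesis: total log energy minus the F0-scaled,
    weighted log spectrum summed in bands of +/- half_width bins around each harmonic."""
    if weights is None:
        weights = np.ones_like(log_mag)
    total = np.sum(log_mag[:max_bin + 1])  # assumed meaning of the constant E
    comb = 0.0
    for k in range(1, max_bin // f0_bin + 1):
        lo = max(k * f0_bin - half_width, 0)
        hi = min(k * f0_bin + half_width, max_bin)
        comb += np.sum(weights[lo:hi + 1] * log_mag[lo:hi + 1])
    return total - f0_bin * comb

def pitch_candidates(log_mag, f0_bins, max_bin, half_width, weights=None, n=3):
    """Return the n pitch hypotheses with the lowest error, and their errors."""
    errors = np.array([pitch_error(log_mag, f, max_bin, half_width, weights)
                       for f in f0_bins])
    order = np.argsort(errors)[:n]
    return [f0_bins[i] for i in order], errors[order]

def viterbi_track(cands, costs, jump_penalty=1.0):
    """Pick one candidate per frame, minimising candidate cost plus a penalty
    on pitch jumps between consecutive frames. cands and costs have shape (T, N)."""
    cands = np.asarray(cands, dtype=float)
    costs = np.asarray(costs, dtype=float)
    T, N = cands.shape
    acc = costs[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # trans[i, j]: cost of moving from candidate i (frame t-1) to candidate j (frame t)
        trans = jump_penalty * np.abs(np.log(cands[t][None, :] / cands[t - 1][:, None]))
        total = acc[:, None] + trans
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(N)] + costs[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmin(acc)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return cands[np.arange(T), path]
```

In practice the candidate errors and the jump penalty would need to be brought to comparable scales before tracking; the default value here is a placeholder.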
Results of Pitch Tracking
Figure - Comparison of the performance of different pitch tracking methods for speech in (a) train noise and (b) car noise, from 0 dB SNR to clean.
[Plots: Error % vs. SNR (dB), 0 to 20 dB. Curves: improved method with weights; improved method without weights; Griffin's method.]
HNM - Harmonic Tracking
[Block diagram. Blocks as labelled: Noisy Speech; VAD; Noise model; FFT; Peak picking; Pitch Tracking; Harmonic frequency-bin tracks; Harmonic track candidates; Smoothed harmonic magnitude by Kalman filter tracking.]
• The data structure for harmonic track candidates has been improved, which speeds up the whole system.
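As a rough illustration of smoothing one harmonic's magnitude track with a Kalman filter, the sketch below uses a scalar random-walk state model. The state model and the variances `q` (process noise) and `r` (observation noise) are assumptions made for this example; the slides do not specify them, and the function name is hypothetical.

```python
import numpy as np

def kalman_filter_track(z, q=0.01, r=0.1):
    """Causal scalar Kalman filter for a single harmonic magnitude track.

    State model (assumed): x[t] = x[t-1] + process noise with variance q.
    Observation:           z[t] = x[t]   + observation noise with variance r.
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    x, p = z[0], r              # initialise from the first observation
    out[0] = x
    for t in range(1, len(z)):
        p = p + q               # predict: uncertainty grows by the process noise
        k = p / (p + r)         # Kalman gain
        x = x + k * (z[t] - x)  # update with the innovation
        p = (1.0 - k) * p
        out[t] = x
    return out

# Example: a noisy, roughly constant magnitude track.
rng = np.random.default_rng(0)
noisy = 1.0 + 0.3 * rng.standard_normal(200)
smoothed = kalman_filter_track(noisy)
```

A larger ratio q/r makes the output follow the observations more closely; a smaller ratio smooths more heavily. In a noisy-speech setting `r` could be tied to the noise model estimate per frequency bin, which is left out here.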
Results of Harmonic Tracking in Clean Speech
Figure - An illustration of pitch tracks of a speech segment at sampling frequency of 8kHz.
Results of Harmonic Tracking in Noisy Speech
Pitch recovery
Harmonic Recovery
Synthesis of Excitation by HNM
Voiced Excitation:
$$\hat{e}(m) = e_h(m) + e_n(m) = \sum_{k=-L(m)}^{L(m)} A_k(m)\, e^{jk\omega_0(m)m} + b(m)\cdot\mathrm{std}(e(m))\cdot\exp(j\alpha)$$
Unvoiced Excitation:
$$\hat{e}(m) = e_n(m) = b(m)\cdot\mathrm{std}(e(m))\cdot\exp(j\alpha)$$
where b(m) is unit white Gaussian noise, e(m) is the original excitation and α is the phase of the original excitation.
Results of Speech Enhancement
Figure - Comparison of the harmonicity of MMSE and FTLP-HNM systems on speech in train noise at different SNRs.
[Plot: Harmonicity vs. SNR (dB), 0 to 20 dB. Curves: noisy; MMSE; FTLP-HNM.]
$$\text{Harmonicity} = 10\log_{10}\left[\frac{1}{NH \cdot N_{frames}} \sum_{frames} \sum_{k=1}^{NH} \frac{P_k + P_{k+1}}{2P_{k,k+1}}\right]$$
Figure - Performance (PESQ) of MMSE and FTLP-HNM on speech in train noise at different SNR levels.
[Plot: PESQ vs. SNR (dB), 0 to 20 dB. Curves: noisy; MMSE; FTLP-HNM.]
Enhanced speech is synthesized by inverse filtering the HNM residual with the cleaned LP shape.
Demo (1)
Persian speech for Iranian King Mozaffareddin Shah
Original speech | Enhanced speech
Demo (2)
Florence Nightingale, 1890
Original speech | Enhanced speech