Speech Signal Processing I
By Edmilson Morais and Prof. Greg. Dogil
Second Lecture
Stuttgart, October 25, 2001
The Speech Signal
Non-stationary signal
  Voiced : almost periodic (concept of pitch)
  Unvoiced (random)
  Transitions (bursts, ...)
Range of the pitch
  Male :
  Female :
Sampling Theory
[Block diagram: Low-pass filter, Sample and hold, Low-pass filter]
x(n) has to be band-limited.
The sampling frequency has to be at least 2 times the maximum frequency in x(n).
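A minimal sketch of what happens when the sampling condition above is violated. The 8000 Hz sampling rate and the 3000 Hz and 5000 Hz test tones are illustrative assumptions, not values from the lecture. A tone above half the sampling rate produces the same samples (up to sign) as a tone folded back below it.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Sample sin(2*pi*f*t) at the instants t = n / fs."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

def alias_frequency(freq_hz, fs_hz):
    """Apparent frequency, in [0, fs/2], of a sinusoid sampled at fs."""
    f = freq_hz % fs_hz
    return min(f, fs_hz - f)

fs = 8000.0                               # assumed sampling rate
in_band = sample_sine(3000.0, fs, 16)     # below fs/2: represented correctly
too_high = sample_sine(5000.0, fs, 16)    # above fs/2: folds back to 3000 Hz

# The 5000 Hz tone yields the samples of a 3000 Hz tone with inverted sign.
max_diff = max(abs(a + b) for a, b in zip(in_band, too_high))
print(alias_frequency(5000.0, fs), max_diff)
```

This is why the low-pass filter sits in front of the sampler: it removes the components that would otherwise fold back into the band of interest.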
Matlab : Graphical visualization – Optimization on a quadratic (paraboloid) surface
[Figure: mean squared error E versus the weight w, showing the minimum min E at w0, the gradient of E with respect to w, and the step from w(n) to w(n+1)]
[Figure: two contour plots with both axes from -200 to 200, and a surface plot of the squared error over the coefficients H(1) and H(2)]
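The descent on a quadratic error surface can be sketched as follows. This is an illustration in Python rather than the Matlab demonstration used in the lecture. The matrix R, the optimum w_opt, the step size mu and the starting point are all assumed values chosen for the example.

```python
def _R_times_offset(w, w_opt, R):
    """Return R * (w - w_opt) for a 2x2 symmetric matrix R."""
    d = [w[0] - w_opt[0], w[1] - w_opt[1]]
    return d, [R[0][0] * d[0] + R[0][1] * d[1], R[1][0] * d[0] + R[1][1] * d[1]]

def mse(w, w_opt, R, e_min=0.0):
    """Quadratic error surface E(w) = e_min + (w - w_opt)^T R (w - w_opt)."""
    d, Rd = _R_times_offset(w, w_opt, R)
    return e_min + d[0] * Rd[0] + d[1] * Rd[1]

def gradient(w, w_opt, R):
    """Gradient of E with respect to w (valid for symmetric R)."""
    _, Rd = _R_times_offset(w, w_opt, R)
    return [2.0 * Rd[0], 2.0 * Rd[1]]

def steepest_descent(w_start, w_opt, R, mu, n_iter):
    """Iterate w(n+1) = w(n) - mu * gradient and return the visited points."""
    w = list(w_start)
    path = [tuple(w)]
    for _ in range(n_iter):
        g = gradient(w, w_opt, R)
        w = [w[0] - mu * g[0], w[1] - mu * g[1]]
        path.append(tuple(w))
    return path

R = [[2.0, 0.5], [0.5, 1.0]]   # assumed positive definite matrix
w_opt = [1.0, -2.0]            # assumed location of the minimum
path = steepest_descent([4.0, 4.0], w_opt, R, mu=0.1, n_iter=200)
errors = [mse(w, w_opt, R) for w in path]
print(path[-1], errors[0], errors[-1])
```

With a step size small enough for the largest eigenvalue of R, the error decreases at every iteration and the weights approach w_opt.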
SDSP : Looking through time
[Figure: waveform, amplitude versus time]
Speech signal : Analog and digital
  Sampling rate
  Quantization
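A minimal sketch of the quantization step, assuming a uniform mid-rise quantizer and a 200 Hz test tone sampled at 8000 Hz (both are illustrative choices, not taken from the lecture). It shows that the quantization error is bounded by half a step and that more bits give a higher signal-to-noise ratio.

```python
import math

def quantize(x, n_bits, x_max=1.0):
    """Uniform mid-rise quantizer on [-x_max, x_max) with 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    index = math.floor(x / step)
    # Clamp to the available levels so out-of-range input saturates.
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step

def snr_db(signal, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    sig = sum(s * s for s in signal)
    noise = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10.0 * math.log10(sig / noise)

fs = 8000                                  # assumed sampling rate in Hz
x = [0.9 * math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
x4 = [quantize(v, 4) for v in x]           # 16 levels
x8 = [quantize(v, 8) for v in x]           # 256 levels
print(snr_db(x, x4), snr_db(x, x8))
```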
SDSP : Transformations and Digital filters
Transformations : Z-transform, Fourier transform
Digital filters : FIR, IIR
[Figure: frequency response of a digital filter, magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample) from 0 to 1]
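A minimal sketch relating the two filter types to their frequency response. The 4-tap moving average (FIR) and the one-pole smoother (IIR) are assumed examples, and the function names are hypothetical. The response is obtained by evaluating the transfer function B(z)/A(z) on the unit circle, z = exp(j*omega).

```python
import cmath
import math

def fir_filter(b, x):
    """y[n] = sum_k b[k] x[n-k], with zero initial conditions."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def iir_filter(b, a, x):
    """y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def freq_response(b, a, omega):
    """H(e^{j omega}) = B(z)/A(z) evaluated at z = exp(j omega)."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

b_fir, a_fir = [0.25, 0.25, 0.25, 0.25], [1.0]   # assumed moving-average FIR
b_iir, a_iir = [0.1], [1.0, -0.9]                # assumed one-pole IIR low-pass

impulse = [1.0] + [0.0] * 7
h_fir = fir_filter(b_fir, impulse)          # finite: zero after 4 samples
h_iir = iir_filter(b_iir, a_iir, impulse)   # infinite: decays as 0.1 * 0.9**n

for omega in (0.0, math.pi / 2, math.pi):
    h = freq_response(b_iir, a_iir, omega)
    print(omega / math.pi, 20 * math.log10(abs(h)), math.degrees(cmath.phase(h)))
```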
SDSP – Frame-based analysis
[Figure panels]
  Hanning window : w
  Waveform multiplied by the Hanning window : xw
  Magnitude of the spectrum of xw
  Frequency response of the LP filter
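The first three panels above can be sketched as follows: build the window w, multiply one frame of the waveform by it to get xw, and take the magnitude of its spectrum. The frame length of 256 samples, the 8000 Hz sampling rate and the 1000 Hz test tone are assumptions for illustration. A plain DFT is used to keep the example free of external libraries.

```python
import cmath
import math

def hanning(n_len):
    """Symmetric Hanning window of length n_len."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n_len = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n in range(n_len)))
            for k in range(n_len // 2 + 1)]

fs, n_len, f0 = 8000, 256, 1000.0          # assumed values
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(n_len)]
w = hanning(n_len)
xw = [s * wn for s, wn in zip(frame, w)]   # windowed frame
mag = dft_magnitude(xw)

peak_bin = max(range(len(mag)), key=mag.__getitem__)
peak_hz = peak_bin * fs / n_len
print(peak_bin, peak_hz)
```

The window tapers the frame edges to zero, which reduces the spectral leakage caused by cutting the signal into frames.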
SDSP - Looking at frequency components through time
[Figure: frequency components of the current and previous frames, before smoothing and after smoothing]
SDSP : Vector quantization
Voronoi space : Centroid and distortion measure
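A minimal sketch of how centroids and a distortion measure define a vector quantizer. It assumes squared Euclidean distance as the distortion measure and a k-means style training loop; the lecture does not state which measure or training procedure it uses. The data points and the initial codebook are made up for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distortion between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, codebook):
    """Index of the centroid whose Voronoi cell contains v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))

def train_codebook(data, codebook, n_iter=20):
    """Alternate between assigning vectors to cells and recomputing centroids."""
    codebook = [list(c) for c in codebook]
    for _ in range(n_iter):
        cells = [[] for _ in codebook]
        for v in data:
            cells[nearest(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old centroid if a cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def mean_distortion(data, codebook):
    """Average distortion when each vector is replaced by its centroid."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in data) / len(data)

data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (4.0, 4.1), (4.2, 3.9), (3.9, 4.0)]
initial = [(0.0, 0.0), (1.0, 1.0)]
codebook = train_codebook(data, initial)
indices = [nearest(v, codebook) for v in data]   # the transmitted code indices
print(codebook, indices, mean_distortion(data, codebook))
```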
TTS - Waveform generation for TTS
Analysis and Resynthesis – Coding and Decoding
[Block diagram, blocks and signals as labeled on the slide]
  Coding : Original speech signal (x), LP analysis, Inverse filter A(z), Pitch marks, Prototypes sampling
  Storage environment : A, En, Fo, U/UV, Marks
  Decoding : Prosodic information, TFI residue synthesis, Synthesis filter 1/A(z), Synthesized speech signal (x)
  Signals : x, e, A, En, Fo, U/UV, Marks
Parametrization : Mapping the waveform into a set of parameters.
Reconstruction : Synthesis of the waveform from the set of parameters.
Prosody :
  F0
  Duration
  Amplitude
A : LP coefficients
e : LP residue
En : Prototypes
Fo : Fundamental frequency
U/UV : Voiced / Unvoiced transitions
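A minimal sketch of the LP part of the analysis and resynthesis chain: estimate the LP coefficients A, pass the signal through the inverse filter A(z) to get the residue e, then recover the signal with the synthesis filter 1/A(z). It assumes the autocorrelation method with the Levinson-Durbin recursion, an order of 4 and a synthetic test signal; the lecture does not specify these. Pitch marks, prototypes and TFI are not covered here.

```python
import math
import random

def autocorr(x, order):
    """Autocorrelation r[0..order] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; return (a, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def inverse_filter(a, x):
    """Residue e[n] = sum_k a[k] x[n-k] (FIR filter A(z))."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(a, e):
    """Reconstruction y[n] = e[n] - sum_{k>=1} a[k] y[n-k] (all-pole filter 1/A(z))."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

rng = random.Random(0)                      # fixed seed for repeatability
x = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.05 * rng.uniform(-1, 1)
     for n in range(200)]

A, pred_err = levinson_durbin(autocorr(x, 4), 4)
e = inverse_filter(A, x)                    # coding side
x_rec = synthesis_filter(A, e)              # decoding side
print(A, sum(v * v for v in e) / sum(v * v for v in x))
```

The residue carries much less energy than the signal, which is what makes it attractive to store A and a compact description of e instead of the waveform.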
TTS - Waveform generation for TTS
Speech coding
  Parametric coders, Waveform coders, Hybrid coders
TTS – Concatenative approach
  Time scale and frequency scale modifications
  Spectral smoothing
  Unit selection
Original, Resynthesized, Modified : sin(x+ )
Original, TTS
ASR - Automatic Speech Recognition
Front-End Signal Processing
  Feature extraction
    Perceptual domain, Articulatory domain
Acoustic modeling
  HMM : Hidden Markov Model
  ANN/HMM : Hybrid models - Artificial Neural Network and HMM
Statistical Language Modeling
  N-grams, smoothing techniques
Search : Decoding
  Viterbi, Stack decoding, ...
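A minimal sketch of the statistical language modeling item above, assuming a bigram model with add-one (Laplace) smoothing. This is only one of many smoothing techniques and the lecture does not say which ones it covers. The three-sentence corpus is made up for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, with <s> and </s> as sentence boundaries."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """Add-one smoothed P(word | prev); unseen bigrams get a nonzero probability."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
hist, bigr, vocab = train_bigram(corpus)

p_seen = bigram_prob("the", "cat", hist, bigr, vocab)     # bigram occurs once
p_unseen = bigram_prob("the", "ran", hist, bigr, vocab)   # bigram never occurs
total = sum(bigram_prob("the", w, hist, bigr, vocab) for w in vocab)
print(p_seen, p_unseen, total)
```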
ASR – HMM - Topology
Ergodic model
Left-right model
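A minimal sketch of how the two topologies differ in their transition matrices. The uniform probabilities for the ergodic model and the self-loop probability of 0.6 for the left-right model are assumed values. In an ergodic model every state can reach every other state; in a left-right model a state can only stay or move forward.

```python
def ergodic_transitions(n_states):
    """Fully connected model: every transition a[i][j] is nonzero."""
    return [[1.0 / n_states] * n_states for _ in range(n_states)]

def left_right_transitions(n_states, stay=0.6):
    """Left-right model: a[i][j] = 0 for j < i; here only self-loop and next state."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        a[i][i] = stay
        a[i][i + 1] = 1.0 - stay
    a[n_states - 1][n_states - 1] = 1.0    # final state loops on itself
    return a

erg = ergodic_transitions(3)
lr = left_right_transitions(3)
for row in lr:
    print(row)
```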
ASR – HMM – Basic principle
[Figure: a sequence of 17 feature vectors X1 ... X17 aligned against the HMM states u, n, i, k, ã, p. Several candidate alignments are shown, for example u u u n i i i i k k k k ã ã ã ã p. Each state has an emission probability p(x | u), p(x | n), p(x | i), p(x | k), p(x | ã), p(x | p), and the states are linked by transition probabilities a.]
ASR – HMM - Viterbi alignment
[Figure: four panels (a), (b), (c), (d), each with a horizontal axis from 50 to 200]
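A minimal sketch of Viterbi alignment for a small left-right HMM. The three states, six frames and all probabilities are made-up values; in a real system the emission probabilities p(x | state) would come from the acoustic model. The computation is done in the log domain to avoid underflow.

```python
import math

def log0(p):
    """Logarithm that maps probability 0 to minus infinity."""
    return math.log(p) if p > 0.0 else -math.inf

def viterbi_align(log_emis, log_trans, log_init):
    """Return (best state sequence, its log probability).

    log_emis[t][s] : log p(x_t | state s)
    log_trans[i][j]: log a_ij
    log_init[s]    : log probability of starting in state s
    """
    n_frames, n_states = len(log_emis), len(log_init)
    delta = [[-math.inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    for s in range(n_states):
        delta[0][s] = log_init[s] + log_emis[0][s]
    for t in range(1, n_frames):
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[t - 1][i] + log_trans[i][j])
            delta[t][j] = delta[t - 1][best_i] + log_trans[best_i][j] + log_emis[t][j]
            back[t][j] = best_i
    last = max(range(n_states), key=lambda s: delta[n_frames - 1][s])
    path = [last]
    for t in range(n_frames - 1, 0, -1):    # follow the back pointers
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[n_frames - 1][last]

emis = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
trans = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]

path, log_prob = viterbi_align([[log0(p) for p in row] for row in emis],
                               [[log0(p) for p in row] for row in trans],
                               [log0(p) for p in init])
print(path, log_prob)
```

The returned path assigns each frame to one state, which is the frame-to-state alignment used when segmenting or retraining the models.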
Evaluation : Exercises and Simulations
List of Exercises
  SDSP, TTS, ASR
Simulations
  SDSP
    Vector quantization
  TTS
    Waveform Interpolation
  ASR
    Acoustic modeling using : HMM and ANN+HMM
    Language modeling
    Decoding
Evaluation : Report
Reports
  Write the analysis and results of the simulation in the format of a paper.
  4 pages, two columns.
  Sections
    Abstract
    Introduction
    Brief theoretical description of the method
    Methodology used to perform the experiment
    Results
    Conclusions and suggestions for further work
    Bibliography
Days of classes
Normal semester
  2001
    October : 18, 25
    November : 8, 15, 22, 29 (November 1 is a holiday)
    December : 6, 13, 20
  2002
    January : 10, 17, 24, 31
    February : 7, 14
  Total : 15 days.
Option one
  2001
    October : 16, 18, 23, 25, 30
    November : 6, 8, 13, 15, 20, 22, 27, 29
  2002
    February : 5, 7, 12, 14
  Total : 17 days.
Option two
  2001
    October : 18, 25
    November : 8, 15, 22, 29
  2002
    February : 7, 14
    March : A one-week block seminar, 1.5 hours a day.
  Total : 13 days.
Option three
  2002
    March : A one-week block seminar, 3 hours a day.
  Equivalent to 15 days.