Speech Technology - Kishore Prahallad ([email protected])1
Speech Technology: A Practical Introduction
Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis
Kishore Prahallad
Email: [email protected]
Carnegie Mellon University &
International Institute of Information Technology Hyderabad
Topics
• Spectrogram
• Cepstrum
• Mel-Frequency Analysis
• Mel-Frequency Cepstral Coefficients
Speech signal represented as a sequence of spectral vectors
[Figure: an FFT is applied to each successive short segment of the speech signal, giving one spectrum per segment]
[Figure: each spectrum is a plot of amplitude against frequency (Hz)]
Rotate it by 90 degrees
• Map spectral amplitude to a grey-level value (0-255), where 0 represents black and 255 represents white.
• The higher the amplitude, the darker the corresponding region.
[Figure: the grey-level spectra placed side by side, with time on the horizontal axis and frequency (Hz) on the vertical axis]
The time vs. frequency representation of a speech signal is referred to as a spectrogram.
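The construction above (one FFT per short segment, amplitude mapped to a grey level, frequency on the vertical axis) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the lecture's own code: the function name `spectrogram`, the 25 ms frame and 10 ms hop at 16 kHz, the Hamming window and the dB scale are all assumptions made for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return a (n_freq_bins, n_frames) array of grey levels in 0..255."""
    window = np.hamming(frame_len)                      # assumed window choice
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One FFT per frame gives one spectral vector per frame.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    log_mag = 20.0 * np.log10(magnitude + 1e-10)        # amplitude in dB (assumed scale)
    # Grey-level mapping: highest amplitude -> 0 (black), lowest -> 255 (white).
    lo, hi = log_mag.min(), log_mag.max()
    grey = 255.0 * (hi - log_mag) / (hi - lo)
    # "Rotate by 90 degrees": rows are frequency bins, columns are time frames.
    return grey.T.astype(np.uint8)

# Usage: a 1 kHz tone sampled at 16 kHz should give one dark horizontal band.
sr = 16000
t = np.arange(sr) / sr
img = spectrogram(np.sin(2 * np.pi * 1000 * t))
darkest_row = int(img.mean(axis=1).argmin())            # frequency bin of the band
```

With these settings each row spans 16000 / 512 = 31.25 Hz, so the dark band of the 1 kHz tone falls in row 32.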
Some Real Spectrograms
Dark regions indicate peaks (formants) in the spectrum
Why are we bothered about spectrograms?
Phones and their properties are better observed in a spectrogram.
Sounds can be identified much better by their formants and by the transitions of those formants.
Hidden Markov Models implicitly model these spectrograms to perform speech recognition
Usefulness of Spectrogram
• Time-frequency representation of the speech signal
• Spectrogram is a tool to study speech sounds (phones)
• Phones and their properties are visually studied by phoneticians
• Hidden Markov Models implicitly model spectrograms for speech-to-text systems
• Useful for evaluation of text-to-speech systems
  – A high-quality text-to-speech system should produce synthesized speech whose spectrograms nearly match those of the natural sentences.
A Sample Speech Spectrum
[Figure: speech spectrum, magnitude (dB) against frequency (Hz)]
• Peaks denote dominant frequency components in the speech signal
• Peaks are referred to as formants
• Formants carry the identity of the sound
What do we want to extract? The spectral envelope
• Formants and a smooth curve connecting them
• This smooth curve is referred to as the spectral envelope
[Figure: spectrum with its spectral envelope, magnitude (dB) against frequency (Hz)]
Spectral Envelope
[Figure: the spectrum log X[k] shown together with its two components, the spectral envelope log H[k] and the spectral details log E[k]]
log X[k] = log H[k] + log E[k]
1. Our goal: we want to separate the spectral envelope and the spectral details from the spectrum.
2. That is, given log X[k], obtain log H[k] and log E[k] such that log X[k] = log H[k] + log E[k].
Play a Mathematical Trick
[Figure: the spectrum, its spectral envelope and its spectral details]
• Trick: take the FFT of the spectrum!
• An FFT applied to a spectrum is referred to as an inverse FFT (IFFT).
• Note: we are dealing with the spectrum in the log domain (part of the trick).
• The IFFT of the log spectrum represents the signal on a pseudo-frequency axis.
[Figure: the spectrum, the spectral envelope and the spectral details, each with its IFFT plotted on a pseudo-frequency axis that has a low-frequency region and a high-frequency region]
Treat the spectral envelope as a sine wave with 4 cycles per sec. Its IFFT gives a peak at 4 Hz on the pseudo-frequency axis, in the low-frequency region.
Treat the spectral details as a sine wave with 100 cycles per sec. Its IFFT gives a peak at 100 Hz on the pseudo-frequency axis, in the high-frequency region.
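The "sine wave" intuition can be checked numerically. The sketch below is only an illustration under stated assumptions: a slow curve with 4 cycles stands in for the spectral envelope, a fast curve with 100 cycles stands in for the spectral details, and the axis is sampled at an assumed 1024 points. It uses NumPy's forward FFT; the inverse FFT differs only in scaling and sign, so the magnitude peaks land in the same place.

```python
import numpy as np

n = 1024                                  # assumed number of samples along the axis
t = np.arange(n) / n
slow = np.sin(2 * np.pi * 4 * t)          # envelope-like curve: 4 cycles
fast = np.sin(2 * np.pi * 100 * t)        # detail-like curve: 100 cycles

def peak_bin(x):
    # Index of the largest magnitude in the first half of the transform.
    return int(np.argmax(np.abs(np.fft.fft(x))[: len(x) // 2]))

slow_peak = peak_bin(slow)                # lands in the low region
fast_peak = peak_bin(fast)                # lands in the high region

# The transform of the sum shows both peaks, so the two parts stay separable.
mixed = np.abs(np.fft.fft(slow + fast))[: n // 2]
mixed_peaks = np.sort(np.argsort(mixed)[-2:])
```

Because the transform is linear, adding the two curves does not move either peak. That is what makes it possible to pull the slow component out again by keeping only the low region.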
log X[k] = log H[k] + log E[k]
Taking the IFFT of both sides:
x[k] = h[k] + e[k]
In practice, all you have access to is log X[k], and from it you can obtain x[k].
If you know x[k], filter the low-frequency region to get h[k].
• x[k] is referred to as the cepstrum
• h[k] is obtained by considering the low-frequency region of x[k]
• h[k] represents the spectral envelope and is widely used as a feature for speech recognition
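A minimal NumPy sketch of this separation follows. It is an illustration under assumptions, not the lecture's code: the function names, the cutoff `n_keep=20` for the "low region", and the synthetic test frame (a pulse train passed through a decaying resonance) are all hypothetical choices.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    log_X = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-10)   # log X[k]
    return np.fft.ifft(log_X).real                              # x[k], the cepstrum

def envelope_from_cepstrum(x, n_keep=20):
    # Keep the low region of x[k], plus its mirror image because x[k] is symmetric.
    h = np.zeros_like(x)
    h[:n_keep] = x[:n_keep]
    if n_keep > 1:
        h[-(n_keep - 1):] = x[-(n_keep - 1):]
    # Transforming h[k] back gives the smooth log envelope log H[k].
    return h, np.fft.fft(h).real

# Usage on a synthetic voiced-like frame (assumed test signal).
sr, n = 16000, 512
excitation = np.zeros(n)
excitation[::128] = 1.0                                   # pulse every 128 samples
k = np.arange(64)
resonance = np.exp(-k / 10.0) * np.cos(2 * np.pi * 1000 * k / sr)
frame = np.convolve(excitation, resonance)[:n] * np.hamming(n)

x = real_cepstrum(frame)
h, log_H = envelope_from_cepstrum(x)
e = x - h                                                 # what remains: spectral details
log_E = np.fft.fft(e).real
```

By construction x[k] = h[k] + e[k], and transforming each part back gives log X[k] = log H[k] + log E[k], with log H[k] much smoother than log X[k]. Choosing the cutoff is a trade-off: too small loses formant detail, too large lets the pitch ripple back in.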
Cepstral Analysis
$$X[k] = H[k]\,E[k]$$
$\|\cdot\|$ denotes magnitude:
$$\|X[k]\| = \|H[k]\|\,\|E[k]\|$$
Take log on both sides:
$$\log\|X[k]\| = \log\|H[k]\| + \log\|E[k]\|$$
Taking inverse FFT on both sides:
$$x[k] = h[k] + e[k]$$
Review: What we did
• We captured the spectral envelope (the curve connecting all formants).
• BUT: perceptual experiments say the human ear concentrates on certain regions rather than using the whole of the spectral envelope.
[Figure: magnitude (dB) against frequency (Hz)]
Mel-Frequency Analysis
• Mel-frequency analysis of speech is based on human perception experiments.
• It is observed that the human ear acts as a filter.
  – It concentrates on only certain frequency components.
• These filters are non-uniformly spaced on the frequency axis.
  – More filters in the low-frequency regions
  – Fewer filters in the high-frequency regions
Mel-Frequency Filters
• More filters in the low-frequency region
• Fewer filters in the high-frequency region
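The slides do not say how the filters are placed or shaped. A common construction, used here purely as an assumption, spaces triangular filters evenly on the mel scale with mel = 2595 log10(1 + f / 700). The sketch below builds such a filterbank and counts filter centres at the two ends of the axis; the filter count (26), FFT size (512) and sampling rate (16 kHz) are illustrative values.

```python
import numpy as np

def hz_to_mel(f):
    # Widely used mel formula (an assumption; not given in the slides).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Edges equally spaced in mel, hence crowded together at low Hz.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: rises from left to centre, falls to right, zero elsewhere.
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank, edges_hz[1:-1]

fbank, centres_hz = mel_filterbank()
n_low = int(np.sum(centres_hz < 1000))     # filters centred in the lowest 1 kHz
n_high = int(np.sum(centres_hz >= 7000))   # filters centred in the highest 1 kHz
```

Comparing `n_low` with `n_high` shows the non-uniform spacing directly: the same 1 kHz of bandwidth holds several filter centres at the bottom of the axis and far fewer at the top.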
Mel-Frequency Cepstral Coefficients (MFCC)
• Spectrum → Mel-Filters → Mel-Spectrum
• Say log X[k] = log (Mel-Spectrum)
• Now perform cepstral analysis on log X[k]
  – log X[k] = log H[k] + log E[k]
  – Taking IFFT
  – x[k] = h[k] + e[k]
• Cepstral coefficients h[k] obtained for the Mel-spectrum are referred to as Mel-Frequency Cepstral Coefficients, often denoted by *MFCC*
Speech signal represented as a sequence of spectral vectors
[Figure: the FFT spectrum of each segment is passed through the Mel-filters and then through cepstral analysis]
Speech signal represented as a sequence of CEPSTRAL vectors
[Figure: one cepstral vector per segment of the speech signal]
Why are we going to use MFCC?
• Speech synthesis
  – Used for joining two speech segments S1 and S2
  – Represent S1 as a sequence of MFCC
  – Represent S2 as a sequence of MFCC
  – Join at the point where MFCCs of S1 and S2 have minimal Euclidean distance
• Speech recognition
  – MFCCs are the most commonly used features in state-of-the-art speech recognition systems
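The join-point rule for synthesis can be sketched as a search over frame pairs. This is one possible reading of the slide, stated as an assumption: every frame of S1 is compared with every frame of S2, and the pair with the smallest Euclidean distance is returned. The function name `best_join` and the toy two-dimensional "MFCC" vectors are hypothetical.

```python
import numpy as np

def best_join(mfcc_s1, mfcc_s2):
    """Return (i, j, distance) for the closest pair: frame i of S1, frame j of S2."""
    # Pairwise differences, shape (frames in S1, frames in S2, coefficients).
    diff = mfcc_s1[:, None, :] - mfcc_s2[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))            # Euclidean distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])

# Usage with toy vectors (real MFCC vectors would have more coefficients).
s1 = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
s2 = np.array([[9.0, 9.0], [5.0, 4.0], [0.0, 7.0]])
i, j, d = best_join(s1, s2)
```

Here the closest pair is the last frame of `s1` and the middle frame of `s2`, so the join would be placed there. A practical system would likely restrict the search to frames near the segment boundaries.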
Summary: Process of Feature Extraction
• Speech is analyzed over short analysis windows
• For each short analysis window a spectrum is obtained using FFT
• The spectrum is passed through Mel-filters to obtain the Mel-spectrum
• Cepstral analysis is performed on the Mel-spectrum to obtain Mel-Frequency Cepstral Coefficients
• Thus speech is represented as a sequence of cepstral vectors
• It is these cepstral vectors which are given to pattern classifiers for speech recognition
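The steps above can be put together in one short NumPy sketch. Several choices here are assumptions rather than anything the slides specify: the window and hop sizes, the Hamming window, the power spectrum, the triangular mel filters with the 2595 log10(1 + f / 700) formula, and 13 output coefficients. The final step uses a DCT-II in place of the IFFT named in the slides; that is the usual practical substitute for a real-valued log Mel-spectrum, and the swap is deliberate.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters equally spaced on an assumed mel scale.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    mels = np.linspace(0.0, mel_max, n_filters + 2)
    edges = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    fbank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i : i + 3]
        tri = np.minimum((freqs - lo) / (mid - lo), (hi - freqs) / (hi - mid))
        fbank[i] = np.clip(tri, 0.0, None)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1. Cut the speech into short analysis windows.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. One FFT spectrum per window (power spectrum here).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 3. Mel-filters give the Mel-spectrum; then take the log.
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. Cepstral analysis via DCT-II, keeping only the low-order coefficients.
    k = np.arange(n_ceps)
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(k, m + 0.5) / n_filters)
    return log_mel @ basis.T                 # one cepstral vector per window

# Usage: one second of noise stands in for speech.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
```

Each row of `features` is one cepstral vector, so the whole array is the "sequence of cepstral vectors" that would be handed to a pattern classifier.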
Additional Reading
• Chapter 6
  – Pg: 273 – 281
  – Pg: 304 – 311
  – Pg: 314 – 316