Speech Coding
Maryam Zebarjad, Alessandro Chiumento
SPEECH PROPERTIES
2 categories: Voiced and Unvoiced
Voiced: quasi-periodic in the time domain and harmonically structured in the frequency domain
Unvoiced: random-like and broadband (like white noise)
Why speech coding?
Efficient transmission
Efficient storage
Problem: achieving high quality at the lowest possible bit rate
Performance measures
2 ways of measuring:
Objective: SNR (long term), SEGSNR (short term)
Subjective: DRT (Diagnostic Rhyme Test), DAM (Diagnostic Acceptability Measure), MOS (Mean Opinion Score)
4 standards for speech quality:
Broadcast, Network, Communications, Synthetic
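The two objective measures can be sketched as follows. This is a minimal illustration, not taken from the slides: the function names are hypothetical, and the 160-sample frame assumes 20 ms frames at 8 kHz sampling.

```python
import numpy as np

def snr_db(clean, coded):
    """Long-term SNR in dB, computed over the whole signal."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(coded, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def segsnr_db(clean, coded, frame_len=160):
    """Segmental SNR: mean of the per-frame SNRs (in dB) over short frames.

    frame_len=160 is an assumed 20 ms frame at 8 kHz sampling.
    Frames with zero signal or zero noise energy are skipped to avoid log(0).
    """
    clean = np.asarray(clean, dtype=float)
    coded = np.asarray(coded, dtype=float)
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = s - coded[start:start + frame_len]
        ps, pe = np.sum(s ** 2), np.sum(e ** 2)
        if ps > 0 and pe > 0:
            values.append(10.0 * np.log10(ps / pe))
    return float(np.mean(values))
```

Because SEGSNR averages in the dB domain, quiet frames weigh as much as loud ones, which is why it tracks short-term quality more closely than the long-term SNR.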
Coding Techniques:
WAVEFORM CODERS digitize speech on a sample-by-sample basis. The goal is to have the output waveform closely match the input waveform.
Scalar and vector quantization
Sub-band coders
Transform coders
SINUSOIDAL ANALYSIS-SYNTHESIS coders rely on the sinusoidal representation of the speech waveform.
Short-Time Fourier Transform models
Sinusoidal Transform Coding
Multiband Excitation Coder
VOCODERS Speech-specific coders
Formant Vocoders
Channel Vocoders
LPC Vocoders
Scalar and Vector Quantization
SQ: every sample is mapped into a specific code.
Examples: PCM, DPCM, DM, ADPCM, ...
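As a rough sketch of the simplest case, uniform PCM, each sample is mapped to one of 2^n_bits integer codes. The function names, the 8-bit default and the [-1, 1] input range are assumptions made for illustration only.

```python
import numpy as np

def pcm_encode(x, n_bits=8, x_max=1.0):
    """Map each sample in [-x_max, x_max] to an integer code (uniform quantizer)."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    codes = np.floor((np.asarray(x, dtype=float) + x_max) / step).astype(int)
    # Clip so that x == x_max (and any overload) falls into the top interval.
    return np.clip(codes, 0, levels - 1)

def pcm_decode(codes, n_bits=8, x_max=1.0):
    """Map integer codes back to the centre of their quantization interval."""
    step = 2.0 * x_max / (2 ** n_bits)
    return (np.asarray(codes) + 0.5) * step - x_max
```

For inputs inside the assumed range, the reconstruction error never exceeds half a quantization step.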
VQ: the data (speech) is compressed by encoding it in blocks. The incoming vectors are formed from consecutive data samples or from model parameters.
Examples: VPCM, GS-VQ, A-VQ...
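The block-wise encoding can be illustrated with a nearest-neighbour search over a codebook. This is only a sketch: the function names are hypothetical, and how the codebook is trained is left out.

```python
import numpy as np

def vq_encode(x, codebook):
    """Split x into blocks of consecutive samples; return nearest-codeword indices."""
    dim = codebook.shape[1]
    n_blocks = len(x) // dim
    blocks = np.asarray(x[:n_blocks * dim], dtype=float).reshape(n_blocks, dim)
    # Squared Euclidean distance from every block to every codeword.
    dist = np.sum((blocks[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(dist, axis=1)

def vq_decode(indices, codebook):
    """Look each index up in the codebook and concatenate the codewords."""
    return codebook[indices].reshape(-1)
```

With a 4-entry codebook of 2-sample vectors, each pair of samples costs 2 bits, i.e. 1 bit per sample, which is where the compression comes from.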
Sub-band Coders
Unlike SQ and VQ, these coders rely more on frequency-domain properties of speech.
The signal band is divided into frequency sub-bands using a bank of bandpass filters. The output of each filter is then sampled (or down-sampled) and encoded.
Examples: AT&T, CCITT (G.722), ...
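A two-band split shows the idea in miniature. The sketch below swaps the bank of bandpass filters for the shortest possible filter pair (Haar sum and difference), which is an assumption made for brevity and not the filter bank of any standard coder. Each band is down-sampled by 2 and can be quantized with its own step size.

```python
import numpy as np

def analysis(x):
    """Split x (even length) into low and high sub-bands, each down-sampled by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)    # crude low-pass branch
    high = (even - odd) / np.sqrt(2.0)   # crude high-pass branch
    return low, high

def quantize(band, step):
    """Uniform quantizer; a coarser step means fewer bits for that band."""
    return step * np.round(np.asarray(band, dtype=float) / step)

def synthesis(low, high):
    """Recombine the two sub-bands into a full-rate signal."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x
```

Without quantization, `synthesis(*analysis(x))` returns `x` unchanged; the bit-rate saving comes from giving the perceptually less important band a coarser `step`.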
Transform Coders
Work on spectral properties of speech (like SBC)
They use unitary transforms whose parameters are quantized at the transmitter and decoded and inverse-transformed at the receiver
The potential for bit-rate reduction in transform coding lies in the fact that unitary transforms tend to generate near-uncorrelated transform components which can be coded independently
Although many transforms can be used (DCT, DFT, WHT, KLT, ...), all share the property of unitarity, i.e. the inverse of the transform matrix $A$ equals its conjugate transpose: $A^{-1} = A^{H}$
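The unitarity property and the transmit/receive steps can be checked numerically. The sketch below picks the DCT as an example; the function names and the uniform quantizer are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix A, so that A @ A.T is the identity."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # time index (columns)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)  # DC row needs a different scale factor
    return a

def transform_code(block, step):
    """Transform a block, quantize the coefficients, and reconstruct it."""
    a = dct_matrix(len(block))
    coeffs = a @ np.asarray(block, dtype=float)   # transmitter: forward transform
    q = step * np.round(coeffs / step)            # quantized transform parameters
    return a.T @ q                                # receiver: inverse = transpose
```

Because the matrix is real and unitary, the inverse is just the transpose, and the quantization error energy in the coefficients equals the error energy in the reconstructed block.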
Speech Coding Using Sinusoidal Analysis – Synthesis Models
These speech coders rely on the sinusoidal representation of the speech waveform
Speech Analysis-Synthesis Using the Short-Time Fourier Transform
Speech is slowly time-varying (quasi-stationary) and can be modeled by its short-time spectrum
Analysis expression
Synthesis expression
h(n) is the sliding analysis window, whose length is often constrained to about 5-20 ms
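Short-time analysis and synthesis can be sketched with an FFT per windowed frame followed by overlap-add. The choices below are assumptions for illustration: a 160-sample window (20 ms at 8 kHz), a periodic Hann window for h(n), and 50% overlap, under which the shifted windows sum to one. The function names are hypothetical.

```python
import numpy as np

def stft(s, win_len=160, hop=80):
    """Short-time spectra of s: one FFT per windowed frame (rows = frames)."""
    s = np.asarray(s, dtype=float)
    n = np.arange(win_len)
    h = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_len)   # periodic Hann window
    starts = range(0, len(s) - win_len + 1, hop)
    return np.array([np.fft.rfft(h * s[i:i + win_len]) for i in starts])

def istft(spectra, win_len=160, hop=80):
    """Overlap-add synthesis: inverse FFT of each frame, summed at its offset."""
    out = np.zeros(hop * (len(spectra) - 1) + win_len)
    for j, spec in enumerate(spectra):
        out[j * hop:j * hop + win_len] += np.fft.irfft(spec, n=win_len)
    return out
```

Away from the first and last half-frame, where only one window covers the signal, the output matches the input sample for sample as long as the spectra are left unquantized.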
Speech Analysis-Synthesis Using Sinusoidal Transform Coding
The speech is represented by a linear combination of sinusoids with time-varying amplitudes, phases and frequencies (McAulay-Quatieri):
$s(n) = \sum_{k=1}^{L} A_k \cos(\omega_k n + \phi_k)$
The number of sinusoids L is time-varying. The potential for bit-rate reduction comes from the fact that voiced speech is highly periodic, so L can be adjusted accordingly.
Furthermore, the statistical properties of the short-time spectrum of unvoiced speech are preserved.
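A heavily simplified sketch of the sinusoidal model for one frame: keep the L strongest FFT bins, read an amplitude, frequency and phase from each, and resynthesize as a sum of cosines. Taking the strongest bins stands in for proper peak picking, and there is no frame-to-frame tracking; both are simplifications assumed here, and the function names are hypothetical.

```python
import numpy as np

def analyze(frame, n_sines):
    """Return amplitudes, frequencies (rad/sample) and phases of the strongest bins."""
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    mag = np.abs(spectrum)
    mag[0] = 0.0                                   # ignore the DC bin
    bins = np.sort(np.argsort(mag)[-n_sines:])     # indices of the strongest bins
    amps = 2.0 * np.abs(spectrum[bins]) / n        # valid for bins below Nyquist
    freqs = 2.0 * np.pi * bins / n
    phases = np.angle(spectrum[bins])
    return amps, freqs, phases

def synthesize(amps, freqs, phases, n):
    """Sum of sinusoids: s(t) = sum_k A_k * cos(w_k * t + phi_k)."""
    t = np.arange(n)
    return sum(a * np.cos(w * t + p) for a, w, p in zip(amps, freqs, phases))
```

Only three numbers per sinusoid need to be transmitted, so a strongly periodic (voiced) frame with few harmonics costs far fewer parameters than the raw samples.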
Vocoders
Speech-specific
Low bit rate, but performance degrades for non-speech signals
4 types: Channel, Formant, Homomorphic, LPC
LPC Vocoders are divided into 3 categories based on their excitation models:
2-state excitation
Mixed excitation
Residual excitation
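The 2-state excitation model can be sketched as a switch between an impulse train (voiced) and white noise (unvoiced) driving an all-pole synthesis filter. Everything named below is an assumption for illustration: the function names, the 80-sample pitch period, and the sign convention s(n) = gain*e(n) + sum_i a_i s(n-i).

```python
import numpy as np

def excitation(voiced, n, pitch_period=80, seed=0):
    """Two-state excitation: impulse train if voiced, white noise if unvoiced."""
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0          # one pulse per assumed pitch period
        return e
    return np.random.default_rng(seed).standard_normal(n)

def all_pole_filter(e, a, gain=1.0):
    """Synthesis filter s(n) = gain*e(n) + sum_i a[i-1]*s(n-i)."""
    s = np.zeros(len(e))
    for n in range(len(e)):
        acc = gain * e[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * s[n - i]
        s[n] = acc
    return s
```

Per frame, such a decoder needs only the voiced/unvoiced flag, the pitch period, the gain and the filter coefficients, which is what makes the bit rate low.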
LPC Vocoder
For a p-th order forward linear prediction, the present sample is predicted from a linear combination of p past samples
The prediction parameters are obtained by minimizing the mean square forward prediction error
where
For forward estimation:
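A minimal sketch of forward linear prediction, assuming the autocorrelation method and the convention that s(n) is predicted as sum_i a_i s(n-i). The notation and the function name are assumptions, not taken from the slides. Minimizing the mean square prediction error leads to a set of linear (normal) equations in the autocorrelation lags, solved here directly.

```python
import numpy as np

def lpc_autocorr(s, p):
    """Return a[0..p-1] minimizing the mean square of s(n) - sum_i a[i-1]*s(n-i)."""
    s = np.asarray(s, dtype=float)
    # Autocorrelation lags r(0)..r(p).
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    # Normal equations R a = r, with R the p x p Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])
```

On a synthetic second-order autoregressive signal, the estimate recovers the generating coefficients closely. A practical coder would more likely exploit the Toeplitz structure with the Levinson-Durbin recursion instead of a general solver.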