speech coding maryam zebarjad alessandro chiumento

SPEECH CODING

Maryam Zebarjad, Alessandro Chiumento

SPEECH PROPERTIES

2 categories: Voiced and Unvoiced

Voiced: quasi-periodic in the time domain and harmonically structured in the frequency domain

Unvoiced: random-like and broadband (like white noise)

Why speech coding? Efficient transmission and efficient storage

Problem: achieving high quality at the lowest possible bit rate

Performance measures

2 ways of measuring:

Objective: SNR (long term), SEGSNR (segmental SNR, short term)

Subjective: DRT (Diagnostic Rhyme Test), DAM (Diagnostic Acceptability Measure), MOS (Mean Opinion Score)
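The two objective measures can be sketched as follows. This is a minimal illustration, not a standardized implementation: the function names, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate) and the small `eps` guard are assumptions made for the example.

```python
import numpy as np

def snr_db(clean, coded, eps=1e-12):
    """Long-term SNR in dB, computed over the whole signal."""
    clean = np.asarray(clean, dtype=float)
    coded = np.asarray(coded, dtype=float)
    noise = clean - coded
    return float(10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(noise ** 2) + eps)))

def segsnr_db(clean, coded, frame_len=160, eps=1e-12):
    """Segmental SNR in dB: the average of per-frame SNR values."""
    clean = np.asarray(clean, dtype=float)
    coded = np.asarray(coded, dtype=float)
    n_frames = len(clean) // frame_len  # trailing partial frame is ignored
    per_frame = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        noise_energy = np.sum((clean[seg] - coded[seg]) ** 2)
        per_frame.append(10.0 * np.log10((signal_energy + eps) / (noise_energy + eps)))
    return float(np.mean(per_frame))
```

Because SEGSNR averages in the log domain, quiet frames weigh as much as loud ones, which is why it is usually read as the short-term measure.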

4 standards for speech quality:

Broadcast, Network, Communications, Synthetic

Coding Techniques:

WAVEFORM CODERS digitize speech on a sample-by-sample basis. The goal is to have the output waveform closely match the input waveform.

Scalar and vector quantization, Sub-band coders, Transform coders

SINUSOIDAL ANALYSIS-SYNTHESIS coders rely on the sinusoidal representation of the speech waveform.

Short-Time Fourier Transform models, Sinusoidal Transform Coding, Multiband Excitation Coder

VOCODERS are speech-specific coders.

Formant Vocoders, Channel Vocoders, LPC Vocoders

Scalar and Vector Quantization

SQ: every sample is mapped into a specific code

Examples: PCM, DPCM, DM, ADPCM, ...
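As an illustration of scalar quantization, here is a minimal sketch of a uniform (PCM-style) quantizer and a first-order DPCM coder built on top of it. The mid-rise quantizer, the fixed range `x_max` and the trivial predictor (the previous reconstructed sample) are assumptions chosen for brevity; practical DPCM and ADPCM coders use better predictors and adaptive step sizes.

```python
import numpy as np

def uniform_quantize(x, n_bits, x_max=1.0):
    """Mid-rise uniform quantizer: returns integer codes and reconstructed values."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    codes = np.floor(np.asarray(x, dtype=float) / step)
    codes = np.clip(codes, -levels // 2, levels // 2 - 1).astype(int)
    return codes, (codes + 0.5) * step

def dpcm_encode(x, n_bits, x_max=1.0):
    """Quantize the difference between each sample and the previous reconstruction."""
    codes = np.empty(len(x), dtype=int)
    prev = 0.0  # decoder-side reconstruction of the previous sample
    for n, sample in enumerate(x):
        code, dq = uniform_quantize(sample - prev, n_bits, x_max)
        codes[n] = code
        prev = prev + float(dq)
    return codes

def dpcm_decode(codes, n_bits, x_max=1.0):
    """Rebuild the signal by accumulating the dequantized differences."""
    step = 2.0 * x_max / 2 ** n_bits
    return np.cumsum((np.asarray(codes) + 0.5) * step)
```

The encoder predicts from its own reconstruction rather than from the clean input, so quantization errors do not accumulate at the decoder.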

Scalar and Vector Quantization

VQ: the data (speech) is compressed by encoding it in blocks. The incoming vectors are formed from consecutive data samples or from model parameters.

Examples: VPCM, GS-VQ, A-VQ...
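The block-wise encoding described above can be sketched as a nearest-codeword search. The codebook here is a hypothetical hand-written one; in practice it would be trained on speech data, and the function names are assumptions for the example.

```python
import numpy as np

def vq_encode(x, codebook):
    """Group consecutive samples into vectors and return the nearest codeword index for each."""
    codebook = np.asarray(codebook, dtype=float)
    dim = codebook.shape[1]
    n_blocks = len(x) // dim  # trailing samples that do not fill a block are dropped
    vectors = np.asarray(x[: n_blocks * dim], dtype=float).reshape(n_blocks, dim)
    # Squared Euclidean distance from every vector to every codeword.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Replace each index by its codeword and concatenate the blocks."""
    return np.asarray(codebook, dtype=float)[indices].reshape(-1)
```

With a codebook of N entries of dimension k, each block costs log2(N) bits, i.e. log2(N)/k bits per sample.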

Sub-band Coders

Unlike SQ and VQ, these coders rely more on frequency-domain properties of speech. The signal band is divided into frequency sub-bands using a bank of bandpass filters. The output of each filter is then sampled (or down-sampled) and encoded.

Examples: AT&T, CCITT (G.722), ...
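A two-band version of this idea can be sketched with the shortest possible filter pair (Haar), chosen here only because it fits in a few lines; real sub-band coders use longer filters and more careful bit allocation. The function names and quantizer step sizes are assumptions for the example.

```python
import numpy as np

def split_bands(x):
    """Two-band analysis: Haar low-pass and high-pass filtering, each down-sampled by 2."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // 2 * 2]  # use an even number of samples
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return low, high

def merge_bands(low, high):
    """Two-band synthesis: inverts split_bands exactly when the bands are unmodified."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x

def quantize(band, step):
    """Uniform quantizer with the given step size."""
    return step * np.round(band / step)

def subband_codec(x, step_low=0.01, step_high=0.1):
    """Encode each band with its own step size, then reconstruct."""
    low, high = split_bands(x)
    return merge_bands(quantize(low, step_low), quantize(high, step_high))
```

Giving the high band a coarser step than the low band is the simplest form of the per-band bit allocation that makes sub-band coding attractive for speech.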

Transform Coders

Work on spectral properties of speech (like SBC)

They use unitary transforms whose parameters are quantized at the transmitter and decoded and inverse-transformed at the receiver

The potential for bit-rate reduction in transform coding lies in the fact that unitary transforms tend to generate near-uncorrelated transform components which can be coded independently

Although many transforms can be used (DCT, DFT, WHT, KLT, …), all share the property of unitarity: the inverse of the transform matrix is its conjugate transpose, $A^{-1} = A^{*T}$

Example: Adaptive Transform Coder. It employs the DCT and has high performance
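The transform-coding principle can be sketched with an orthonormal DCT-II matrix built directly in NumPy. Keeping only the largest coefficients stands in for the quantization and bit-allocation stage of a real coder; that simplification, and the function names, are assumptions of this example and not the Adaptive Transform Coder itself.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    n = np.arange(N)
    k = n[:, None]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)  # scale the DC row to unit norm
    return C

def transform_code(frame, keep):
    """Transform a frame, keep the `keep` largest coefficients, inverse-transform."""
    frame = np.asarray(frame, dtype=float)
    C = dct_matrix(len(frame))
    coeffs = C @ frame
    largest = np.argsort(np.abs(coeffs))[::-1][:keep]
    kept = np.zeros_like(coeffs)
    kept[largest] = coeffs[largest]
    return C.T @ kept  # the inverse is the transpose because C is orthonormal
```

Because the transform is unitary, the reconstruction error energy equals the energy of the discarded coefficients, so concentrating energy in few coefficients directly lowers the bit rate.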

Speech Coding Using Sinusoidal Analysis – Synthesis Models

These speech coders rely on the sinusoidal representation of the speech waveform

Speech Analysis-Synthesis Using the Short-Time Fourier Transform

Speech is slowly time-varying (quasi-stationary) and can be modeled by its short-time spectrum

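Analysis expression: $X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, h(n-m)\, e^{-j\omega m}$

Synthesis expression: $x(n) = \frac{1}{2\pi h(0)} \int_{-\pi}^{\pi} X_n(e^{j\omega})\, e^{j\omega n}\, d\omega$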

h(n) is the sliding analysis window and is often constrained to be about 5 – 20 ms
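A discrete, frame-based version of short-time Fourier analysis and synthesis can be sketched with overlap-add. The 160-sample window (20 ms at an assumed 8 kHz sampling rate), the 50% overlap and the periodic Hann window are assumptions chosen so that the shifted windows sum to one; they are not taken from the slides.

```python
import numpy as np

def stft(x, win_len=160, hop=80):
    """Short-time spectra: one FFT per windowed frame (periodic Hann window)."""
    x = np.asarray(x, dtype=float)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_len) / win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, win_len=160, hop=80):
    """Overlap-add synthesis: inverse FFT of each frame, summed at its time position."""
    frames = np.fft.irfft(X, n=win_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + win_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + win_len] += frame
    return out
```

With this window and hop, every sample covered by two frames is reconstructed exactly; only the first and last half-window are attenuated, since a single frame covers them.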

Speech Coding Using Sinusoidal Analysis – Synthesis Models

Speech Analysis-Synthesis Using Sinusoidal Transform Coding (McAulay-Quatieri)

The speech is represented by a linear combination of L sinusoids with time-varying amplitudes, phases and frequencies:

$s(n) = \sum_{k=1}^{L} A_k \cos(\omega_k n + \phi_k)$

The number of sinusoids L is time-varying. The possibility of reducing the bit rate comes from the fact that voiced speech is highly periodic, so L can be adjusted accordingly.

Furthermore the statistical properties of the Short-Time spectrum of unvoiced speech are preserved.
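The sinusoidal representation can be illustrated for a single frame. This sketch is a deliberate simplification and not the McAulay-Quatieri algorithm: it takes the L largest FFT bins as the sinusoids, with no peak interpolation and no frame-to-frame tracking, and the function names are assumptions for the example.

```python
import numpy as np

def analyze_frame(frame, n_sines):
    """Estimate amplitude, frequency (rad/sample) and phase of the n_sines strongest FFT bins."""
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    X = np.fft.rfft(frame)
    bins = np.argsort(np.abs(X))[::-1][:n_sines]
    amps = 2.0 * np.abs(X[bins]) / N  # valid for bins other than DC and Nyquist
    freqs = 2.0 * np.pi * bins / N
    phases = np.angle(X[bins])
    return amps, freqs, phases

def synthesize_frame(amps, freqs, phases, N):
    """Sum of sinusoids: s(n) = sum_k A_k cos(w_k n + phi_k)."""
    n = np.arange(N)
    return np.sum(amps[:, None] * np.cos(freqs[:, None] * n[None, :] + phases[:, None]), axis=0)
```

Only three parameters per sinusoid are transmitted, so a strongly periodic (voiced) frame with few significant harmonics needs few parameters.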

Vocoders

Speech-specific

Low bit rate, but performance degrades for non-speech signals

4 types: Channel, Formant, Homomorphic, LPC

LPC Vocoders are divided into 3 categories based on their excitation models: 2-state excitation, Mixed excitation, Residual excitation

LPC Vocoder

For a p-th order forward linear prediction, the present sample is predicted from a linear combination of p past samples

The prediction parameters are obtained by minimizing the mean square forward prediction error

where

For forward estimation:

The system can be solved using the Levinson-Durbin recursion
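The recursion can be sketched as follows, assuming the common convention in which the prediction is $\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i)$ and the normal equations are $\sum_{i=1}^{p} a_i r(|i-k|) = r(k)$ for $k = 1, \dots, p$, with $r(m)$ the autocorrelation at lag m. The sign convention, the unnormalized autocorrelation estimate and the function names are assumptions of this example and may differ from the original slides.

```python
import numpy as np

def autocorr(x, p):
    """Autocorrelation estimates r(0), ..., r(p) of a frame (unnormalized)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(p + 1)])

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for the predictor coefficients a_1..a_p."""
    a = np.zeros(p + 1)  # a[0] is unused; a[i] multiplies x(n - i)
    err = r[0]           # prediction error energy of the order-0 predictor
    for m in range(1, p + 1):
        # Reflection coefficient for order m.
        k = (r[m] - np.dot(a[1:m], r[m - 1 : 0 : -1])) / err
        prev = a.copy()
        a[m] = k
        a[1:m] = prev[1:m] - k * prev[m - 1 : 0 : -1]
        err *= 1.0 - k * k
    return a[1:], err
```

The recursion needs on the order of p² operations instead of the p³ of a general linear solve, and it produces the reflection coefficients of every lower-order predictor along the way.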

Workplan

Implementation of: LPC Vocoder, DCT Transform Coder, DPCM Coder

Comparison of three methods for specific speech signals
