04 source coding

54
8/18/2019 04 Source Coding http://slidepdf.com/reader/full/04-source-coding 1/54 Principles of Source coding (applied to GSM & UMTS) r. c am rou a Damascus, 18 th March 2010

Upload: badara6298

Post on 06-Jul-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 1/54

Principles of Source coding(applied to GSM & UMTS)

r. c am rou a

Damascus, 18 th March 2010

Page 2: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 2/54

Page 3: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 3/54

Keypad

Display

ControlMicroprocessor

&

7654321

4454567

“ Mixer” RF Power

sendend

Memory Tx Control Power

A/D g a

Processing

Digital

Modulator (up-

convert) Ampl if ier

(PA)

an

Filter

To T/R Switch

swith

AnalogDigital

AnalogSource Coding

Base band analog waveform

Base band digit al waveform

Channel Coding

Interleaving

RF wave form

Burst formatting Ciphering

Page 4: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 4/54

Page 5: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 5/54

– Reduce the number of bits in order to save on transmission time orstorage space

Page 6: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 6/54

Wave Form codecs:

– Just sampling and coding without thought of speech generation – g -qua ty an not comp ex – Large amount of bandwidth – PCM Pulse Code Modulation G.711 ISDN: 8b x 8k/s = 64 kbit/s

CD: 16b x 44 k/s x 2ch = 1.408 Mbit/s – ADPCM (Adaptive Differential PCM) G.726/27, 40 / 32 / 24 / 16 kbit/s –

Page 7: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 7/54

Source codecs (vocoders, synthetic voice):nco ng : a c e ncom ng s gna o a ma mo e o speec

generationLinear-predictive filter model of the vocal tract

Parameters: voiced/unvoiced flag for the excitationParameters of the filter is sent! (Not the sampled signal)

. .speech synthesis)

Low bit rate, but sounds syntheticHigher bit rate does not improve much

Codecs: Linear Predictive Coding (LPC)

Page 8: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 8/54

Hybrid codecs: – Attempt to provide the best of both – Perform a degree of waveform matching – Utilize the sound production model – – Time Domain Analysis-by-Synthesis(AbS) codecs: – The most commonly used Not a simple two-state, voiced/unvoiced – Different excitation signals are attempted – Closest to the original waveform is selected – CELP Code book Excited Linear Prediction – ACELP (Algebraic CELP) – RPE-LTP (Regular Pulse Excitation - Long-Term Prediction) – ec or- um xc e near re c on

Page 9: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 9/54

“Nowadays, a chicken leg is a rare dish”

SourceChannel Simulation

Impairment

Codec ‘X’1 2 3 4 5

5 Excellent Imperceptible

4 Good Just perceptible but not annoying1 2 3 4 53 Fair Perceptible and slightly annoying

2 Poor Annoying but not objectionable

1 Unsatisfactory Very annoying and objectionable

Page 10: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 10/54

- ac or: n-serv ce vo ce qua y measuremen

based on observed traffic flow for every phone call

Mean Opinion Square: To arrive at an MOSscore, a tester assembles a panel of “expertlisteners” who rate the quality of speech samplesthat have been processed by the system under

.

– Ideally, a panel would consist of a mix ofmale and female listeners of various ages

– The samples should reflect a range oftypical voice conversations

– The panel rates the quality of the system

output from 1 to 5, with 1 indicating the

– The scores of the panelists are thenaveraged

Page 11: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 11/54

5

( M

O S )

Hybrid Coders

3 e

Q u a

l i tWaveform Coders

2

S u

b j e c t i

Vocoders(Older Technology)

2 4 8 16 32 64

KbpsScore Quality Description of Impairment543

ExcellentGoodFair

ImperceptibleJust Perceptible, not AnnoyingPerceptible and Slightly Annoying

2

1

Poor

Bad

Annoying but not Objectionable

Very Annoying and Objectionable

Page 12: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 12/54

Open vocal c ords

Closed vocal cords

Speech is produced by a cooperation of lungs, glottis (with vocal cords) and thevoca rac gu ura cav y, ora cav y, nasa cav y .

The vocal tract is excited with pressure pulses of airflow (product of the vocalcords) with period of opening and closing phase about 10 ms.

The larynx and the vocal cords produce periodic or noisy signal for voiced orunvoiced phonemes.

Page 13: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 13/54

resonance frequencies, called formant frequencies.

Parameters of the cavity (its transverse section in single cuts)are varied during the speech.

Since the mouth cavity can be greatly changed, we are able topronounce many different sounds.

For voiced sounds pitch impulses (generated by the vocal cords)s mu a e e a r n e mou an or cer a n soun s nasa a sostimulate the nasal cavity.In the case of unvoiced sounds, the excitation of the vocal tract ismore noise-like.

Page 14: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 14/54

high [i]

heed

lower high[ɪ ] hid

low mid

had

hod

Page 15: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 15/54

The blue--- s---p--o---------t i-s--on--the-- k--ey a---g--ai----n------

“ ” “ ”

14

“o” in “spot”“ee” in “key”“e” in “again”“s” in “spot”“k” in “key”

Page 16: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 16/54

Some Properties of Speech

Vowels Created due vocal cords vibrations atfrequencies between 50 Hz & 1 kHz.

“oo” in “blue”

-pressure wave which excites the vocal tract.

Frequency of the pressure signal is the pitch

“o” in “spot”

frequency or fundamental frequency (F0).Voiced waveforms are quasi-periodic.

“ee” in “key”

15

“e” in “again”

Page 17: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 17/54

• If is a speech frame, then autocorrelation is

+⋅= k m xm xk A(

][n xw )(k A

m

SpeechFrame

Short Time Autocorrelation

of fundamental period

Page 18: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 18/54

Some properties of speech

Consonants Generated by forming a constriction at

some point in the vocal tract, such asthe teeth or lips, and forcing air through

“s” in “spot”the constriction to produce turbulence.

This is regarded as a broad-spectrum.

“k” in “key”

Page 19: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 19/54

signal itself: how many pulses per second does the signal contain.

– In the case of speech signal, each pulse is produced by a signal vibration ofthe vocal folds. – The frequency of these pulses is measured in Hertz (Hz).

Pitch is a perceptual term . – ’

low in pitch, the same pitch as the previous portion of the signal, ordifferent? – Pitch can be a property of speech or non-speech signals. I.e., music, high-

, - .

Tone is a linguistic term . – Tone refers to a phonological category that distinguishes two words oru erances. – Tone is a term relevant for language, and only for languages in which pitch

plays a linguistic role (convey meaning).

Page 20: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 20/54

Typical impulse sequence of a voiced

Typical impulse sequence (sound pressure function) produced by the vocal cords for a.

It is the part of the voice signal that defines the speech melody.

Page 21: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 21/54

Formants

Resonancefrequencies of thevocal tract.

the sound of vocalcords.

Fine (pitch) harmonicstructure:

vibrating vocal cords(narrow peaks).

Page 22: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 22/54

Fine (pitch) harmonic structure: 1st formant 150-850 Hz – reflects the quasi-periodicity of speech – attributed to the vibrating vocal cords (narrow peaks).

Formant structure (envelope peaks):

2nd formant 500-2500 Hz

3rd formant 1500-3500 Hz

– (resonances of the vocal tract).

– Short term correlation in the time domain.

orman - z

Page 23: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 23/54

Page 24: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 24/54

The vocal tract can be modeled as a hard-walled lossless tuberesonator consisting of N tubes with different cross-sectional areas

NoseCavity

Phar nxCavity ou

Cavity

Glottis Vocal tract Lips

Page 25: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 25/54

F n =

(2n - 1) c

4L

F n = n th formant freq. [Hz]

L = length of the tube [m]

1/4 Wavelength

3/4 Wavelength

5/4 Wavelength

First 3 resonances oftube with 1 closed end

Page 26: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 26/54

Source-filter theory of

Excitation Vocal tract Lip

Voice roduction mechanism can be modeled as a series connection of an excitation source and a filter system.

Source and filter are considered independent of each other.

Glottal pulses correspond to source and vocal tract corresponds to the filter In the case of voiced speech sounds, the excitation is modeled by an impulse train corresponding to theglottal pulses.

In the case of unvoiced sounds, the excitation is modeled by a noise-like signal.

.

The theory provides a theoretical background for the inverse filtering technique.

Page 27: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 27/54

Source-Filter model of speech

Excitat ion si nalPitch

Speech

f Source

Page 28: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 28/54

Source-filter theory of

erH(z)

nverse er [1 / H(z) ]

Unknown input(excitation)

nown ou pu(speech)

Predicted ou tput(predicted speech)

s ma e npu(excitation, residual)

The source-filter theory of speech production provides theoretical background for the inverse filteringtechnique.

If the transfer function of the vocal tract filter is known, an inverse filter can be constructed.

Inverse filtering basically involves extracting two signals, the volume velocity waveform at the glottis,and the effect of the vocal tract filter, from a single source signal. It simulates the inverse characteristicsof the vocal tract

In principle, the glottal excitation signal can then be reconstructed by feeding the speech signal throughe nverse o e voca rac er.

The result of inverse filtering has to be regarded as an estimate of the glottal flow. The actual volumevelocity waveform at the glottis is not known exactly.

,LPC analysis.

Page 29: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 29/54

Filterin

InverseFiltering

Page 30: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 30/54

sampleong- erm pre c on

time

short-term prediction

Because of resonance properties of the vocal tract there is high correlation betweensuccessive samples of the speech signal (redundancy).

Predictive Coding: Remove redundancy between successive pixels and only encoderesidual between actual and predicted.

Residue usually has much smaller dynamic range, allowing fewer quantization levels forthe same mean square error (= compression).

short-term (resonance of vocal tract)

long-term (periodicity of voiced speech (vocal cord vibration))

Page 31: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 31/54

Page 32: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 32/54

- -

• Synthesize the input speech

• Try all sets of parameters

error is used

• This is called closed-loo quantization

Page 33: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 33/54

Page 34: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 34/54

Page 35: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 35/54

Page 36: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 36/54

LPC vocoders extract salient features of speech (formants) directly from the waveform, ratherthan transforming the signal to the frequency domain.

LPC Features:

– uses a time-varying model of vocal tract sound generated from a given excitation – transmits only a set of parameters modeling shape and excitation of the vocal tract, not actual signals.

Page 37: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 37/54

-

Pre-processing STP LTP RPE

HammingWindow Inverse

Filter + LPF

r Selection

d

~

e

Segmentation

LongTerm M

U X20ms

r(i)

Pre-emphasis

Short

Term

Prediction

Speech input r i Analysis

STP: Short term prediction

LTP: Long term prediction RPE: Regular pulse excitation Input frame residuesignal

intermediateresult

General Steps Before Feature

Page 38: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 38/54

General Steps Before Feature

re-emp as s er ng : – The natural attenuation that arises

from voice source is about -12. -higher frequencies of voiced soundsmore apparent.

Usually: H (z ) = 1 - α z -1 , with α ≈ 1

Windowing : – Discrete Fourier transform (DFT)

assumes that the signal is periodic.

spectral artefacts (spectralleakage/smearing) that arise fromdiscontinuities at the frameendpoints.

Typically Hamming window is used.

Page 39: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 39/54

Short Time Correlation refers to the

0

20PSD plot with formant frequencies (Black broken lines)

Pole Fre uencies of LPC model

sample basis.

Primarily associated with the resonances ofthe vocal tract.

Occurs within one itch eriod.

-

-20

from vocal tract shapeShort term predictor removes the short-termcorrelation and results in a glottal excitationsignal

-60 d B

Long term correlation refers to thepredictability of the signals based on sampleswhich do not immediately precede the currentone.

-100

-Primarily associated with the quasi-periodicalnature of voice production (periodicity of vocalcords vibration).

LTP (Long time prediction): try to reduce

0 500 1000 1500 2000 2500 3000 3500 4000-140

-120 Frequency periodicities fromharmonics of Pitch frequency

basic periodicity or pitch that causes awaveform that more or less repeats

Occurs across consecutive pitch periods

er zLong-term/pitch prediction removes thecorrelation across consecutive periods.

Page 40: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 40/54

The residue is down sampled by afactor of 3.

energy is selected, quantized, andtransmitted.

-

Encode remainder with 3 bits/sample

Page 41: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 41/54

egu ar pu se exc e - ong erm re c on- nearpredictive Coder

RPE-LTPInput:

A/D converted signal

Coded @ 13bit/sample= 104 kbps

= 2080 bitsDivided by 13 bit/sample

=160 sam les

Bits per 20 ms

Long Term Prediction (LTP) filter 9

Excitation Signal 188

Total: 260 bi ts/20 ms = 13 kbps

Page 42: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 42/54

-

Linear Prediction Coding (LPC) filter 8 parameters 36

Long Term Prediction (LTP) filter Nr (delay)br (gain)

27

288

Excitation Signal Sub-sampling phase

Maximum amplitude13 samples

2

639

8

24156

Page 43: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 43/54

Dividing the speech into 20 ms frames (160 samples each)

leads to a signal delay of 20 ms.In addition there is additional delay due to mathematicalcalculations.

In total, the overall delay is approximately 90 ms.In du lex communications the dela starts to be noticebale and affects the communication quality when it reaches 300to 500 ms.

Therefore, 90 ms are tolearble.

Page 44: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 44/54

a rate6.5 kb/s used to increase capacity since

it only needs a half rate TCH.Speech quality is considerably less.

GSM Full Rate

Card games are fun to play

n ance u a e12.2 kb/s but further protected by CRCin a resulting 13 kb/s codec.

GSM Half RateGSM EFR

e er speec qua y.

The codec adapts to the radioconditions.

is better protected.

Page 45: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 45/54

-

The philosophy behind

AMR is to lower the codecrate as the interferenceincreases and thusenabling more error

.

The AMR codec is alsoused to harmonize the

different cellular systems.

Page 46: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 46/54

-

arrow an e

• Audio bandwidth equal to 300-3400 Hz.

• peec qua y ower an e qua y o w re ne commun ca ons.

• Most spectrum efficient codec among the speech codecs of GSM.

Wideband AMR (GERAN Rel’5)• Audio bandwidth extended to 50-7000 Hz.

• The extension improves intelligibility and naturalness of speech.

• The quality of the highest codec modes exceeds the quality of 64 kbit/sPCM speech.

• High quality means more bits and reduced network capacity.

Page 47: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 47/54

• Assume that previously synthesizedframes are highly correlated

optimum delay• Scale to fit

....using only the adaptive excitation,

• Transmit delays and gains

the synthesis is still decent....

Page 48: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 48/54

o er an eco er ave a pre e erm ne co e oo o exc a on s gna s.

Index of the code book where the best match was found is transmitted.

The receiver uses this index to pick the correct excitation signal for itssynthesizer filter

Page 49: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 49/54

The listening room....

Take the opportunity to listen and compare....

The original (128 kbit/s) An unstable formant filter hasthis effect....

Different codecs: This is the result of removingthe algebraic codebook....

Normal precision (12,25 kbit/s) ....and if we remove theadaptive codebook....

Low precision (10 kbit/s) use white noise excitaioninstead of the codebooks....

Page 50: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 50/54

Speech coder implements Voice Activity Detection(VAD) – .

When IDLE, do not transmit – e uce a ery consump on. – Reduced interference.

ece ver s e: s ence s s ur ng – Missing received frames replaced with “comfort noise”.

– . – And periodically (480ms) transmitted in special frame (SID= Silence

Descriptor).

Page 51: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 51/54

Voice ActivityDetection (VAD) VAD

Speech frameVAD = 1

DTX

Voice

peec o er DiscontinuousTransmissions

Voiceframe Coder

Comfort Noise VAD = 0

SIDframe

(Silence parameters)

Page 52: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 52/54

BAD FrameReplacement

BFI

Voice, 13bit,8000 samples/s

Voice Decoder DiscontinuousTransmissionsVoice

frame

Comfort NoiseSynthesizer SIDframe

Bad Frame Indication (BFI)Silence Descriptor (SID)

Page 53: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 53/54

DTX coder DTX encoder

The coder will detect if thesample is voice or “silence”.

Correct voice frames are passed

directly to the voice decoder. activity detection will set aVAD bit to 1.

Correct SID frames are passed tothe comfort noise synthesizer.

The voice coder will output avoice frame of 260 bits or aSID frame of 35 bits.

If the Bad Frame Indicator (BFI) isset then

Depending on the VAD bit theDTX will output voice frames

– replaced by the previous

– an incorrect SID frame is.

The DTX will pass the voiceframes or in band encoded SID

rep ace y e as vaframe or last valid speechframe (that probably contains

frames to the channel coder. noise).

Page 54: 04 Source Coding

8/18/2019 04 Source Coding

http://slidepdf.com/reader/full/04-source-coding 54/54

.

Vocoding: Extraction of the vocal tract parameters.

Vocal tract parameters & residual signal are sentinstead of the signal i tself. which are sent to thereceiver.

Signal is regenerated at the receiver.