part ii (mpeg-4) audio tsbk01 image coding and data compression lecture 11, 2003 jörgen ahlberg

Part II (MPEG-4) Audio

TSBK01 Image Coding and Data Compression

Lecture 11, 2003

Jörgen Ahlberg

MPEG-4 Audio - Outline

Psycho-acoustic models

Overview of MPEG-4 Audio

AAC - Advanced Audio Coding

Specialized coders

Synthetic (structured) audio

Psycho-acoustic models

A psycho-acoustic model describes how humans perceive sound.

In the compression context, the main use of the psycho-acoustic model is to tell us which parts of the signal can be removed without being missed.

Hearing Threshold

[Figure: hearing threshold curve. Vertical axis: level in dB (0 to 40). Horizontal axis: frequency in kHz (2 to 12). Sound below the curve will not be heard anyway; discard!]
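The "discard what is below the threshold" idea can be sketched in a few lines. The slide only shows the curve, so the formula below is an assumption: it is Terhardt's widely used approximation of the threshold in quiet, and the component list is made-up test data.

```python
import math

def hearing_threshold_db(freq_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL), Terhardt's formula.
    Assumption: the slide does not say which curve it plots."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(components):
    """Keep only the (freq_hz, level_db) pairs that lie above the threshold."""
    return [(f, lvl) for f, lvl in components if lvl > hearing_threshold_db(f)]

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100.0, 10.0), (1000.0, 10.0), (4000.0, 0.0), (12000.0, 5.0)]
kept = audible(components)
print(kept)  # the 100 Hz and 12 kHz components fall below the curve
```

The ear is most sensitive around 3 to 4 kHz, which is why the weak 4 kHz component survives while the louder 100 Hz component does not.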

Frequency Masking

[Figure, two slides: energy versus frequency.]
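A rough sketch of frequency masking: a strong tone raises the threshold for weaker tones at nearby frequencies. The Bark conversion is Zwicker's standard approximation; the offset and the two slopes are illustrative assumptions, not values from the slides or from any MPEG model.

```python
import math

def bark(freq_hz: float) -> float:
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masking_threshold_db(freq_hz, masker_hz, masker_db,
                         offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Triangular masking curve around the masker (assumed shape).
    The threshold falls off faster below the masker than above it."""
    dz = bark(freq_hz) - bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def is_masked(freq_hz, level_db, masker_hz, masker_db) -> bool:
    """True if a tone at (freq_hz, level_db) lies under the masking curve."""
    return level_db < masking_threshold_db(freq_hz, masker_hz, masker_db)

# A 70 dB masker at 1 kHz hides a 40 dB tone close by, but not one far away.
near = is_masked(1100.0, 40.0, 1000.0, 70.0)
far = is_masked(4000.0, 40.0, 1000.0, 70.0)
print(near, far)
```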

Temporal Masking

[Figure: energy versus time around a strong sound ("masker").]

Backward (pre) masking: < 10 ms

Forward (post) masking: approx. 100 ms
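As a minimal sketch, the two time constants above can be turned into a window test. This is a simplification: real masking also depends on the levels involved and decays gradually, which the function below ignores.

```python
PRE_MASKING_S = 0.010   # backward (pre) masking: < 10 ms, from the slide
POST_MASKING_S = 0.100  # forward (post) masking: approx. 100 ms, from the slide

def in_masking_window(t_sound: float, masker_start: float, masker_end: float) -> bool:
    """True if a weak sound at time t_sound (seconds) falls inside the
    interval where the masker may hide it. Levels are not modelled."""
    return masker_start - PRE_MASKING_S < t_sound < masker_end + POST_MASKING_S

# Hypothetical masker lasting from 1.0 s to 1.2 s.
print(in_masking_window(0.995, 1.0, 1.2))  # shortly before the masker
print(in_masking_window(1.35, 1.0, 1.2))   # well after the masker
```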

Psycho-acoustic Model: Demo

Music without distortion

Music with white noise

Music with perceptually distributed noise

Parts of MPEG-4 Audio

General natural audio
– AAC
  – BSAC
  – TwinVQ
– HILN (parametric)

Natural speech
– CELP
– HVXC (parametric)

Synthetic audio
– TTS
– SAOL
– SASL

Composition
– Mixing
– Re-sampling
– 3D-rendering

Parts of MPEG-4 Audio (cont.)

Error Protection
– CRC
– FEC
  – Block code
  – Convolutional code
– Interleaving

Error Resilience
– Error resilient bitstreams
– Error concealment
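Two of the error protection tools listed above can be illustrated briefly. The sketch below uses a generic block interleaver (write row by row, send column by column) and Python's `zlib.crc32`; the sizes and data are made up, and none of this is the actual MPEG-4 error protection syntax.

```python
import zlib

def interleave(symbols, rows, cols):
    """Write row by row, read column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave()."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

rows, cols = 3, 4                 # three hypothetical codewords of four symbols
data = list(range(rows * cols))
received = interleave(data, rows, cols)
for i in (3, 4, 5):               # a burst wipes out three consecutive symbols
    received[i] = None
restored = deinterleave(received, rows, cols)

# After deinterleaving, the burst is spread out: one loss per codeword.
errors_per_codeword = [restored[r * cols:(r + 1) * cols].count(None)
                       for r in range(rows)]
print(errors_per_codeword)

# CRC: detects (but does not correct) a corrupted frame.
frame = bytes(data)
checksum = zlib.crc32(frame)
corrupted = bytes([frame[0] ^ 1]) + frame[1:]
print(zlib.crc32(corrupted) != checksum)
```

Spreading a burst over several codewords is what lets a FEC code that corrects one error per codeword cope with it.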

Natural Audio Coders

[Figure: quality versus bitrate. Quality axis: Cellular, Telephone, AM, FM, CD. Bitrate axis: 2, 4, 8, 16, 32, 64 kbit/s. Coders shown: Parametric speech (HVXC), High quality speech (CELP), General audio (AAC, TwinVQ), Parametric audio (HILN).]

MPEG-2/4 AAC: Advanced Audio Coding

MDCT-based (modified DCT) time/frequency coder.

Typically 16 – 64 kbit/s/channel.

”Expert listener quality” at 128 kbit/s.

Also added to MPEG-2, but there without the MPEG-4 features.

Half the bitrate of mp3 at comparable quality, mainly due to an improved psycho-acoustic model.

Demo (excerpts: Haydn, Tracy Chapman):

Channels  kbit/s  kHz
Mono      16      16
Stereo    32      16
Stereo    64      32
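To put the demo bitrates in perspective, the sketch below compares them with uncompressed PCM. Two assumptions are not in the slides: that the kHz column is the sampling rate, and that the PCM reference uses 16 bits per sample.

```python
def pcm_kbits(sample_rate_khz: float, channels: int, bits_per_sample: int = 16) -> float:
    """Bitrate of uncompressed PCM in kbit/s (16-bit samples assumed)."""
    return sample_rate_khz * bits_per_sample * channels

# (label, channels, coded kbit/s, kHz) as listed on the slide.
demo_rows = [("Mono", 1, 16, 16), ("Stereo", 2, 32, 16), ("Stereo", 2, 64, 32)]

ratios = [pcm_kbits(khz, ch) / kbits for _, ch, kbits, khz in demo_rows]
for (label, _, kbits, khz), ratio in zip(demo_rows, ratios):
    print(f"{label:6s} {kbits:3d} kbit/s at {khz} kHz: {ratio:.0f}:1")
```

Under these assumptions all three demo settings correspond to the same 16:1 compression ratio.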

MPEG-4 Extensions to the AAC

TwinVQ (Transform-domain Weighted Interleave Vector Quantization)

– Improves performance for low bitrates(6-18 kbit/s).

PNS (Perceptual Noise Substitution)

– Allows coding ”noise-like” parts parametrically.

LTP (Long-term prediction)

– Allows "tone-like" parts to be coded with higher accuracy at a lower bitrate.
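The PNS idea can be sketched as follows: for a band judged to be noise-like, the encoder sends only the band energy, and the decoder substitutes random noise with the same energy. How a band is classified as noise-like is left out, and the function names are hypothetical, not taken from the standard.

```python
import math
import random

def encode_noise_band(coeffs):
    """Encoder side: describe a noise-like band by its energy only."""
    return sum(c * c for c in coeffs)

def decode_noise_band(energy, n, rng):
    """Decoder side: generate n random coefficients with the given energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(energy / sum(x * x for x in noise))
    return [x * scale for x in noise]

rng = random.Random(0)
band = [rng.uniform(-1.0, 1.0) for _ in range(16)]   # made-up noise-like band
energy = encode_noise_band(band)                     # one number instead of 16
substitute = decode_noise_band(energy, len(band), rng)
print(abs(encode_noise_band(substitute) - energy) < 1e-9)
```

The decoded coefficients differ from the originals, but for noise-like content the ear mainly registers the energy in the band.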

MPEG-4 Extensions to the AAC (cont.)

BSAC (Bit-sliced Arithmetic Coder)

– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.

Demo: 60, 40, and 20 kbit/s.
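The scalability comes from sending the most significant bits first, so that a truncated bitstream still decodes to a coarser version. The sketch below shows only this bit-slicing step on made-up quantized magnitudes; the arithmetic coding of the slices and the real BSAC layer structure are omitted.

```python
def bit_slices(values, n_bits):
    """Split non-negative integers into bit-planes, most significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(n_bits - 1, -1, -1)]

def reconstruct(slices, n_bits):
    """Rebuild values from the first len(slices) planes; missing bits are 0."""
    values = [0] * len(slices[0])
    for i, plane in enumerate(slices):
        shift = n_bits - 1 - i
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    return values

quantized = [13, 6, 3, 9]            # hypothetical 4-bit magnitudes
slices = bit_slices(quantized, 4)
print(reconstruct(slices[:2], 4))    # coarse version from half the slices
print(reconstruct(slices, 4))        # all slices give the exact values back
```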

Other MPEG-4 Natural Audio Coders

Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)

HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– 4 - 16 kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.

MPEG-4 High Bitrate Speech Coder

High quality CELP coder.

8 or 16 kHz sampling (NB or WB mode).

4 – 24 kbit/s.

Demo: PCM (uncompressed), 16 kbit/s, 24 kbit/s.

[Figure: Basic principle of CELP coder. Block diagram with codebook index k, codebook vector xk(n), gain gk, LPC filter, input speech s(n), perceptual weighting filter, and error e(n).]
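The CELP principle is analysis by synthesis: try every codebook vector, run it through the LPC filter, and keep the index and gain that give the smallest error. The sketch below is a toy version under stated assumptions: the codebook and filter coefficient are made up, and the perceptual weighting filter is left out, so the error is plain squared error.

```python
def lpc_synthesis(excitation, a):
    """All-pole filter: y[n] = x[n] - sum_i a[i] * y[n - 1 - i]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc -= ai * y[n - 1 - i]
        y.append(acc)
    return y

def celp_search(target, codebook, a):
    """Return (k, gain, error) for the best codebook entry."""
    best = None
    for k, xk in enumerate(codebook):
        yk = lpc_synthesis(xk, a)
        energy = sum(v * v for v in yk)
        if energy == 0.0:
            continue
        gain = sum(s * v for s, v in zip(target, yk)) / energy  # optimal gain
        err = sum((s - gain * v) ** 2 for s, v in zip(target, yk))
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best

# Hypothetical first-order LPC filter and a three-entry codebook.
a = [-0.9]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
target = [0.5 * v for v in lpc_synthesis(codebook[2], a)]  # stand-in for s(n)
k, gain, err = celp_search(target, codebook, a)
print(k, round(gain, 3))
```

Only the index k and the gain (plus the filter parameters) need to be transmitted, which is where the low bitrate comes from.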

MPEG-4 Low Bitrate Speech Coder

HVXC – Harmonic Vector eXcitation Coder.

8 kHz sampling, 2 – 4 kbit/s.

Down to 1.2 kbit/s in variable rate mode.

Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.

HVXC can be combined with HILN.
– Automatic switching between the coders
– Produces one bitstream.

MPEG-4 Natural Audio Coders: Demo

[Demo grid. Rows: Speech, Simple music, Complex music. Columns: Original audio; Music coder (TwinVQ), 6 kbit/s; Music coder (HILN), 6 kbit/s; Speech coder (CELP), 6 kbit/s; Speech coder (HVXC), 2 kbit/s.]

Speed Change

The decoder can play back at an arbitrary speed without changing the pitch.

Original

Music ~20% faster
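One way to see why a parametric representation makes this easy: if each frame carries a frequency and an amplitude, the decoder can simply shorten or lengthen the frames while synthesizing the same frequencies. The sketch below is an illustration of that idea only; the frame format and function are hypothetical and not how the MPEG-4 decoders are specified.

```python
import math

def synthesize(frames, frame_s, sample_rate=8000, speed=1.0):
    """Render (freq_hz, amplitude) frames as a sinusoid.
    speed > 1 shortens every frame; the frequencies stay untouched."""
    out = []
    phase = 0.0
    samples_per_frame = int(round(frame_s / speed * sample_rate))
    for freq, amp in frames:
        for _ in range(samples_per_frame):
            out.append(amp * math.sin(phase))
            phase += 2.0 * math.pi * freq / sample_rate
    return out

frames = [(440.0, 1.0), (494.0, 0.8), (523.0, 0.6)]  # made-up parameter frames
normal = synthesize(frames, frame_s=0.02)
faster = synthesize(frames, frame_s=0.02, speed=1.2)  # about 20% faster
print(len(normal), len(faster))
```

The faster version has fewer samples, but within each frame the waveform oscillates at exactly the same frequency as before.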

Synthetic Audio

TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself

SAOL - Structured Audio Orchestra Language
– SAOL describes how to generate instruments

SASL - Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL

Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
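The demo figures imply that the one-off cost of the orchestra is amortized over the playing time. A small calculation makes this concrete, assuming 1 kB = 1000 bytes and hypothetical durations.

```python
def average_kbits(duration_s: float,
                  orchestra_kbytes: float = 80.0,
                  score_kbits_per_s: float = 1.0) -> float:
    """Average bitrate in kbit/s when the SAOL orchestra is sent once and
    the SASL score is streamed while playing (1 kB = 1000 bytes assumed)."""
    total_kbit = orchestra_kbytes * 8.0 + score_kbits_per_s * duration_s
    return total_kbit / duration_s

one_minute = average_kbits(60.0)
ten_minutes = average_kbits(600.0)
print(round(one_minute, 2), round(ten_minutes, 2))
```

The longer the piece, the closer the average gets to the 1 kbit/s of the score alone.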

BIFS – Binary Format for Scene Description

All the sound you hear is coded at 16 kbit/s.

Initial voice coded using TTS.

Current voice coded using parametric speech coder (HVXC).

Background ”music” coded using Structured Audio.

Post-production specified using BIFS, using the Structured Audio tools.

A Scene Graph

[Figure: scene graph. AudioMix (mix the sounds); AudioFX (add reverb); two AudioSource nodes: hand claps (SA decoder) and speech (CELP coder).]
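The scene graph can be read as a tree that is evaluated from the leaves up. The sketch below mimics that with three small classes named after the nodes on the slide; the `render` method, the echo effect standing in for reverb, and the choice of which source goes through the effect are all assumptions made for illustration.

```python
class AudioSource:
    """Leaf node: holds already decoded samples."""
    def __init__(self, samples):
        self.samples = list(samples)
    def render(self):
        return list(self.samples)

class AudioFX:
    """Applies an effect function to the output of its child."""
    def __init__(self, child, effect):
        self.child, self.effect = child, effect
    def render(self):
        return self.effect(self.child.render())

class AudioMix:
    """Sums the outputs of its children sample by sample."""
    def __init__(self, *children):
        self.children = children
    def render(self):
        rendered = [c.render() for c in self.children]
        n = max(len(r) for r in rendered)
        return [sum(r[i] for r in rendered if i < len(r)) for i in range(n)]

def echo(samples, delay=2, gain=0.5):
    """Very crude reverb stand-in: a feedback echo."""
    out = list(samples) + [0.0] * delay
    for i in range(delay, len(out)):
        out[i] += gain * out[i - delay]
    return out

claps = AudioSource([1.0, 0.0, 0.0, 0.0])    # stands for the SA decoder output
speech = AudioSource([0.1, 0.1, 0.1, 0.1])   # stands for the CELP decoder output
scene = AudioMix(AudioFX(claps, echo), speech)
mixed = scene.render()
print(mixed)
```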

[Figure: larger scene graph built from AudioMix, AudioFX, AudioDelay, and AudioSource nodes. Sources: Piano, Bass (SA), Finger snaps.]

That was the last slide!
