Part II (MPEG-4) Audio
TSBK01 Image Coding and Data Compression
Lecture 11, 2003
Jörgen Ahlberg
2
MPEG-4 Audio - Outline
Psycho-acoustic models
Overview of MPEG-4 Audio
AAC - Advanced Audio Coding
Specialized coders
Synthetic (structured) audio
3
Psycho-acoustic models
A psycho-acoustic model describes how humans perceive sound.
In the compression context, its main use is to tell us which parts of the signal we can remove without an audible difference.
4
Hearing Threshold
[Figure: hearing threshold in dB (axis 0 to 40) versus frequency in kHz (axis 2 to 12). Anything below the threshold will not be heard anyway; discard!]
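The threshold curve on this slide can be approximated numerically. The sketch below uses a widely quoted closed-form fit of the threshold in quiet, usually attributed to Terhardt. The formula, the dB SPL reference, and the names `threshold_in_quiet_db` and `audible` are assumptions for illustration and are not taken from the lecture.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in quiet, in dB SPL (assumed fit)."""
    f = freq_hz / 1000.0  # the fit is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible(freq_hz: float, level_db: float) -> bool:
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for khz in (2, 4, 6, 8, 10, 12):
        print(khz, "kHz:", round(threshold_in_quiet_db(khz * 1000.0), 1), "dB")
```

A coder could drop any spectral component for which `audible` returns False, which is the "discard" region of the figure. Masking by other components is not modelled here.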
5
Frequency Masking
[Figure: energy versus frequency.]
6
Frequency Masking
[Figure: energy versus frequency.]
7
Temporal Masking
[Figure: energy versus time around a strong sound ("masker").]
Forward (post) masking: approx. 100 ms
Backward (pre) masking: < 10 ms
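The two time windows above can be turned into a simple candidate test. This is only a sketch: it treats the masker as a single instant and ignores sound levels, and the name `may_be_masked` is hypothetical. The window lengths are the ones given on the slide.

```python
FORWARD_MASKING_MS = 100.0   # from the slide: approx. 100 ms after the masker
BACKWARD_MASKING_MS = 10.0   # from the slide: less than 10 ms before the masker


def may_be_masked(offset_ms: float) -> bool:
    """Return True if a weak sound falls inside a temporal masking window.

    offset_ms is the time of the weak sound relative to the masker;
    negative values mean the weak sound comes before the masker.
    """
    return -BACKWARD_MASKING_MS < offset_ms <= FORWARD_MASKING_MS
```

A real psycho-acoustic model would also compare levels, since the masking effect decays with distance from the masker.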
8
Psycho-acoustic Model: Demo
Music without distortion
Music with white noise
Music with perceptually distributed noise
9
Parts of MPEG-4 Audio
General natural audio
– AAC
    – BSAC
    – TwinVQ
– HILN (parametric)
Natural speech
– CELP
– HVXC (parametric)
Synthetic audio
– TTS
– SAOL
– SASL
Composition
– Mixing
– Re-sampling
– 3D-rendering
10
Parts of MPEG-4 Audio (cont.)
Error Protection
– CRC
– FEC
    – Block code
    – Convolutional code
– Interleaving
Error Resilience
– Error resilient bitstreams
– Error concealment
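To illustrate why interleaving appears under error protection, here is a minimal block interleaver sketch. It is a generic textbook construction, not the interleaver defined by MPEG-4; the function names and the 3-by-4 layout are assumptions chosen for the example.

```python
def interleave(symbols, rows, cols):
    """Write the symbols row by row, read them out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave()."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


# Three codewords of four symbols each, sent interleaved.
sent = interleave(list(range(12)), rows=3, cols=4)
# A burst error wipes out three consecutive transmitted symbols.
received = [None if 3 <= i < 6 else s for i, s in enumerate(sent)]
restored = deinterleave(received, rows=3, cols=4)
# Count the lost symbols in each codeword (each row of four).
losses_per_codeword = [restored[r * 4:(r + 1) * 4].count(None) for r in range(3)]
```

The burst is spread so that each codeword loses a single symbol, which is the situation a block or convolutional code handles best.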
11
Natural Audio Coders
[Figure: quality (cellular, telephone, AM, FM, CD) versus bitrate (2, 4, 8, 16, 32, 64 kbit/s), showing the operating ranges of the coders below.]
Parametric speech (HVXC)
High quality speech (CELP)
General audio (AAC, TwinVQ)
Parametric audio (HILN)
12
MPEG-2/4 AAC: Advanced Audio Coding
DCT-based time/frequency coder.
Typically 16 – 64 kbit/s/channel.
”Expert listener quality” at 128 kbit/s.
Added to MPEG-2, but without MPEG-4 features.
Half the bitrate compared to mp3, mainly due to improved psycho-acoustic model.
Demo clips (Haydn, Tracy Chapman):
Channels  kbit/s  kHz
Mono      16      16
Stereo    32      16
Stereo    64      32
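The demo settings can be compared with uncompressed audio by simple arithmetic. The sketch below assumes 16-bit PCM as the uncompressed reference, which the slide does not state; the function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_khz, channels, bits_per_sample=16):
    """Bitrate of uncompressed PCM in kbit/s (16 bits/sample is an assumption)."""
    return sample_rate_khz * channels * bits_per_sample


def compression_ratio(coded_kbps, sample_rate_khz, channels, bits_per_sample=16):
    """How many times smaller the coded stream is than the PCM reference."""
    return pcm_bitrate_kbps(sample_rate_khz, channels, bits_per_sample) / coded_kbps


# (channels, coded kbit/s, sampling rate in kHz) for the three demo settings
demo_settings = [(1, 16, 16), (2, 32, 16), (2, 64, 32)]
ratios = [compression_ratio(kbps, khz, ch) for ch, kbps, khz in demo_settings]
```

Under that assumption all three settings work out to the same ratio, 16:1, or about one coded bit per sample per channel.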
13
MPEG-4 Extensions to the AAC
TwinVQ (Transform-domain Weighted Interleave Vector Quantization)
– Improves performance for low bitrates (6-18 kbit/s).
PNS (Perceptual Noise Substitution)
– Allows coding "noise-like" parts parametrically.
LTP (Long-term prediction)
– Allows "tone-like" parts to be coded with higher accuracy at a lower bitrate.
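The idea behind PNS can be sketched in a few lines: for a band judged to be noise-like, the encoder sends only its energy, and the decoder substitutes random noise with the same energy. This is a simplified illustration, not the MPEG-4 bitstream syntax; the function names are hypothetical, and the decision whether a band is noise-like is left out.

```python
import math
import random


def pns_encode(band):
    """Return the only parameter sent for a noise-like band: its energy."""
    return sum(x * x for x in band)


def pns_decode(energy, num_coeffs, rng):
    """Regenerate a band of random coefficients with the requested energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_coeffs)]
    noise_energy = sum(x * x for x in noise)
    scale = math.sqrt(energy / noise_energy) if noise_energy > 0.0 else 0.0
    return [scale * x for x in noise]


band = [0.3, -0.1, 0.25, -0.4, 0.05, 0.2, -0.35, 0.15]
energy = pns_encode(band)
decoded = pns_decode(energy, len(band), random.Random(0))
```

The decoded coefficients differ from the originals but carry the same energy, which is what the listener perceives for noise-like content.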
14
MPEG-4 Extensions to the AAC
BSAC (Bit-sliced Arithmetic Coder)
– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.
Demo: 60, 40, and 20 kbit/s
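The scalability comes from sending the quantized values one bit slice at a time, most significant bits first, so that a truncated stream still decodes to a coarser version. The sketch below shows only the slicing; the arithmetic coding stage is omitted, values are assumed to be non-negative integers, and the function names are hypothetical.

```python
def to_bit_planes(values, num_bits):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes, num_bits, count):
    """Rebuild values from the planes received so far; missing planes are zero."""
    values = [0] * count
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    return values


coeffs = [5, 12, 3, 9]
planes = to_bit_planes(coeffs, num_bits=4)
full = from_bit_planes(planes, 4, len(coeffs))        # all four planes received
coarse = from_bit_planes(planes[:2], 4, len(coeffs))  # stream cut after two planes
```

Cutting the stream after any plane gives a valid, lower-rate approximation, which is what allows fine-grained bitrate steps.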
15
Other MPEG-4 Natural Audio Coders
Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)
HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– 4 - 16 kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.
16
MPEG-4 High Bitrate Speech Coder
High quality CELP coder.
8 or 16 kHz sampling (NB or WB mode).
4 – 24 kbit/s.
Demo: PCM (uncompressed), 16 kbit/s, 24 kbit/s
[Figure: Basic principle of CELP coder. Block diagram with codebook index k, excitation xk(n), gain gk, LPC filter, input speech s(n), error e(n), and perceptual weighting filter.]
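The analysis-by-synthesis search in the figure can be sketched as follows: each codebook vector xk(n) is passed through the LPC synthesis filter, the best gain gk is computed, and the index with the smallest error against s(n) is kept. This is a minimal sketch under stated assumptions: the perceptual weighting filter is omitted, the codebook and filter coefficients are made up, and the function names are hypothetical.

```python
def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z): y[n] = x[n] - sum_i a[i-1] * y[n-i]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                y -= coeff * out[n - i]
        out.append(y)
    return out


def celp_search(target, codebook, a):
    """Return (k, gain, squared_error) for the best codebook entry."""
    best = None
    for k, code in enumerate(codebook):
        synth = lpc_synthesis(code, a)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue  # an all-zero vector cannot match anything
        # Least-squares optimal gain for this codebook vector.
        gain = sum(t * v for t, v in zip(target, synth)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]  # toy codebook
a = [-0.9]                                               # toy first-order LPC filter
target = [0.5 * v for v in lpc_synthesis(codebook[2], a)]
k, gain, err = celp_search(target, codebook, a)
```

Only the index k and the quantized gain need to be transmitted, together with the LPC parameters, which is where the bitrate saving comes from.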
17
MPEG-4 Low Bitrate Speech Coder
HVXC – Harmonic Vector eXcitation Coder.
8 kHz sampling, 2 – 4 kbit/s.
Down to 1.2 kbit/s in variable rate mode.
Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.
HVXC can be combined with HILN.
– Automatic switching between the coders
– Produces one bitstream.
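Sinusoidal coding of a voiced segment can be pictured as rebuilding the signal from a pitch frequency and one amplitude per harmonic. The sketch below shows that synthesis step only. It is an illustration of the principle, not the HVXC decoder; the function name and parameters are hypothetical, and only the 8 kHz sampling rate comes from the slide.

```python
import math


def synthesize_voiced(f0_hz, amplitudes, num_samples, sample_rate_hz=8000):
    """Sum of harmonics of the pitch f0_hz, one amplitude per harmonic."""
    return [
        sum(amp * math.sin(2.0 * math.pi * (h + 1) * f0_hz * n / sample_rate_hz)
            for h, amp in enumerate(amplitudes))
        for n in range(num_samples)
    ]


# 20 ms of a 200 Hz voiced sound with three harmonics (made-up values).
frame = synthesize_voiced(200.0, [1.0, 0.5, 0.25], num_samples=160)
```

Because only the pitch and a handful of amplitudes are sent per frame, bitrates of a few kbit/s become possible for voiced speech.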
18
MPEG-4 Natural Audio Coders: Demo
Compared against the original audio:
– Music coder (TwinVQ), 6 kbit/s
– Music coder (HILN), 6 kbit/s
– Speech coder (CELP), 6 kbit/s
– Speech coder (HVXC), 2 kbit/s
Test material: speech, simple music, complex music
19
Speed Change
Possibility to decode to arbitrary speed, without changing the pitch.
Demo: original; music ~20% faster
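One way a parametric decoder could change speed without changing pitch is to step through its parameter frames faster or slower while synthesizing every output frame at the normal length. The sketch below illustrates that idea only; it is not taken from the standard, and the frame layout `(f0_hz, amplitude)` and the name `change_speed` are assumptions.

```python
def change_speed(frames, speed):
    """Resample a list of (f0_hz, amplitude) frames; speed 1.2 means 20% faster."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) / speed))
    last = len(frames) - 1
    out = []
    for i in range(n_out):
        pos = min(i * speed, last)   # position in the original frame sequence
        lo = int(pos)
        hi = min(lo + 1, last)
        t = pos - lo
        # Linear interpolation of the parameters; the pitch values themselves
        # are not scaled, so the perceived pitch stays the same.
        f0 = (1.0 - t) * frames[lo][0] + t * frames[hi][0]
        amp = (1.0 - t) * frames[lo][1] + t * frames[hi][1]
        out.append((f0, amp))
    return out


original = [(200.0, 1.0)] * 12
faster = change_speed(original, 1.2)
```

Here twelve frames become ten, so playback is about 20% shorter, while every frame still carries a 200 Hz pitch.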
20
Synthetic Audio
TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself
SAOL - Structured Audio Orchestra Language
– SAOL describes how to generate instruments
SASL - Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL
Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
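The demo figures imply that the average bitrate depends on how long the piece plays, since the instrument descriptions are sent once. A small worked calculation, assuming 1 kB = 1000 bytes (the slide does not say) and using a hypothetical function name:

```python
def average_bitrate_kbps(duration_s, header_kbytes=80.0, stream_kbps=1.0):
    """Average rate when a one-off header precedes a constant-rate stream."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    header_kbit = header_kbytes * 8.0  # 80 kB of SAOL = 640 kbit
    return (header_kbit + stream_kbps * duration_s) / duration_s


one_minute = average_bitrate_kbps(60.0)
ten_minutes = average_bitrate_kbps(600.0)
```

The longer the piece, the closer the average gets to the 1 kbit/s of the SASL score stream.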
21
BIFS – Binary Format for Scene Description
All the sound you hear is coded at 16 kbit/s.
Initial voice coded using TTS.
Current voice coded using parametric speech coder (HVXC).
Background ”music” coded using Structured Audio.
Post-production specified using BIFS, using the Structured Audio tools.
22
A Scene Graph
[Figure: scene graph with an AudioMix node (mix the sounds), an AudioFX node (add reverb), and two AudioSource nodes: hand claps (SA decoder) and speech (CELP coder).]
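The scene graph can be read as a small tree of processing nodes. The sketch below borrows the node names from the slide, but the classes are hypothetical Python stand-ins, not the BIFS node definitions. It assumes that the reverb is applied to the hand claps (the transcript does not say which branch carries the effect) and uses a simple echo in place of a real reverb.

```python
class AudioSource:
    """Leaf node: stands in for the output of a decoder."""

    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)


class AudioFX:
    """Effect node: a simple feedback echo as a stand-in for reverb."""

    def __init__(self, child, delay=2, gain=0.5):
        self.child, self.delay, self.gain = child, delay, gain

    def render(self):
        out = self.child.render()
        for n in range(self.delay, len(out)):
            out[n] += self.gain * out[n - self.delay]
        return out


class AudioMix:
    """Mixer node: adds its children sample by sample."""

    def __init__(self, *children):
        self.children = children

    def render(self):
        rendered = [child.render() for child in self.children]
        length = max((len(r) for r in rendered), default=0)
        return [sum(r[n] for r in rendered if n < len(r)) for n in range(length)]


claps = AudioSource([1.0, 0.0, 0.0, 0.0, 0.0])  # stands in for the SA decoder
speech = AudioSource([0.25] * 5)                 # stands in for the CELP decoder
scene = AudioMix(AudioFX(claps), speech)
mixed = scene.render()
```

Rendering the root node pulls audio up through the tree, so post-production steps such as mixing and effects are described in the scene rather than baked into the coded audio.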
23
[Figure: larger scene graph built from AudioMix, AudioFX, and AudioDelay nodes over three AudioSource nodes: piano, bass (SA), and finger snaps.]
That was the last slide!