Source: folk.uio.no/inf5080/mkt04a-audio.pdf
Norsk Regnesentral
Audio Coding and MP3
Wolfgang Leister
contributions by: Torbjørn Ekman
26-Feb-03
What is Sound?
• Sound waves: 20 Hz - 20 kHz
• Speed: 331.3 m/s (air, 0 °C)
• Wavelength: 16.6 m - 1.66 cm
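The wavelength range follows from λ = c / f. A minimal sketch, assuming the speed of sound given on the slide; the name `wavelength_m` is illustrative only:

```python
SPEED_OF_SOUND_M_S = 331.3  # value from the slide (air)

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres: lambda = c / f."""
    return SPEED_OF_SOUND_M_S / frequency_hz

# Lower and upper end of the audible range
for f in (20.0, 20_000.0):
    print(f"{f:>8.0f} Hz -> {wavelength_m(f):.4f} m")
```

At 20 Hz the wavelength is on the order of metres, at 20 kHz on the order of centimetres.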
Analogue audio
• frequencies: 20 Hz - 20 kHz
• mono: $x(t)$ scalar
• stereo: $\mathbf{x}(t) = \begin{pmatrix} x_l(t) \\ x_r(t) \end{pmatrix}$
Audio Compression
• small files, low data rate at transmission
• reconstruction must be (as much as possible) equal to original signal
• redundancy (lossless coding)
• irrelevancy (do not code what you cannot hear)
Data rates
Quality          Sample rate (kHz)  Bit/sample  Channels  Data rate (kb/s)  Frequency (Hz)
Telephone        8.000              8           Mono      64.00             200-3400
AM radio (MW)    11.025             8           Mono      88.20
FM radio (UKW)   22.050             16          Stereo    705.60
CD               44.100             16          Stereo    1411.20           20-20000
DAT              48.000             16          Stereo    1536.00           20-20000
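Each data rate in the table is sample rate × bits per sample × number of channels. A small sketch that recomputes the column; the function name and the row dictionary are illustrative, not part of the slides:

```python
def pcm_data_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM data rate in kb/s (1 kb = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# (sample rate in Hz, bits per sample, channels) for each row of the table
ROWS = {
    "Telephone": (8_000, 8, 1),
    "MW": (11_025, 8, 1),
    "UKW": (22_050, 16, 2),
    "CD": (44_100, 16, 2),
    "DAT": (48_000, 16, 2),
}

for name, params in ROWS.items():
    print(f"{name:<10} {pcm_data_rate_kbps(*params):8.1f} kb/s")
```

The exact products are 88.2 kb/s for the 11.025 kHz row and 1411.2 kb/s for CD.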
Dynamics compression
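• A-Law
$$S' = \begin{cases} \operatorname{sign}(S) \cdot \dfrac{A \cdot |S|}{1 + \ln A} & \text{for } |S| \le \dfrac{1}{A} \\[1ex] \operatorname{sign}(S) \cdot \dfrac{1 + \ln(A \cdot |S|)}{1 + \ln A} & \text{else} \end{cases}$$
• µ-Law
$$S' = \operatorname{sign}(S) \cdot \frac{\ln(1 + \mu \cdot |S|)}{\ln(1 + \mu)}, \quad \mu = 255$$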
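The two companding curves can be sketched as follows, for input samples S normalised to [-1, 1]. µ = 255 is taken from the slide; A = 87.6 is an assumption (the slide gives no value for A), and the function names are illustrative:

```python
import math

MU = 255.0  # from the slide
A = 87.6    # assumption: the slide does not state a value for A

def mu_law(s: float) -> float:
    """mu-law: S' = sign(S) * ln(1 + mu*|S|) / ln(1 + mu)."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)

def a_law(s: float) -> float:
    """A-law: linear segment for |S| <= 1/A, logarithmic segment otherwise."""
    x = abs(s)
    if x <= 1.0 / A:
        y = A * x / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))
    return math.copysign(y, s)

for s in (-1.0, -0.1, 0.0, 0.01, 0.1, 1.0):
    print(f"S={s:+.2f}  mu-law={mu_law(s):+.4f}  A-law={a_law(s):+.4f}")
```

Both curves map full scale to full scale and expand small amplitudes, so that quiet passages get relatively more quantisation levels.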
Masking
• Threshold for human ear
• Threshold changes:
  • neighbouring frequencies (example: 0.5, 1, 4, 8 kHz)
  • in time
Sampling
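• When $x(t)$ is bandwidth limited: $|f| > \omega \Rightarrow \hat{x}(f) = 0$
• then $x(t) = \sum_{n=-\infty}^{\infty} x[n] \cdot g(t - n \cdot \Delta t)$
• with $\Delta t = \frac{1}{f_s} < \frac{1}{2\omega}$, $x[n] = x(n \cdot \Delta t)$, $g(t) = \frac{\sin(2\pi\omega t)}{2\pi\omega t}$
(As written, the sum is exact at critical sampling, $\Delta t = 1/(2\omega)$; for a smaller $\Delta t$, $g$ needs the extra factor $2\omega \cdot \Delta t$.)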
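The reconstruction sum can be tried numerically. A minimal sketch, assuming g(t) = sin(πt/Δt)/(πt/Δt), i.e. the interpolation kernel for ω = f_s/2, and a finite number of samples instead of the infinite sum; the 3 Hz test tone, the 20 Hz sampling rate and all names are illustrative choices:

```python
import math

def sinc(u: float) -> float:
    """Normalised sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, n_start: int, dt: float, t: float) -> float:
    """x(t) ~ sum_n x[n] * g(t - n*dt) with g(t) = sinc(t/dt), over the stored samples only."""
    return sum(x_n * sinc((t - (n_start + i) * dt) / dt)
               for i, x_n in enumerate(samples))

F_SIGNAL = 3.0   # Hz, well below f_s / 2
F_S = 20.0       # Hz, sampling rate
DT = 1.0 / F_S
N = 2000         # samples are taken for n = -N .. N

samples = [math.sin(2 * math.pi * F_SIGNAL * n * DT) for n in range(-N, N + 1)]

t = 0.0123       # a time instant between two sampling points
approx = reconstruct(samples, -N, DT, t)
exact = math.sin(2 * math.pi * F_SIGNAL * t)
print(f"reconstructed: {approx:.5f}, original: {exact:.5f}")
```

Because the sum is truncated, the reconstruction is only approximate; the error shrinks as more samples are included.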
Quantisation
• $x \to Q(x)$
• $k$ bits $\Rightarrow L = 2^k$ representations
• $\{y_1, \dots, y_n\}$
• $|x - y_i| \le |x - y_j|$ for all $j$ $\Rightarrow Q(x) = y_i$
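The nearest-value rule can be sketched as below. The uniform placement of the 2^k representation values in [-1, 1] is an assumption for illustration (the slide does not say how the values y_i are chosen), as are the function names:

```python
def make_levels(k: int, lo: float = -1.0, hi: float = 1.0) -> list:
    """2**k uniformly spaced representation values (mid-points of equal intervals)."""
    count = 2 ** k
    step = (hi - lo) / count
    return [lo + (i + 0.5) * step for i in range(count)]

def quantise(x: float, levels: list) -> float:
    """Q(x) = y_i such that |x - y_i| <= |x - y_j| for all j."""
    return min(levels, key=lambda y: abs(x - y))

levels = make_levels(2)            # k = 2 bits -> 4 representation values
for x in (-2.0, -0.3, 0.0, 0.3, 0.9):
    print(f"x={x:+.2f} -> Q(x)={quantise(x, levels):+.2f}")
```

With k = 16, as on an audio CD, the same construction yields 65536 values.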
PCM = Pulse Code Modulation
• Sampling: $\{x(t)\} \to \{x[n]\}$
• Quantisation: $\{x[n]\} \to \{Q(x[n])\}$
• Coding: $\{Q(x[n])\} \to \{i_n\}$
• Play: $y(t) = \sum_n Q(x[n]) \cdot g(t - n \cdot \Delta t)$
redundancy
irrelevancy
Stereo CD Audio
• Data rate: $2 \cdot 16\,\text{bit} \cdot 44.1 \cdot 10^{3}\,\text{s}^{-1} = 1411.2 \cdot 10^{3}\,\text{bit/s}$
MPEG compression factors
• MPEG 1 Audio: PCM 32, 44.1, 48 kHz, max 448 kBit/s
• MPEG 2 Audio: PCM 16, 22.05, 24, 32, 44.1, 48 kHz, max 384 kBit/s
MPEG Audio Layer I,II,III
• Layer I
• Layer II ⇒ Digital TV
• Layer III ⇒ MP3
MP3 - MPEG 1 Audio Layer 3
• Sampling: 16 kHz - 48 kHz
• Bit rate: 32 kb/s - 192 kb/s (CD Audio: 44.1 kHz, 1411 kb/s)
• www.iis.fhg.de/amm/gallery/index.html
• Karlheinz Brandenburg: “MP3 and AAC explained”, http://www.exp-math.uni-essen.de/~dreibh/diplom/bra99.pdf
perceptual encoding / decoding
Filterbank
Ideal sub-band coder
• impossible: ideal sub-band coder
• downsampling ⇒ aliasing
• possible: “nearly perfect”
$$H_m(f) = \begin{cases} 1 & \text{for } f \in D_m,\; m = 1, \dots, M \\ 0 & \text{else} \end{cases}$$
Downsampling
• from $M \cdot f_s$ back to $f_s$
• sub-bandwidth $B$, upper frequency is a multiple of $B$
• can sample at $f_s = 2B$ (instead of $f_s = 2 \cdot M \cdot B$)
$x_m[n] \;\to\; \downarrow M \;\to\; y_m[k]$, with $y_m[k] = x_m[k \cdot M]$
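The downsampling step y_m[k] = x_m[k·M] keeps every M-th sample of a sub-band signal. A minimal sketch with an illustrative function name:

```python
def downsample(x: list, m: int) -> list:
    """Keep every m-th sample: y[k] = x[k * m]."""
    return x[::m]

x_m = list(range(10))        # stand-in for one sub-band signal x_m[n]
y_m = downsample(x_m, 3)     # M = 3
print(y_m)                   # → [0, 3, 6, 9]
```

With M sub-bands, each downsampled by M, the total number of samples per second stays the same as in the input signal.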
Filterbank in MPEG-1 audio layer 1-3
• Polyphase filterbank
• 32 subbands
• 512 tap FIR filters
• 80 additions and multiplications per output
• Equal width
• Not perfect reconstruction
• Frequency overlap
A closer look
• The subbands overlap at 3 dB to the adjacent bands.
• The leakage to the other bands is small.
• The total response almost adds up to one (0 dB).
White noise
• The white noise run through the filterbank.
  • The samples from each band are played in the order of the subbands.
• The subsampled filtered sequence.
  • The samples from each band are played in the order of the subbands.
• The reconstruction error is -84 dB.
Nonideal filterbanks
• In a perfect filterbank the first part is the only part.
• The second part consists of the aliasing terms.
• The filterbank is designed so that the aliasing is small.
$$Y(e^{j\omega}) = X(e^{j\omega}) \underbrace{\frac{1}{M} \sum_{k=0}^{M-1} H_k^A(e^{j\omega})\, H_k^R(e^{j\omega})}_{\approx 1} + \underbrace{\sum_{n=1}^{M-1} X\!\left(e^{j(\omega - \frac{2\pi n}{M})}\right) \frac{1}{M} \sum_{k=0}^{M-1} H_k^A\!\left(e^{j(\omega - \frac{2\pi n}{M})}\right) H_k^R(e^{j\omega})}_{\approx 0}$$
Tubthumper, a time domain view
The red line is the reconstruction error after splitting the signal into subbands, downsampling and applying the synthesis filterbank. The reconstruction error is -84 dB and sounds like [audio sample].
Tubthumper, frequency view
[Figure panels: "Subsampled 32 times" and "No subsampling"]
Subband                  1    2    4    8    16    32
Center frequency [kHz]   0.3  1.0  2.4  5.2  10.7  21.7
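The centre frequencies are consistent with 32 equally wide subbands covering 0 to f_s/2. A short sketch, assuming a sampling rate of 44.1 kHz (the slide does not state it) and counting subbands from 1; names are illustrative:

```python
NUM_SUBBANDS = 32
SAMPLE_RATE_HZ = 44_100.0  # assumption: not stated on the slide

def centre_frequency_khz(subband: int) -> float:
    """Centre of subband 1..32 when the range 0..f_s/2 is split into equal widths."""
    width_hz = SAMPLE_RATE_HZ / 2 / NUM_SUBBANDS
    return (subband - 0.5) * width_hz / 1000.0

centres = {k: round(centre_frequency_khz(k), 1) for k in (1, 2, 4, 8, 16, 32)}
print(centres)
```

Under this assumption each subband is about 689 Hz wide.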
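Filterbank MPEG
[Figure: a polyphase filterbank splits the signal into sub-bands (band 1, band 2, ..., band 31), each delivered in blocks of 12 samples. A Layer I frame holds one block per band (384 samples); a Layer II/III frame holds three blocks per band (1152 samples).]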
Critical Bands
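• Heinrich Barkhausen (1881-1956)
• psycho-acoustic
• width measured in bark
$$\text{bark} = \begin{cases} f / 100 & \text{for } f < 500 \\ 9 + 4 \cdot \log_2(f / 1000) & \text{else} \end{cases}$$
($f$ in Hz; the logarithm is base 2, so that both branches give 5 bark at 500 Hz.)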
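The bark approximation can be evaluated directly. A minimal sketch, assuming f is given in Hz and the logarithm is base 2 (the choice that makes the two branches meet at 500 Hz); the function name is illustrative:

```python
import math

def bark(f_hz: float) -> float:
    """Approximate critical-band number: linear below 500 Hz, logarithmic above."""
    if f_hz < 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

for f in (100.0, 250.0, 500.0, 1000.0, 4000.0, 16000.0):
    print(f"{f:>7.0f} Hz -> {bark(f):5.2f} bark")
```

Above 500 Hz every doubling of the frequency adds 4 bark, so the bands get wider in Hz as the frequency rises.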
MPEG - Sub bands
• Layer I: 32 bands, 625 Hz each, Fourier transform
• Layer II: 32 bands, three frames, time masking
• Layer III: Division according to critical bands
MPEG masking
• Psycho-acoustic model
• masking of neighbouring bands
• signals are coded when above masking threshold
• MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding and Multiplexing)
• Layer I: simplified, Layer II: entirely, Layer III: with other methods
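Example: Masking MPEG Audio
band      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
level     1   8   12  10  6   2   10  60  35  20  15  2   3   5   3   1
masking   ?   ?   ?   ?   ?   ?   12  x   15  ?   ?   ?   ?   ?   ?   ?
coding    ?   ?   ?   ?   ?   ?   -   x   x   ?   ?   ?   ?   ?   ?   ?
(x = coded, - = not coded: the level in band 7 is below its masking threshold, the level in band 9 is above it.)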
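The coding decision in this example can be sketched as a comparison of each band's level with its masking threshold. The level list is one reading of the extracted slide (with the stray values 20 and 35 placed in bands 10 and 9), so treat it as an assumption; the thresholds 12 and 15 for the neighbours of the loud band 8 are taken from the masking row, and the names are illustrative:

```python
# Levels for bands 1..16 (assumed reading of the slide)
LEVELS = [1, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]

# Masking thresholds caused by the loud band 8 in its neighbours (from the slide)
MASKING_THRESHOLDS = {7: 12, 9: 15}

def coding_decision(levels: list, thresholds: dict) -> dict:
    """A band is coded only when its level lies above its masking threshold."""
    return {band: levels[band - 1] > thr for band, thr in thresholds.items()}

decisions = coding_decision(LEVELS, MASKING_THRESHOLDS)
for band, coded in sorted(decisions.items()):
    print(f"band {band}: level {LEVELS[band - 1]} -> {'coded' if coded else 'not coded'}")
```

Band 7 is masked by its loud neighbour and dropped, while band 9 stays audible and is coded.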
MPEG-1 Layer 3 encoder
MP3
• Filter bank - sub bands
• Series MDCT
• fine grain frequency resolution
• non-uniform quantisation
• perception model
• Huffman coding
MP3 (vs. Layer I/II)
• modified DCT (Series MDCT vs. FFT)
• critical bands
• Huffman coding
• entropy reduction
• dynamics compression
• difference and sum of stereo signals
MPEG Audio Layer I,II,III
• Layer I: 19 ms delay, FFT, 384 samples, frequency masking, equal bands
• Layer II: 35 ms delay, FFT, 1152 samples, frequency masking, time simulated, equal bands
• Layer III: 59 ms delay, DCT, 1152 samples, frequency and time masking, bands as in bark scale
MPEG Layer I, II, III
                   subj. quality  bandwidth (kb/s)  compression  1 min audio
Audio CD           CD             1400              1:1          10.58 MB
MPEG1 Layer I      CD             384               3.6:1        2.88 MB
MPEG1 Layer II     CD             256               5.5:1        1.92 MB
MPEG1 Layer III    CD             128               11:1         962 kB
MPEG2 Layer III    Radio          64                22:1         481 kB
MPEG2 Layer III    Telephone      16                88:1         120 kB
CS-ACELP           Speech         5.3               264:1        40 kB
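The last two columns follow from the bit rate. A small sketch, assuming 1 kB = 1000 bytes and an exact CD rate of 1411.2 kb/s as the reference; function names are illustrative:

```python
CD_BITRATE_KBPS = 1411.2  # 2 channels * 16 bit * 44.1 kHz

def size_per_minute_kb(bitrate_kbps: float) -> float:
    """Size of one minute of audio in kB: 60 seconds, 8 bits per byte."""
    return bitrate_kbps * 60 / 8

def compression_ratio(bitrate_kbps: float) -> float:
    """Compression factor relative to uncompressed CD audio."""
    return CD_BITRATE_KBPS / bitrate_kbps

for name, rate in (("Audio CD", 1411.2), ("Layer I", 384), ("Layer II", 256),
                   ("Layer III", 128), ("CS-ACELP", 5.3)):
    print(f"{name:<10} {size_per_minute_kb(rate):9.1f} kB/min  {compression_ratio(rate):6.1f}:1")
```

Computed directly from the bit rate, 128 kb/s gives 960 kB per minute; the table's 962 kB corresponds to dividing the CD size by the rounded factor 11.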
MPEG-2 AAC
Audio Formats
• PCM - Pulse Code Modulation
  ITU G.711; speech data, 4 kHz bandwidth, 64 kb/s data rate
• ADPCM (Adaptive Differential PCM)
  ITU G.726, G.727; 16, 24, 32, 40 kBit/s. Standard for CCITT G.721
• SB-ADPCM (Sub-Band ADPCM)
  ISDN, G.722; 7 kHz bandwidth in 64 kBit/s streams
Audio Formats
• AIFF - Audio Interchange File Format
  Apple (extension from IFF by Electronic Arts)
• Wave (by Microsoft and IBM)
  Part of RIFF (Resource Interchange File Format)
• NeXT/Sun Audio File Format
  Note: big endian
Proprietary Audio Formats
• AT&T Proprietary Compression Algorithm
• EPAC (Bell Labs)
• Microsoft Windows Media Audio (WMA)
• AC-3 Audio Code No. 3 - Dolby Digital Surround
Speech compression formats
• GSM 06.10: 160 13-bit values are compressed into 260 bits (33 bytes); at 8000 samples/s this results in a data rate of 1650 bytes/s
• CELP (Code Excited Linear Prediction): analytical model
• LD-CELP (Low Delay CELP): G.728
• LPC-10E (Linear Prediction Coder, Enhanced): military coder, analytical model, 2.4 kBit/s; understandable, but low quality.
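The GSM figures in the first bullet can be checked with a few lines of arithmetic. All numbers come from the slide; the constant names are illustrative:

```python
SAMPLE_RATE_HZ = 8000      # samples per second
SAMPLES_PER_FRAME = 160    # 13-bit samples going into one frame
BITS_PER_SAMPLE = 13
BITS_PER_FRAME = 260       # compressed frame size in bits
BYTES_PER_FRAME = 33       # 260 bits stored in 33 bytes

frames_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_FRAME   # 50 frames/s
byte_rate = frames_per_second * BYTES_PER_FRAME           # bytes/s
net_bit_rate = frames_per_second * BITS_PER_FRAME         # bit/s, without padding
input_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE         # bit/s before compression

print(f"{frames_per_second} frames/s, {byte_rate} bytes/s, "
      f"{net_bit_rate} bit/s net, {input_bit_rate} bit/s input")
```

Each frame therefore covers 20 ms of speech, and the compressed stream carries 13 kbit/s of payload against 104 kbit/s at the input.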