Analysis of Audio Compression Algorithms
Sanjeev Sharma
What will be covered?
What are the audio file formats?
Why so many?
History of the most popular format (MP3)
NeoAudio Transcoder
MP3 file format explained
MP3 Algorithm/ Features/ Issues
VQF vs. MP3
Ogg Vorbis vs. MP3
Some Audio Formats
Uncompressed
– RIFF: Resource Interchange File Format (Windows)
– AIFF: Audio Interchange File Format (Mac)
– AU: Audio (Unix)
Compressed
– MP3: MPEG-1/2 Layer 3
– VQF: Transform-domain Weighted Interleave Vector Quantization (TwinVQ) format
– Ogg Vorbis
Why so many formats?
Different hardware/ operating systems need different file structure/ device drivers
– Apple plays AIFF (uncompressed) AIFC (compressed)
– Sun or DEC (Unix) play ‘au’, ‘snd’
– PCs (Windows) play ‘RIFF’/‘wav’ (uncompressed), ‘wma’, ‘wmv’ (compressed)
Why so many formats? (Cont’d)
Several companies came out with their own proprietary technologies
– InterWave by VocalTec (www.vocaltec.com)
– TrueSpeech by DSP Group, Inc (www.dspg.com)
– RealAudio by Real Networks (www.real.com)
– ToolVox by VoxWare (www.voxware.com)
– Perceptual Audio Coder (PAC) by Lucent (www.lucent.com)
Why so many formats? (Cont’d)
Proprietary technologies (Cont’d)
– Adaptive Transform Audio Coding (ATRAC) by Sony (http://www.sony.net/Products/ATRAC3)
– TwinVQ or VQF from NTT/ Yamaha (http://www.yamaha-xg.com)
– Windows Media Audio by Microsoft (http://www.microsoft.com/windows/windowsmedia)
Why so many formats? (Cont’d)
Several companies collaborated to define non-proprietary open standards
– Specification available to all
– But different economics involved
  – In general, the MP3 encoder is not free and has IPR restrictions
  – The Ogg Vorbis encoder is free and open source
MPEG
Stands for Moving Picture Experts Group
MPEG-1
– First phase, started in 1988, finalized in 1992
– Three operating modes with increasing complexity and performance: Layer 1, Layer 2, Layer 3
MPEG-2
– Originally (1994) only added two extensions to MPEG-1: backwards-compatible multi-channel coding, and coding at lower sampling frequencies
– Later gave up backwards compatibility in favor of Advanced Audio Coding (AAC)
MPEG (Cont’d)
MPEG-3
– Created to define High Definition Television (HDTV) video coding
– Later rolled into MPEG-2 itself
MPEG-4
– Finished in late 1998
– Emphasis on new functionalities rather than compression efficiency: mobile/ stationary user terminals, database access, communications, interactive services
MPEG (Cont’d)
MPEG-7
– Does NOT define a compression algorithm
– Content representation standard for multimedia information search, filtering, management and processing
MPEG Layers
Layer 1
– Possesses the lowest complexity
– Specifically targeted to applications where the complexity of the encoder plays an important role
Layer 2
– Requires a more complex encoder as well as a slightly more complex decoder
– Is able to suppress more redundancy in the signal and applies the psychoacoustic model in a more efficient way
MPEG Layers (Cont’d)
Layer 3
– Increased complexity
– Targeted to applications needing the lowest data rates, through its suppression of redundancy in the signal and its improved extraction of feebly audible frequencies using its filter
– MP3 stands for MPEG-1/2 Layer 3 and not MPEG-3!!
Personal Car Stereo
Installed a Sony CDX-MP450X in my car hoping I would be able to enjoy my MP3s while driving
Burnt an MP3 CD to play on the car stereo (~150 songs)
Most of the MP3s were skipped; only some actually played
Investigated to find the difference
Turned out that the player was able to decode only high bit rate files
Installed free software (NeoAudio) on the computer to do the ‘transcoding’
– Conversion from one sampling rate and/or bit rate to another
‘On the fly’ converted files play, but with ‘clicks’
Intermediate conversion to wav and then transcoding to mp3 gave perfect results!
Transcoding Options
Choose Encoder
Choose MPEG Version
Choose Bitrate
Choose Mode
Choose Quality
Choose Samplerate
MP3 File Format
File itself is split into frames
– One frame is an audio clip of 24 ms at 48 kHz sampling
Each frame has a 4-byte frame header
Constant Bit Rate (CBR) files have similar frame headers
Variable Bit Rate (VBR) files have different info in each frame header
– Lower bitrates may be used in frames where it will not affect quality
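The 24 ms figure can be checked with a little arithmetic. The sketch below assumes the usual Layer 3 frame sizes of 1152 samples per frame for MPEG-1 and 576 for MPEG-2/2.5; those constants and the function name are not from the slides.

```python
# Samples carried by one Layer 3 frame (assumed values, not stated on the slide).
SAMPLES_PER_FRAME_L3 = {"MPEG1": 1152, "MPEG2": 576, "MPEG2.5": 576}


def frame_duration_ms(version: str, sample_rate_hz: int) -> float:
    """Return the playing time of one Layer 3 frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME_L3[version] / sample_rate_hz


if __name__ == "__main__":
    print(frame_duration_ms("MPEG1", 48000))            # 24.0
    print(round(frame_duration_ms("MPEG1", 44100), 2))  # 26.12
```

At the more common 44.1 kHz rate, a frame therefore lasts slightly longer than at 48 kHz.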
MP3 Frame Header
AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
– A - Frame sync (all 11 bits set)
– B - MPEG Audio version ID (2 bit)
– C - Layer description (2 bit)
– D - Protection bit (1 bit)
– E - Bitrate index (4 bit)
– F - Sampling rate frequency index (2 bit)
– G - Padding bit (1 bit)
– H - Private bit (1 bit)
– I - Channel Mode (2 bit)
– J - Mode Extension (2 bit)
– K - Copyright (1 bit)
– L - Original (1 bit)
– M - Emphasis (2 bit)
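The bit layout above maps directly onto shifts and masks. A minimal sketch, assuming the four header bytes are read most significant byte first; the function and field names are illustrative, not part of any library.

```python
from typing import Dict


def parse_frame_header(header: bytes) -> Dict[str, int]:
    """Split a 4-byte MP3 frame header into its raw bit fields (A to M)."""
    if len(header) != 4:
        raise ValueError("an MP3 frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")
    fields = {
        "sync": (word >> 21) & 0x7FF,             # A: 11 bits, all set
        "version_id": (word >> 19) & 0x3,         # B
        "layer": (word >> 17) & 0x3,              # C
        "protection": (word >> 16) & 0x1,         # D
        "bitrate_index": (word >> 12) & 0xF,      # E
        "sample_rate_index": (word >> 10) & 0x3,  # F
        "padding": (word >> 9) & 0x1,             # G
        "private": (word >> 8) & 0x1,             # H
        "channel_mode": (word >> 6) & 0x3,        # I
        "mode_extension": (word >> 4) & 0x3,      # J
        "copyright": (word >> 3) & 0x1,           # K
        "original": (word >> 2) & 0x1,            # L
        "emphasis": word & 0x3,                   # M
    }
    if fields["sync"] != 0x7FF:
        raise ValueError("frame sync not found: the first 11 bits must all be 1")
    return fields


if __name__ == "__main__":
    # Example bytes chosen for illustration: FF FB 90 64.
    print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

The function returns raw indices only; turning them into a bitrate or a sampling rate needs the lookup tables on the following slides.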
MPEG Audio version ID (B)
00 - MPEG Version 2.5 (unofficial)
01 - reserved
10 - MPEG Version 2 (ISO/IEC 13818-3)
11 - MPEG Version 1 (ISO/IEC 11172-3)
Bitrate Index (E)
bits V1,L1 V1,L2 V1,L3 V2,L1 V2,L2&L3 (V = MPEG version, L = layer; bitrates in kbps)
0000 free free free free free
0001 32 32 32 32 8
0010 64 48 40 48 16
0011 96 56 48 56 24
0100 128 64 56 64 32
0101 160 80 64 80 40
0110 192 96 80 96 48
0111 224 112 96 112 56
1000 256 128 112 128 64
1001 288 160 128 144 80
1010 320 192 160 160 96
1011 352 224 192 176 112
1100 384 256 224 192 128
1101 416 320 256 224 144
1110 448 384 320 256 160
1111 bad bad bad bad bad
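In code, the table is a plain lookup. The sketch below copies the values from the table above; the column keys and the function name are made up for illustration.

```python
from typing import Optional

# Bitrates in kbps for indices 0001 to 1110, one list per table column.
BITRATES_KBPS = {
    "V1,L1": [32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    "V1,L2": [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    "V1,L3": [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    "V2,L1": [32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    "V2,L2/L3": [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}


def bitrate_kbps(column: str, index: int) -> Optional[int]:
    """Look up the bitrate for a 4-bit index; None means 'free' format."""
    if not 0 <= index <= 15:
        raise ValueError("bitrate index is a 4-bit value")
    if index == 0:
        return None  # 0000: free format, bitrate not given by the table
    if index == 15:
        raise ValueError("bitrate index 1111 is not allowed")
    return BITRATES_KBPS[column][index - 1]


if __name__ == "__main__":
    print(bitrate_kbps("V1,L3", 0b1001))  # 128
```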
Sampling rate frequency index (F)
bits MPEG1 MPEG2 MPEG2.5 (sampling rates in Hz)
00 44100 22050 11025
01 48000 24000 12000
10 32000 16000 8000
11 reserved reserved reserved
Padding Bit (G), Private Bit (H)
Padding Bit (G)
– 0 - frame is not padded
– 1 - frame is padded with one extra slot
– Padding is used to fit the bit rates exactly
Private Bit (H)
– May be freely used for specific needs of an application
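Why padding is needed shows up as soon as the frame length is computed. The sketch below assumes the commonly used formula for MPEG-1 Layer 3, 144 × bitrate / sampling rate bytes per frame with one-byte slots; that formula and the function names are not taken from the slides.

```python
from fractions import Fraction


def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Length of one MPEG-1 Layer 3 frame in bytes (assumed formula, 1-byte slot)."""
    return 144 * bitrate_bps // sample_rate_hz + padding


def padded_fraction(bitrate_bps: int, sample_rate_hz: int) -> Fraction:
    """Share of frames that need the extra slot to hit the bitrate exactly."""
    exact = Fraction(144 * bitrate_bps, sample_rate_hz)
    return exact - exact.numerator // exact.denominator


if __name__ == "__main__":
    print(frame_length_bytes(128000, 44100, 0))  # 417
    print(frame_length_bytes(128000, 44100, 1))  # 418
    print(padded_fraction(128000, 44100))        # 47/49
    print(padded_fraction(128000, 48000))        # 0
```

At 128 kbps and 44.1 kHz the exact frame length is not a whole number of bytes, so under these assumptions 47 out of every 49 frames carry the extra slot. At 48 kHz the length is exactly 384 bytes and no padding is needed.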
Channel Mode (I)
00 - Stereo
01 - Joint stereo (Stereo)
10 - Dual channel (2 mono channels)
11 - Single channel (Mono)
Mode Extension (J)
Applicable to Joint Stereo only
Complete frequency range of MPEG file is divided into 32 subbands
For Layer I & II these two bits determine frequency range (bands) where intensity stereo is applied.
For Layer III these two bits determine which type of joint stereo is used (intensity stereo or Middle/Side stereo).
Value   Layer 1/2 Intensity Stereo   Layer 3 Intensity Stereo   Layer 3 MS Stereo
00      Bands 4 to 31                Off                        Off
01      Bands 8 to 31                On                         Off
10      Bands 12 to 31               Off                        On
11      Bands 16 to 31               On                         On
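The mode extension values can be decoded with very little logic. A small sketch, assuming `layer` is the layer number (1, 2 or 3) and not the raw header bits; the function name and the returned keys are illustrative.

```python
from typing import Dict, Union


def decode_mode_extension(layer: int, bits: int) -> Dict[str, Union[bool, int]]:
    """Interpret the 2-bit mode extension (J) for a joint stereo frame."""
    if not 0 <= bits <= 3:
        raise ValueError("mode extension is a 2-bit value")
    if layer in (1, 2):
        # 00 -> bands 4..31, 01 -> 8..31, 10 -> 12..31, 11 -> 16..31
        return {"intensity_first_band": 4 * (bits + 1), "intensity_last_band": 31}
    if layer == 3:
        # Low bit switches intensity stereo, high bit switches MS stereo.
        return {"intensity_stereo": bool(bits & 0b01), "ms_stereo": bool(bits & 0b10)}
    raise ValueError("layer must be 1, 2 or 3")


if __name__ == "__main__":
    print(decode_mode_extension(2, 0b10))  # intensity stereo on bands 12 to 31
    print(decode_mode_extension(3, 0b10))  # MS stereo on, intensity stereo off
```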
Copyright (K), Original (L), Emphasis (M)
Copyright (K)
– 0 - Audio is not copyrighted
– 1 - Audio is copyrighted
Original (L)
– 0 - Copy of original media
– 1 - Original media
Emphasis (M)
– Used to 're-equalize' the sound after a Dolby-like noise suppression
– 00 - none
– 01 - 50/15 µs
– 10 - reserved
– 11 - CCITT J.17
Perceptual Audio Coder (PAC)
Original work attributed to Lucent (http://www.bell-labs.com/org/1133/Research/SpeechAudioCoding/audio.html)
Became the framework of MPEG-2 encoders
MP3 Encoder/ Decoder
[Block diagram]
Encoder: Audio In → Analysis Filterbank → Quantization & Encoding → Encoding of bitstream → Bitstream Out, with the Perceptual Model feeding Quantization & Encoding
Decoder: Bitstream In → Decoding of bitstream → Inverse Quantization → Synthesis Filterbank → Audio Out
MP3 Encoder/ Decoder (Cont’d)
Filter Bank
– Encoder decomposes the input signal into subsampled spectral components (time/ frequency domain)
– Forms an analysis/ synthesis system in combination with the decoder filterbank
Perceptual Model
– Works on either the time domain signal or the analysis filterbank output
– Computes an estimate of the actual (time and frequency dependent) masking threshold
– Uses rules known from psychoacoustics
– Psychoacoustics: relationship between what arrives at the ear and what we hear
MP3 Encoder/ Decoder (Cont’d)
Quantization and coding
– Spectral components are quantized and coded, keeping the quantization noise below the masking threshold
Encoding of bitstream
– Bitstream formatter assembles the bitstream
– Bitstream consists of quantized and coded spectral coefficients, plus side information such as bit allocation information
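Keeping quantization noise below the masking threshold can be illustrated with a plain uniform quantizer. This is a simplification: it is not the quantizer MP3 actually uses, and the threshold value and all names are assumptions for illustration. Each rounding error is at most half a step, so a step of 2·sqrt(T) keeps the mean squared error at or below T.

```python
import math
from typing import List


def step_for_threshold(masking_threshold_power: float) -> float:
    """Largest uniform step whose worst-case noise power stays within the threshold."""
    return 2.0 * math.sqrt(masking_threshold_power)


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Map spectral coefficients to integer indices."""
    return [round(c / step) for c in coeffs]


def dequantize(indices: List[int], step: float) -> List[float]:
    """Decoder side: rebuild approximate coefficients from the indices."""
    return [i * step for i in indices]


def noise_power(original: List[float], rebuilt: List[float]) -> float:
    """Mean squared quantization error."""
    return sum((a - b) ** 2 for a, b in zip(original, rebuilt)) / len(original)


if __name__ == "__main__":
    threshold = 0.01                      # hypothetical masking threshold (power)
    coeffs = [0.93, -0.37, 0.12, 0.04]    # hypothetical spectral components
    step = step_for_threshold(threshold)
    indices = quantize(coeffs, step)
    print(indices, noise_power(coeffs, dequantize(indices, step)) <= threshold)
```

A higher threshold allows a coarser step, which means smaller indices and fewer bits. That is where the perceptual model earns its compression.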
MPEG Flexibility
Flexibility needed to fit into several applications
Flexibility achieved with
– Different operating modes: single channel, dual channel (two independent channels), stereo (no joint stereo coding), joint stereo
– Different sampling frequencies: 32 kHz, 44.1 kHz, 48 kHz (MPEG-1); half of the above (MPEG-2); one quarter of the MPEG-1 rates (MPEG-2.5, proprietary Fraunhofer extension)
MPEG Flexibility (Cont’d)
Flexibility achieved with
– Different bit rates
  – Bitrate defines the compression ratio
  – Min 32 kbps to max 320 kbps for MPEG-1
  – Min 8 kbps to max 160 kbps for MPEG-2 Low Sampling Frequencies extension (LSF)
  – Variable bit rate also possible (each segment has its own bit rate)
  – Sweet spot: 128 kbps for a stereo signal at 48 kHz sampling rate
  – Bit rates higher than this improve quality very slowly
  – Bit rates lower than this degrade quality very fast
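"Bitrate defines the compression ratio" becomes concrete once the uncompressed rate is written down. The sketch below assumes 16-bit PCM samples, which the slides do not state; the function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(pcm_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


if __name__ == "__main__":
    pcm = pcm_bitrate_kbps(48000, 16, 2)  # assumed 16-bit stereo at 48 kHz
    print(pcm)                            # 1536.0
    print(compression_ratio(pcm, 128))    # 12.0
```

Under that assumption, the 128 kbps sweet spot corresponds to roughly a 12:1 reduction for a 48 kHz stereo signal.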
MP3 Quality
Not all encoders are created equal
Quantization and encoding block forms
– Inner control loop to adjust the quantization step with the available Huffman codes (rate loop)
– Outer control loop with the perceptual block to keep quantization noise under the masking threshold (noise control loop)
Hence the encoder needs to be ‘tuned’ for different bitrates
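The two nested loops can be sketched as a toy model. This is not an actual encoder implementation: the bit cost function is a made-up stand-in for Huffman coding, the step and scale growth factors are arbitrary, and all names are hypothetical.

```python
from typing import List, Tuple


def bit_cost(q: int) -> int:
    # Hypothetical stand-in for a Huffman table: larger magnitudes cost more bits.
    return 1 + 2 * abs(q).bit_length()


def quantize_band(band: List[float], step: float, scale: float) -> List[int]:
    # Amplifying a band by its scale factor makes its effective step finer.
    return [round(c * scale / step) for c in band]


def band_noise(band: List[float], q: List[int], step: float, scale: float) -> float:
    return sum((c - v * step / scale) ** 2 for c, v in zip(band, q)) / len(band)


def rate_loop(bands, scales, budget_bits: int, step: float = 0.01):
    """Inner loop: coarsen the global step until the frame fits the bit budget."""
    if budget_bits < sum(len(b) for b in bands):
        raise ValueError("budget below the toy minimum of one bit per coefficient")
    while True:
        quantized = [quantize_band(b, step, s) for b, s in zip(bands, scales)]
        bits = sum(bit_cost(v) for q in quantized for v in q)
        if bits <= budget_bits:
            return step, quantized, bits
        step *= 1.25


def noise_control_loop(bands, thresholds, budget_bits: int, max_rounds: int = 20
                       ) -> Tuple[float, List[float], List[List[int]], int]:
    """Outer loop: boost bands whose noise exceeds the masking threshold."""
    scales = [1.0] * len(bands)
    for round_no in range(max_rounds):
        step, quantized, bits = rate_loop(bands, scales, budget_bits)
        noisy = [i for i, (b, q, t) in enumerate(zip(bands, quantized, thresholds))
                 if band_noise(b, q, step, scales[i]) > t]
        if not noisy or round_no == max_rounds - 1:
            break
        for i in noisy:
            scales[i] *= 1.5
    return step, scales, quantized, bits


if __name__ == "__main__":
    bands = [[0.9, -0.4, 0.3], [0.05, -0.02, 0.01]]  # hypothetical spectral bands
    thresholds = [1e-3, 1e-5]                        # hypothetical masking thresholds
    print(noise_control_loop(bands, thresholds, budget_bits=30))
```

With a tight bit budget the outer loop may give up with some bands still above threshold. That trade-off between rate and audible noise is what an encoder has to be tuned for at each bitrate.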
MP3 IPR Issues
MPEG is an open standard
But it is informative only
The ISO approved standard is based on work by the Fraunhofer Institute, which is protected by several patents
In September 1998, the Fraunhofer Institute sent a letter to several developers of "free" ISO-source based encoders saying that all developers and publishers of MPEG audio Layer 3 (MP3) encoders based on ISO source must pay a license fee to Fraunhofer
Fraunhofer joined with Thomson Multimedia (AKA RCA) in order to create a joint patent portfolio: mp3licensing.com
Sample MP3/MP3 Patents
Digital coding process
Digital adaptive transformation coding method
Process for the detecting of errors in the transmission of frequency-coded digital signals
Process for reducing frequency interlacing during acoustic or optical signal transmission and/or recording
Method for reducing data in the transmission and/or storage of digital signals of several dependent channels
Process for reducing data in the transmission and/or storage of digital signals of several interdependent channels
Etc.
LAME
LAME Ain’t an Mp3 Encoder
LAME is an educational tool to be used for learning about MP3 encoding
The goal of the LAME project is to use the open source model to improve the psychoacoustics, noise shaping and speed of MP3
Free Software?
Several free programs like NeoAudio use the LAME plug-in, despite the cryptic note on the official homepage (http://www.mp3dev.org)
– “Using the LAME encoding engine (or other mp3 encoding technology) in your software may require a patent license in some countries.”
NeoAudio and LAME are open source software under the GNU General Public License
VQF or TwinVQ
Started by NTT/ Yamaha Corp
Some claim that VQF produces audio files with better compression and better sound quality than MP3
Others say the sound quality of a VQF file is neither better nor worse than that of an MP3 file; it is just different
Needs more processing power for encoding/ decoding
Supported in MPEG-4
Support for VQF has waned as of late
MP3 vs. VQF (Cont’d)
Colors vary from red (peaks in power spectra) to blue and violet (the lowest signal power). - VIBGYOR
MP3 vs. VQF (Cont’d)
1. The MP3 psychoacoustic model completely excludes some high frequencies (colored blue) when it decides that they are irrelevant. Clearly, the VQF designers have decided not to exclude any part of the spectrum.
2. MP3 preserves power spectra peaks (colored red) very well, but it has problems with the "green" and "yellow" parts; this can be heard by a careful listener. VQF does not preserve the peaks at the highest frequencies as well, but it beats MP3 at everything else (especially at mid-frequencies).
VQF vs. MP3 (Cont’d)
Conclusion?
– It seems that MP3 has a better psychoacoustic model.
– VQF sounds (and looks) more natural.
Ogg Vorbis
Started in 1993
Development picked up in fall 1998, after Fraunhofer started asking for royalties on MP3 projects
Ogg is a container format for audio, video, and metadata
Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg
– Other formats, such as FLAC and Speex, can also be embedded in Ogg
MP3 vs. Ogg
Frequencies over 16 kHz are lost in both
Cutoff is more severe for MP3, around 15 kHz
Ogg does maintain some of the higher frequencies, although diminishing