in - semantic scholar · pics includ a pr im er on digit al udio, di scuss ion of di eren t compre...


Audio Compression
by: Philipp Herget

Sufficiency Course Sequence:

Course Number  Course Title                                Term
HI1341         Introduction to Global History              A92
HI2328         History of Revolution in the 20th Century   B92
MU1611         Fundamentals of Music I                     A93
MU2611         Fundamentals of Music II                    B93
MU3611         Computer Techniques in Music                C94

Presented to: Professor Bianchi
Department of Humanities & Arts
Term B, 1996
FWB5102

Submitted in Partial Fulfillment of the Requirements of the Humanities & Arts Sufficiency Program
Worcester Polytechnic Institute
Worcester, Massachusetts

Abstract

This report examines the area of audio compression and its rapidly expanding use in the world today. Covered topics include a primer on digital audio, discussion of different compression techniques, a description of a variety of compressed formats, and compression in computers and Hi-Fi stereo equipment. Information was gathered on a multitude of different compression uses.

Contents

1 Introduction
2 Digital Audio Basics
3 Compression Basics
  3.1 Lossless vs. Lossy Compression
  3.2 Audio Compression Techniques
  3.3 Common Audio Compression Techniques
4 Uses of Compression
  4.1 Compression in File Formats
  4.2 Compression in Recording Devices
5 Conclusion
Bibliography

1 Introduction

The first form of audio compression appeared in 1939, when Dudley introduced the VOCODER (VOice CODER) to reduce the amount of bandwidth needed to transmit speech over a telephone line (Lynch, 222). The VOCODER broke speech down into certain frequency bands, transmitted information about the amount of energy in each band, and then synthesized speech on the receiving end of the device using the transmitted information. Since then, a great deal of research has been conducted in the area of audio compression. In the 1960's, compression was used in telephony, and extensive research was done to minimize the bandwidth needed to transmit audio data (Nelson, 313). Today, audio compression is a large subarea of Audio Engineering.

The need for audio compression is brought about by the tremendous amount of space required to store high quality digital audio data. One minute of CD quality audio data takes up 4 Mbytes of storage space (Ratcliff, 32). The use of compression allows a significant reduction in the amount of data needed to create audio sounds, usually with only a minimal loss in the quality of the audio signal. Compression comes at the expense of the extra hardware or software needed to compress the signal. However, in today's technologically advanced times, this cost is usually small compared to the cost of the space that is saved.

Compression is used in almost all new digital audio devices on the market, and in many of the older ones. Some examples are the telephone system, digital message recorders like those in answering machines, and Sony's new MiniDisc player. With the use of compression, these devices are able to store more information in less space. Compression is accompanied by a loss in quality, but the loss is usually so minimal that it cannot be heard by most people. A good example of this is the anti-shock mechanism found in the newer CD players. This mechanism uses a small portion of digital memory to buffer digital data from the CD. When a physical shock disrupts the player and it can no longer read data from the CD, the data from the memory buffer is used to generate the audio signal until the player re-tracks on the CD. To store a maximum amount of data, the player uses compression to store the data in the memory. The
Panasonic SL-S600C has such an anti-shock mechanism with 10 seconds of storage buffer. The Panasonic SL-S600C Operating Instructions state:

    The extra anti-shock function incorporates digital signal compression technology. When listening to sound with the unit connected to a system at home, it is recommended that the extra anti-shock switch be set to the OFF position.

The recommendation is given because the compression algorithm used in the storage has a slightly detrimental impact on the sound quality.

The use of audio compression is a tradeoff among different factors. Knowledge of audio compression is useful not only to the designer, but also to the consumer. The key questions that arise in the evaluation of an audio compression system are how much the data is compressed, what losses are associated with the compression, and what the compression costs. This paper will answer some of these questions by providing a basic awareness of compression, giving background on compression, explaining various popular compression techniques, and discussing the compression formats used in various audio devices and audio computer files.

2 Digital Audio Basics

Compression can be accomplished using two different methods. The first method is to take the data from a standard digital audio system and compress it using software. The second is to encode the signal in a different yet similar manner to that used in a normal digital audio system. Both of these methods are based on digital audio theory; therefore, understanding their functionality and performance requires an understanding of digital audio basics.

The sounds we hear are caused by variations in air pressure which are picked up by our ears. In an analog electronic audio system, these pressure signals are converted to an electric voltage by a microphone. The changing voltage, which represents the sound pressure, is stored on a medium (like tape), and later used to control a speaker to reproduce the original sound. The largest source of error in such an audio system occurs in the storage and retrieval process, where noise is added to the sound.

[Figure 1: An Example of an Analog Waveform. Axes: voltage (air pressure) versus time.]

The idea behind a digital system is to represent an analog (continuous) waveform as a finite number of discrete values. These values can be stored in any digital medium, such as a computer. Later, the values can be converted back to an analog audio signal. This method is advantageous over the older analog techniques because no information (quality) is lost in the storage and retrieval process. Also, unlike analog, when a copy of a digital recording is made, the values can be exactly duplicated, creating an exact replica of the original digital work. However, the process does suffer other losses. These losses occur in the conversion process from the analog to the digital format.

To explain the analog to digital conversion process, we will look at an analog audio waveform and show each of the steps taken in digitizing it. The waveform in Figure 1 represents a brief moment of an audible sound. The amplitude of the waveform represents the relative air pressure due to the sound.

In a digital system, the waveform is represented by a series of discrete values. To get these values, two steps must be taken. First the signal is sampled. This means that discrete values of the signal are selected in time. The second step is to quantize each of the values obtained in the sampling step. Quantization reduces the amount of storage space required for each value in a digital system.

In the first step, the samples are taken at constant intervals. The number of samples
taken every second is called the sampling rate. Figure 2 shows the result of sampling the signal. The X's on the waveform represent the samples which were taken. Since the samples were taken every T seconds, there are 1/T samples per second. The sampling rate shown in Figure 2 is therefore 1/T samples/s. Typical sampling rates range from 8000 samples/s to the 44100 samples/s used for a CD. The term samples/s is often replaced by the term Hz, kHz, or MHz to represent units of samples/s, kilo samples/s, or Mega samples/s respectively (Audio FAQ).

[Figure 2: An Example of a Sampled Analog Waveform. Axes: voltage versus time, with samples taken every T seconds.]

The sample values, marked with the X's, now represent the original waveform. These values could now be stored, and be used at a later time to recreate the original signal. How well the original signal can be recreated is related to the number of samples taken in a given time period. Therefore, the sampling rate is a critical factor in the quality of the digitized signal. If too few samples are taken, then the original signal cannot be regenerated correctly.

In 1933, a publication by Harry Nyquist proved that if the sampling rate is greater than twice the highest frequency of the original signal, the original signal can be exactly reconstructed (Nelson, 321). This means that if we sample our original signal at a rate that is more than twice the highest frequency contained in the signal, there will be no theoretical losses of quality. This sampling rate, necessary for perfect reconstruction, is commonly referred to as the Nyquist rate.

Now that we have a set of consecutive samples of the original signal, the samples need
to be quantized in order to reduce the storage space required by each sample. The process involves converting the sampled values into a certain number of discrete levels, which are stored as binary numbers. A sample value is typically converted to one of 2^n levels, where n is the number of bits used to represent each sample digitally. This process is carried out in hardware by a device called an analog to digital converter (ADC).

[Figure 3: An Example of a Quantization of the Sampled Analog Waveform. Axes: voltage versus time, with samples taken every T seconds.]

The result of quantizing the values from Figure 2 is shown in Figure 3. The samples still have approximately the same value as before, but have been "rounded off" to the nearest of 16 different levels. In a digital system, the amount of storage space required by a number is governed by the number of possible values that number could have. By quantizing the sample, the number of possible values is limited, significantly reducing the required storage space. After quantizing the value of each sample in the figure to one of 2^4 levels, only 4 bits of storage are needed for each sample. In most digital audio systems, either 8 or 16 bits are used for storage, yielding 2^8 = 256 or 2^16 = 65536 different levels in the quantization process.

The quantization process is the most significant source of error in a digital audio signal. Each time a value is quantized, the original value is lost, and the value is replaced by an approximation of the original. The peak value of the error is 1/2 the value of the quantization step. Thus the smaller the quantization steps, the smaller the error is. This means the more
bits used to quantize the signal, the better the quality of the reconstructed sound signal, and the more space required to store the signal values.

To regain the original signal, each of the values stored as the digital audio signal is converted back to an analog audio signal using a Digital to Analog Converter (DAC). An example of the output of the DAC is shown in Figure 4. The DAC takes the sample points and makes an analog waveform out of them. Due to the process used to convert the waveform, the resulting signal is composed of a series of steps. To remedy this, the signal is then put through a low pass filter which smooths out the waveform, removing all of the sharp edges caused by the DAC. The resulting signal is very close to the original.

[Figure 4: An Example of a Signal Reconstructed from the Digital Data. Axes: voltage versus time, with samples taken every T seconds.]

All the losses in the digital system occur in the conversion process to and from a digital signal. Once the signal is digital, it can be duplicated or replayed any number of times and never lose any quality. This is the advantage of a digital system. The losses generated by the conversion process can be measured as a Signal to Noise Ratio (SNR), the same measure used for analog signals. The noise in the signal is considered to be the signal that would have to be subtracted from the reconstructed signal to obtain the original. SNR is used to compare the quality of different types of quantization, and is also used in the quality measurement of compression techniques.
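As a minimal sketch of the quantization step and the SNR measure described above, the following Python fragment rounds a sampled sine wave to one of 2^n levels and reports the resulting SNR. The 440 Hz test tone, the 8000 samples/s rate, the [-1, 1] full-scale range, and all function names are assumptions chosen for illustration; they do not come from the report.

```python
import math


def quantize(x, n_bits, full_scale=1.0):
    """Round x (expected in [-full_scale, full_scale]) to one of 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 * full_scale / levels
    # Index of the quantization cell that contains x, clamped to the valid range.
    index = min(levels - 1, max(0, int(math.floor((x + full_scale) / step))))
    # Reconstruct at the centre of the cell, so the error is at most step / 2.
    return -full_scale + (index + 0.5) * step


def snr_db(original, reconstructed):
    """Signal to noise ratio in dB; the noise is original minus reconstructed."""
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return 10.0 * math.log10(signal_power / noise_power)


def sine_samples(freq_hz, rate_hz, count):
    """Sample a unit-amplitude sine wave at rate_hz samples/s."""
    return [math.sin(2.0 * math.pi * freq_hz * k / rate_hz) for k in range(count)]


# Assumed test signal: 0.1 s of a 440 Hz tone sampled at 8000 samples/s.
samples = sine_samples(440.0, 8000.0, 800)
for bits in (4, 8, 16):
    quantized = [quantize(s, bits) for s in samples]
    print(bits, "bits:", round(snr_db(samples, quantized), 1), "dB")
```

Under these assumptions each additional bit halves the quantization step, so the SNR should rise as more bits are used, at the cost of more storage per sample.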

3 Compression Basics

The underlying idea behind data compression is that a data file can be re-written in a different format that takes up less space. A data format is called compressed when it saves either more information in the same space, or the same information in less space, than a standard uncompressed format. A compression algorithm for an audio signal will analyze the signal and store it in a different way, hopefully saving space. An analogy could be made between compression and shorthand. In shorthand, words are represented by symbols, effectively shortening the amount of space occupied. Data compression uses the same concept.

3.1 Lossless vs. Lossy Compression

The field of compression is divided into two categories, lossless and lossy compression. In lossless compression, no data is lost in the compression process. An example of a lossless compression program is pkzip for the IBM PC. This is a shareware utility which is widely available. It can be used to compress and uncompress any type of computer file. When a file is uncompressed, the exact original is retrieved. The amount of compression that is achieved is highly dependent on the type of file, and varies greatly from file to file.

In lossy compression schemes, the goal is to encode an approximation of the original. By using a close approximation of the signal, the coding can usually be accomplished using much less space. Since an approximation is saved instead of the original, lossy compression schemes can only be used to compress information when the exact original is not needed. This is the case for audio and video data. With these types of data, any digital format used is an approximation of the original signal. Computer data or program files must be compressed using lossless compression because all of the data is usually critical.

In general, lossy compression schemes yield much higher compression ratios than lossless compression schemes. In many cases, the difference in quality between the compressed version and the original is so minimal that it is not noticeable. Yet, in other compression
schemes there is a significant difference in quality. Deciding how much information is to be lost is up to the discretion of the designer of the algorithm or technique. It is a tradeoff between size and quality.

If the shorthand writer from the previous analogy were to write down only the main ideas of the text, it would be analogous to lossy compression. Using only the main ideas would be an extreme form of compression. If he or she were to leave out some adjectives and adverbs, it would again be a form of lossy compression, this one being less lossy than the first. From the analogy, it can be seen how the writer (programmer) can decide how important the details are and how many details to include.

Almost all compression techniques used in digital systems are lossy. This is because lossless compression algorithms are generally very unpredictable in the amount of compression they can achieve. In a typical application, there is a limited amount of "space" for the digital audio data that is generated. If the audio data cannot be compressed to a guaranteed size, it simply will not fit in the required space, which is unacceptable.

The reason for the unpredictability of a lossless technique lies in the technique itself. Data which happens to be in a format which does not lend itself to the way the lossless technique "re-writes" the data will not be compressed. In The Data Compression Book, Mark Nelson compares raw speech files which were compressed with a shareware lossless data compression program, ARJ, to demonstrate how well a typical lossless compression scheme will compress an audio signal. He states:

    ARJ results showed that voice files did in fact compress relatively well. The six sample raw sound files gave the following results:

    Filename       Original  Compressed  Ratio
    SAMPLE-1.RAW   50777     33036       35%
    SAMPLE-2.RAW   12033     8796        27%
    SAMPLE-3.RAW   73019     59527       19%
    SAMPLE-4.RAW   23702     9418        60%
    SAMPLE-5.RAW   27411     19037       30%
    SAMPLE-6.RAW   15913     12771       20%

His data shows that the compression ratios fluctuate greatly depending on the particular sample of speech that is used.

3.2 Audio Compression Techniques

For any type of compression, the compression ratio and the algorithm used are highly dependent on the type of data that is being compressed. The data source used in this paper is audio data, and we have already determined that lossy compression will be used in most cases. Now we can further subdivide the source into music and voice data.

The more information that is known about the source, the better the compression technique can be tailored toward that type of data. The differences between music and speech allow audio compression techniques to be subdivided into two categories: waveform coding and voice coding. Waveform coding can be used on all types of audio data, including voice. The goal of waveform coding is to recreate the original waveform after decompression. The closer the decompressed waveform is to the original, the better the quality of the coding algorithm is. The second technique, voice coding, yields a much higher compression ratio, but can only be used if the audio source is a voice. In voice coding, the goal is to recreate the words that were spoken and not the actual voice. The algorithms "utilize priori information about the human voice, in particular the mechanism that produces it" (Lynch, 255).

Since the two techniques are fundamentally different, the performance of each technique is measured differently. The performance of a waveform coding technique is measured by determining how well the decompressed signal matches the original waveform. This is usually done by measuring the SNR. With the voice coding technique this is not possible, since the technique doesn't try to mimic the waveform. Therefore, the quality of a voice coding algorithm is measured by listener preference.

These coding techniques can be further subdivided into two categories, time domain coding and frequency domain coding. In a time domain coding technique, information on each of the samples of the original signal is encoded. In a frequency domain coding technique,
the signal is transformed into its frequency representation. This frequency representation is then encoded into a compressed format. Later the information is decoded, and transformed back into the time representation of the signal to get back the original samples. Most simple compression algorithms use a time domain coding technique.

The more recent waveform coding techniques provide a much higher compression ratio by using psychoacoustics to aid in the compression. Psychoacoustics is "the study of how sounds are heard subjectively and of the individual's response to sound stimuli" (Webster's New World Dictionary, 1147). By basing the compression scheme on psychoacoustic phenomena, data that can't be heard by humans can be discarded. For example, in psychoacoustics it has been determined that certain levels of sounds cannot be heard while other louder sounds are present (Beerends, 965). This effect is called masking. By eliminating the unheard sounds from the audio signal, the signal is simplified, and can be more easily compressed. Techniques like these are used in modern systems where high compression ratios are necessary, like Sony's new MiniDisc player.

3.3 Common Audio Compression Techniques

The techniques that have been discussed thus far are general subcategories of the approaches that can be taken when designing an audio compression algorithm. In this section, the details of some popular compression techniques will be discussed. Since compression is such a large area, a comprehensive guide to all the different compression methods is far beyond the scope of this paper. However, this section covers some fundamental and some advanced techniques to provide a general idea of how different compression techniques are implemented.

To give a general background, both waveform and voice coding techniques are discussed. Since the waveform coding techniques are simpler, they will be discussed first. In these techniques, the compressed digital data is often obtained from the original signal itself, rather than by creating standard digital audio data and compressing it with software.

3.3.1 Waveform Coding Techniques

PCM

Pulse Code Modulation (PCM) refers to the technique used to code the raw digital audio data as described in Section 2. It is the fundamental digital audio technique that is used most frequently in digital audio systems. Although PCM is not a compression technique, when it is used along with non-uniform quantization such as μ-Law or A-Law, it can be considered compression. PCM combined with non-uniform quantization is used as a reference for comparing the performance of other compression schemes (Lynch, 225).

μ-Law and A-Law Companding

Since the dynamic range of an audio signal is very wide, an audio waveform having a maximum possible amplitude of 1 volt may never reach over 0.1 volts if the audio signal is not very loud. If the signal is quantized with a linear scale, the values attained by the signal will cover only 1/10 of the quantization range. As a result, the softer audio signals have a very granular waveform after being quantized, and the quality of the sound deteriorates rapidly as the sound gets softer. To compensate for the wide dynamic range of audio signals, a non-linear scale can be used to quantize the signal. Using this method, the digitized signal will have an increased number of steps in the lower range, alleviating the problem (Couch, 152). Using non-uniform quantization can raise the SNR for a softer sound, making the SNR for a wide range of sound levels approximately uniform (Couch, 155). Typically, non-uniform quantization is done on a logarithmic scale.

The two standard formats for the logarithmic quantization of a signal are μ-Law and A-Law. A-Law is the standard format used in Europe (Couch, 153), and μ-Law is used in the telephone systems of the United States, Canada, and Japan. The μ-Law quantization used in phone systems uses eight bits of data to provide the dynamic range that normally requires twelve bits of PCM data (Audio FAQ).

The process of converting a computer file to μ-Law is a form of compression, since the
amount of data that is needed per sample is reduced and the dynamic range of the sample is increased. The result is much less data with more information. To create μ-Law or A-Law data, the signal must originally be compressed and later expanded. This process is commonly referred to as companding.

Silence Compression

Silence compression is a form of lossy compression that is extremely easy to implement. In silence compression, periods of relative silence in an audio signal are replaced by actual silence. The samples of data that were used to represent the silent part are replaced by a code and a number telling the device which reconstructs the analog signal how much silence to insert. This reduces all of the data needed to represent the silent part of the signal down to a few bytes.

To implement this, the compression algorithm first determines if the audio data is silent by comparing the level of the digital audio data to a threshold. If the level is lower than the threshold, that part of the audio signal is considered silent, and the samples are replaced by zeros. The performance of the algorithm therefore hinges on the threshold level. The higher the level, the more compression there is, but the more lossy the technique is. The amount of compression achieved also depends on the total length of all the silent periods in an audio signal. The amount can be very significant in some types of audio data like voice data.

    Silence encoding is extremely important for human speech. If you examine a waveform of human speech, you will see long, relatively flat pauses between the spoken words. (Ratcliff 32)

In The Data Compression Book, Mark Nelson wrote silence compression code in C, and used it to compress some PCM audio data files. The results he obtained were as follows:

    Filename       Original  Compressed  Ratio
    SAMPLE-1.RAW   50777     37769       26%
    SAMPLE-2.RAW   12033     11657       3%
    SAMPLE-3.RAW   73019     73072       0%
    SAMPLE-4.RAW   13852     10962       21%
    SAMPLE-5.RAW   27411     22865       17%
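The thresholding scheme described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not Nelson's C implementation: the marker value, the minimum run length, and the assumption that samples are signed integers centred on zero are all hypothetical choices made for the example.

```python
SILENCE_MARKER = "S"  # hypothetical tag identifying a run of silence


def compress_silence(samples, threshold, min_run=4):
    """Replace runs of at least min_run quiet samples with (marker, run_length)."""
    out = []
    i = 0
    while i < len(samples):
        # Find how far the current run of below-threshold samples extends.
        j = i
        while j < len(samples) and abs(samples[j]) < threshold:
            j += 1
        run = j - i
        if run >= min_run:
            out.append((SILENCE_MARKER, run))  # code plus amount of silence
            i = j
        elif run > 0:
            out.extend(samples[i:j])  # a short quiet run is not worth encoding
            i = j
        else:
            out.append(samples[i])  # a loud sample is stored unchanged
            i += 1
    return out


def expand_silence(tokens):
    """Rebuild a sample list, inserting true silence (zeros) for each run code."""
    out = []
    for token in tokens:
        if isinstance(token, tuple) and token[0] == SILENCE_MARKER:
            out.extend([0] * token[1])
        else:
            out.append(token)
    return out


pcm = [0, 1, -1, 0, 2, 90, -80, 1, 0, 50]
packed = compress_silence(pcm, threshold=3)
print(packed)
print(expand_silence(packed))
```

In this sketch the first five quiet samples collapse into a single code, and decoding returns zeros in their place, which illustrates why the technique loses information and why raising the threshold increases both the compression and the loss.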

The table indicates that silence compression can be very effective in some instances, but in others it may have no effect at all, or even increase the file size slightly. Silence compression is used mainly in file formats found in computers.

DM

Delta Modulation (DM) is one of the most primitive forms of audio encoding. In DM, a stream of 1 bit values is used to represent the analog signal. Each bit contains information on whether the DM signal is greater or less than the actual audio signal. With this information, the original signal can then be reconstructed.

[Figure 5: An Example of Signals in a DM waveform: a) The original and reconstructed waveforms and b) The DM waveform]

Figure 5 shows an example DM signal, the original signal it was generated from, and the reconstructed signal before filtering. The actual DM signal, Figure 5b, contains information on whether the output should rise or fall. The size of the step and the rate of the steps are fixed. The reconstruction algorithm simply raises or lowers the output value according to the DM waveform.

DM suffers from two major losses, granular noise and slope overload. Granular noise occurs when the input signal is flat. The DM signal simulates flat regions by rising and falling, leading to granular noise. Slope overload is caused when the input signal rises faster
than the DM signal can follow it. Granular noise can be reduced by making the step size small enough, and slope overload can be prevented by increasing the data rate. However, decreasing the step size and increasing the data rate also increase the amount of data needed to store the signal. DM is rarely used, but was explained here to provide a basis for understanding ADM, which offers a significant advantage over PCM.

ADM

Adaptive Delta Modulation (ADM) is the solution to the problems with DM. In ADM, the step size is continuously adjusted, making the step size larger in the fast changing parts of the signal and smaller in the slower changing parts of the signal. Using this technique, both the granular noise and the slope overload problems are solved.

In order to adjust the step size, an estimation must be made to determine if the signal is changing rapidly. The estimation in ADM is usually based on the last two steps. If the signal increased for two consecutive samples, the step size is increased. If the two previous steps were opposite in direction, then the step size is decreased. This estimation method is simple yet effective.

The performance of ADM using the above technique turns out to be better than Log PCM when little data is used to represent a signal [1]. When more data is used, however, Log PCM performs better (Lynch 229).

[1] Performance is measured with SNR.

DPCM

A Differential Pulse Code Modulation (DPCM) system consists of a predictor, a difference calculator, and a quantizer. The predictor predicts the value of the next sample. The difference calculator then determines the difference between the predicted value and the actual value. Finally, this difference value is quantized by the quantizer. The quantized differences are used to represent the original signal.
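The three DPCM components named above (predictor, difference calculator, quantizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme analysed by Lynch: the predictor is simply the previously reconstructed value, the quantizer is a uniform one with a hypothetical step parameter, and the codes are left as unbounded integers, whereas a real system would limit them to a fixed number of bits.

```python
def dpcm_encode(samples, step):
    """Return one quantized difference code per input sample."""
    codes = []
    predicted = 0.0  # assumed starting value, shared with the decoder
    for s in samples:
        diff = s - predicted            # difference calculator
        code = int(round(diff / step))  # quantizer: nearest multiple of step
        codes.append(code)
        # Predictor: track the value the decoder will rebuild, so that
        # quantization errors do not accumulate from sample to sample.
        predicted = predicted + code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out = []
    value = 0.0
    for code in codes:
        value = value + code * step
        out.append(value)
    return out


signal = [0.0, 0.3, 0.55, 0.7, 0.6, 0.2]
codes = dpcm_encode(signal, step=0.1)
print(codes)
print(dpcm_decode(codes, step=0.1))
```

Because consecutive audio samples tend to be close together, the differences are usually much smaller than the samples themselves, which is why they can be stored with fewer bits.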

Essentially, a DM signal is a DPCM signal in which one bit is used in the quantization process and the predictor is based on the previous bit. In a DM system, the predicted value is always the same as the previous value, and the difference between the predicted value (previous value) and the actual signal is quantized using one bit (two levels).

The performance of a DPCM system depends on the predictor. The better it can predict where the signal is headed, the better it will perform. A DPCM system using one previous value in the predictor can achieve the same SNR as a μ-law PCM system while using one less bit to quantize each sample value. If three previous values are used for the predictor, the same SNR can be achieved using two bits less to represent each sample (Lynch, 227). This is a significant performance increase over PCM because it obtains the same SNR using less data. This technique can be extended even further by making the prediction method adaptive to the input data. The technique is called Adaptive Differential Pulse Code Modulation (ADPCM).

ADPCM

ADPCM is a modification of the DPCM technique that makes the algorithm adapt to the characteristics of the signal. The relationship between DM and ADM is the same as that between DPCM and ADPCM. In both cases, the algorithm is made adaptive to the changes in the audio signal. The adaptive part of the system can be built into the predictor, the quantizer, or both, but has been shown to be most effective in the quantizer (Lynch, 227).

Using this adaptive algorithm, the compression performance can be increased beyond that of DPCM. "Cohen (1973) shows that by using the two most significant bits in the previous three samples, a gain in SNR of 7 dB over non-adaptive DPCM can be obtained" (Lynch, 227). Different forms of ADPCM are used in many applications, including inexpensive digital recorders. ADPCM is also used in public compression standards that are slowly gaining popularity, such as CCITT G.721 and G.723, which use ADPCM at 32 kbits/s and at 24 or 40 kbits/s respectively (Audio FAQ).
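
The one-bit case is small enough to sketch in code. The following Python fragment is only an illustration of the idea, not an implementation taken from Lynch or from any of the standards named above. It encodes a signal with delta modulation (the predictor is the previously reconstructed value, and one bit says whether to step up or down) and, optionally, adapts the quantizer step size. The function names, the starting step of 0.05, and the adaptation rule (multiply the step by 1.5 when two consecutive bits agree, by 0.5 when they differ) are assumptions chosen for clarity.

```python
import math


def _adapt(step, bit, prev_bit, adaptive, lo=0.01, hi=1.0):
    """Return the step size to use for the current sample.

    The 1.5 / 0.5 factors and the lo/hi limits are illustrative assumptions.
    """
    if not adaptive or prev_bit is None:
        return step
    factor = 1.5 if bit == prev_bit else 0.5
    return min(hi, max(lo, step * factor))


def dm_encode(samples, step=0.05, adaptive=False):
    """Delta modulation: emit one bit per sample (1 = step up, 0 = step down)."""
    bits = []
    predicted = 0.0          # predictor output = previous reconstructed value
    prev_bit = None
    for x in samples:
        bit = 1 if x >= predicted else 0
        step = _adapt(step, bit, prev_bit, adaptive)
        predicted += step if bit else -step
        bits.append(bit)
        prev_bit = bit
    return bits


def dm_decode(bits, step=0.05, adaptive=False):
    """Rebuild the signal; the step size is re-derived from the bits alone."""
    out = []
    value = 0.0
    prev_bit = None
    for bit in bits:
        step = _adapt(step, bit, prev_bit, adaptive)
        value += step if bit else -step
        out.append(value)
        prev_bit = bit
    return out


if __name__ == "__main__":
    # A slowly varying tone: the fixed step can follow it without slope overload.
    tone = [0.8 * math.sin(2 * math.pi * k / 200) for k in range(400)]
    rebuilt = dm_decode(dm_encode(tone))
    print("max error, fixed step:", max(abs(a - b) for a, b in zip(tone, rebuilt)))

    # A sudden jump: the adaptive step catches up in fewer samples.
    jump = [1.0] * 20
    print("fixed   :", dm_decode(dm_encode(jump))[7])
    print("adaptive:", dm_decode(dm_encode(jump, adaptive=True), adaptive=True)[7])
```

Because the decoder applies the same adaptation rule to the same bit stream, no step-size information has to be stored with the data. This is one way of placing the adaptive part of the system in the quantizer rather than in the predictor.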

PASC and ATRAC

All of the previously mentioned compression techniques are relatively simple re-writings of the audio data. Precision Adaptive Subband Coding (PASC) and Adaptive TRansform Acoustic Coding (ATRAC) differ from these because they are much more complex proprietary schemes that were developed for a specific purpose. PASC and ATRAC were both developed for use in the Hi-Fi audio market. PASC was developed by Philips for use with the Digital Compact Cassette (DCC), and ATRAC was developed by Sony for use with their MiniDisc player. Both of these techniques use psychoacoustic phenomena as a basis for the compression algorithm in order to achieve the extreme compression ratios required for their applications.

The details of the algorithms are complicated and will not be discussed here. More information is given in the discussion of compression used in Hi-Fi audio equipment in Section 4.2. In addition, details on PASC can be found in Advanced Digital Audio, edited by Ken Pohlmann, and details on ATRAC can be found in the Proceedings of the IEEE in an article titled "The Rewritable MiniDisc System" by Tadao Yoshida.

3.3.2 Voice Coding Techniques

LPC

Linear Predictive Coding (LPC) is one of the most popular voice coding techniques. In an LPC system, the voice signal is represented by storing characteristics of the system creating the voice. When the data is played back, the voice is synthesized from the stored data by the playing device. The model used in an LPC system includes the source of the sound, a variable filter resembling the human vocal tract, and a variable amplifier representing the amplitude of the sound.

The source of the sound is modeled in two different ways depending on how the voice is being produced. This is done because humans can produce two types of sound, voiced and unvoiced. Voiced sounds are those which are created by using the vocal cords and unvoiced
sounds are created by pushing air through the vocal tract. An LPC algorithm models these sounds by using either periodic pulses (voiced) or a random noise generator (unvoiced) as the source.

The human vocal tract is modeled in the system as a time-varying filter (Lynch, 240). Parameters are calculated for the filter to mimic the changing characteristics of the vocal tract when the sound was being produced. The data used to represent the voice in an LPC algorithm consists of the filter parameters, the source used (voiced or unvoiced), the pitch of the voice, and the volume of the voice. The amount of data generated by storing these parameters is significantly less than the amount of data used to represent the waveform of the speech signal.

GSM

The Global System for Mobile telecommunications (GSM) is a standard used for compression of speech in the European digital cellular telephone system. GSM is an advanced compression technique that can achieve a compression ratio of 8:1. To obtain this high compression ratio and still produce high quality sound, GSM is based on the LPC voice coding technique and also incorporates a form of waveform coding (Degener, 30).

4 Uses of Compression

Compression is used in almost all modern digital audio applications. These applications include computer files, audio playback devices, telephony, and digital recording devices. Some of them, like the telephone system, have been using compression for many years. Others have only recently started using it. The type of compression that is used depends on cost, size, space, and many other factors.

After reviewing a basic background on compression, one question remains unanswered: what type of compression is used for a particular application? In the following sections, the
uses of compression in two major areas will be discussed: computer files and digital hi-fi stereo equipment. Knowledge about these areas is particularly useful because it can help in deciding which device to use.

4.1 Compression in File Formats

When digital audio technology was first appearing on the market, each computer manufacturer had its own file format, or formats, associated with its computers (Audio FAQ). As software became more advanced, computers gained the ability to read more than one file format. Today, most software can read and write a wide range of file formats, leaving the choice to the user.

In general, there are two types of file formats, "raw" and self-describing. In a raw file, the data can be in any encoding; the encoding and its parameters are fixed and must be known in advance in order to read the file. A self-describing format has a header that stores information about the data, such as the sampling rate and the compression used. The main concern here will be with self-describing file formats, since these are the most often used and the most versatile.

A disadvantage of using compression in computer files is that the file usually needs to be converted to linear PCM data for playback on digital audio devices. This requires extra code and processing time. It may also be one of the reasons why approximately half of the file formats available for computers do not support compression. The following chart is taken from the "Audio Tutorial FAQ" of The Center for Innovative Computer Applications. It describes most of the popular file formats on the market and the compression that is used, if any:

Extension, Name    Origin          Variable Parameters
.au or .snd        NeXT, Sun       rate, #channels, encoding, info string
.aif(f), AIFF      Apple, SGI      rate, #channels, sample width, lots of info
.aif(f), AIFC      Apple, SGI      same (extension of AIFF with compression)
.iff, IFF/8SVX     Amiga           rate, #channels, instrument info (8 bits)
.voc               Soundblaster    rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE         Microsoft       rate, #channels, sample width, lots of info
                                   [including compression scheme]
.sf                IRCAM           rate, #channels, encoding, info
none, HCOM         Mac             rate (8 bits/1 ch; uses Huffman compression)
none, MIME         Internet        [usually 8-bit μ-law compression, 8000 samp/s]
.mod or .nst       Amiga           [bank of digitized instrument samples with
                                   sequencing information]

Many of these file formats are just uncompressed PCM data with the sampling rate and the number of channels used during recording specified in the header. For the formats that do support compression, it is usually optional. For example, in the Soundblaster ".voc" format, silence compression can be used, and in the Microsoft ".wav" format, a number of different encoding schemes can be used, including PCM, DM, DPCM, and ADPCM.

Conversion from one format to another can be accomplished via software. The "Audio FAQ" also provides information on a number of different programs that will do the conversion. When converting from an uncompressed to a lossy compressed format, the file is generally smaller afterwards, but some quality is lost. If the file is later converted back, the size will increase, but the lost quality cannot be regained.

4.2 Compression in Recording Devices

There are currently four major digital stereo formats on the market. These are the Compact Disc (CD), the Digital Audio Tape (DAT), the Digital Compact Cassette (DCC), and the MiniDisc (MD). They are all very different from each other. The CD and MD use an optical storage mechanism, and the DAT and DCC use a magnetic tape to store the data. There are also a number of other apparent differences between the media. For example, a CD is not
re-writable while the others are.

A major difference that may not be apparent, however, is that the MD and DCC utilize digital data compression while the DAT and CD do not. This allows the MD and DCC to be physically smaller than their uncompressed counterparts. In both devices, the smaller data size is necessary and advantageous.

In the MD, the design goal was to make the optical disc small so that it would be portable. The MD stores data at the same density as the CD, so only by using compression can the disc be made physically smaller than the CD. In addition to reducing the size, the compression gave the MD other advantages. It allowed the MD to be the first optical player with the digital anti-shock mechanism described in the introduction. Since less data is required to generate sound and the MD reads at the same speed as the CD, the MD can read more data than it needs to generate sound. The extra data is stored in a buffer, which does not need to be very big. CD players eventually adopted the same technology, but in order to implement it, the reading speed of the CD needed to be increased, and the data needed to be compressed after reading to fit it into a memory buffer.

The design goal of the DCC was to make the storage medium inexpensive and the same size as an audio tape. By doing this, a DCC player could accept standard audio tapes as well as the new DCC tapes, making it more marketable. To fit the data onto a relatively inexpensive tape medium that can be housed in an audio cassette case, digital compression was required.

In both the MD and DCC, the space available for digital audio data was approximately one quarter of the size required for PCM data. The compression ratio needed was therefore approximately 4:1. To obtain such high compression ratios, the compression schemes utilize psychoacoustic phenomena.

Precision Adaptive Subband Coding (PASC) is the compression algorithm that is used for the DCC to provide a 4:1 compression of the digital PCM data. PASC is described in the book Advanced Digital Audio, edited by Ken Pohlmann:

    The PASC system is based on three principles. First, the ear only hears sounds above the threshold of hearing. Second, louder sounds mask softer sounds of similar frequency, thus dynamically changing the threshold of hearing. Similarly, other masking properties such as high- and low-frequency masking may be utilized. Third, sufficient data must be allocated for precise encoding of sounds above the dynamic threshold of hearing.

Using PASC, enough digital data can fit onto a medium the size of a cassette to make the DCC player feasible.

The MD uses the ATRAC compression algorithm, which is based on the same psychoacoustic phenomena. Compression in a MiniDisc is more advanced, however. The MiniDisc achieves a compression ratio of "5:1 in order to offer 74 min of playback time" (Yoshida, 1498).

Although these algorithms offer such high compression, some losses are involved. Experts claim that they can hear a difference between a CD and an MD, but the actual losses are so minimal that the average person will not hear them. The largest errors occur with certain types of sounds that the compression algorithm has problems with. In an article in Audio Magazine, Edward Foster writes:

    Although the test was not double-blind, and thus is suspect, I convinced myself I could reliably tell the original from the copy, just barely, but different nonetheless.

    The differences occurred in three areas: A slight suppression of low-level high-frequency content when the algorithm needed most of the available bitstream to handle strong bass and midrange content, a slight dulling of the attack of percussion instruments (piano, harpsichord, glockenspiel, etc.) probably caused by imperfect masking of "pre-echo" and a slight "post-echo" (noise puff) at the cessation of a sharp sound (such as claves struck in an acoustically dead environment). The second and third of these anomalies were most readily discernible on single instruments played one note at a time in a quiet environment and were taken from a recording specifically made to evaluate perceptual encoders.

Similar effects exist when listening to a DCC recording. Although the losses are minimal, they are still present; they are the tradeoff for having a small, compact, portable format.
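
The compression ratios quoted in this section can be turned into approximate data rates with a little arithmetic. The Python sketch below assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), which are not stated in this section, and simply divides the resulting PCM rate by the 4:1 and 5:1 ratios given above. The results are idealized figures implied by those ratios, not the bit rates published for PASC or ATRAC. The buffer size in the example is a hypothetical value chosen only to illustrate why a player that reads at CD speed but consumes compressed data can fill an anti-shock buffer while it plays.

```python
SAMPLE_RATE_HZ = 44_100   # assumed: standard CD sampling rate
BITS_PER_SAMPLE = 16      # assumed: standard CD sample width
CHANNELS = 2              # assumed: stereo


def pcm_bit_rate():
    """Bits per second of uncompressed CD-style PCM audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compressed_bit_rate(ratio):
    """Idealized bit rate after compressing PCM by ratio:1."""
    return pcm_bit_rate() / ratio


def buffered_seconds(buffer_bits, ratio):
    """Seconds of sound that a full buffer of compressed data can supply."""
    return buffer_bits / compressed_bit_rate(ratio)


def fill_time_seconds(buffer_bits, ratio):
    """Time to fill the buffer while playing, when the disc is read at the
    full PCM rate but only the compressed rate is consumed for playback."""
    surplus = pcm_bit_rate() - compressed_bit_rate(ratio)
    return buffer_bits / surplus


if __name__ == "__main__":
    print("PCM rate (bit/s):      ", pcm_bit_rate())
    print("4:1 rate (bit/s), DCC: ", compressed_bit_rate(4))
    print("5:1 rate (bit/s), MD:  ", compressed_bit_rate(5))

    hypothetical_buffer_bits = 1_000_000   # illustrative value only
    print("seconds of audio held: ", buffered_seconds(hypothetical_buffer_bits, 5))
    print("seconds needed to fill:", fill_time_seconds(hypothetical_buffer_bits, 5))
```

At a 5:1 ratio, four fifths of the data read at CD speed is surplus, so the buffer fills faster than playback drains it. A CD player reading uncompressed PCM at normal speed has no such surplus, which is consistent with the remark above that CD players had to raise the reading speed to offer the same feature.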

5 Conclusion

In the last decade, the field of digital audio compression has grown tremendously. With the expansion of the electronics industry and the decreasing prices of digital audio, many devices which once used analog audio technology now use digital technology. Many of these digital devices use compression to reduce storage space and bring down cost.

Digital audio compression has become a sub-area of Audio Engineering, supporting many professionals who specialize in this field. Millions of dollars are invested by companies such as Sony and Philips to develop proprietary compression schemes for their digital audio applications (Audio FAQ).

Because of the widespread use of compression, knowledge in this area can be useful. For a musician working with modern digital recording and editing equipment, the study of compression can provide an advantage. Knowledge in the field of compression can help in the evaluation and understanding of recording and playback equipment. It can also help when manipulating digital files with computers. As we move into the next century and digital audio technology continues to grow, knowledge of audio compression will become an increasingly valuable asset.

Bibliography

"Audio tutorial FAQ." FTP://pub/usenet/news.answers/audio-fmts/part[12], Center for Innovative Computer Applications, August 1994.

J. G. Beerends and J. A. Stemerdink, "A perceptual audio quality measure based on a psychoacoustic sound representation," AES: Journal of the Audio Engineering Society, vol. 40, p. 963, December 1992.

L. W. Couch, Digital and Analog Communication Systems. New York, NY: Macmillan Publishing Company, fourth ed., 1993.

J. Degener, "Digital speech compression," Dr. Dobb's Journal, vol. 19, p. 30, December 1994.

M. Fleischmann, "Digital recording arrives," Popular Science, vol. 242, p. 84, April 1993.

E. J. Foster, "Sony MDS-501 minidisc deck," Audio, vol. 78, p. 56, November 1994.

D. B. Guralnik, ed., Webster's New World Dictionary. New York, NY: Prentice Hall Press, second college ed., 1986.

P. Lutter, M. Müller-Wernhart, J. Ramharter, F. Rattay, and P. Slowik, "Speech research with WAVE-GL," Dr. Dobb's Journal, vol. 21, p. 50, November 1996.

T. J. Lynch, Data Compression: Techniques and Applications. New York, NY: Van Nostrand Reinhold, 1985.

M. Nelson, The Data Compression Book. San Mateo, CA: M&T Books, 1992.

Panasonic Portable CD Player SL-S600C Operating Instructions.

K. C. Pohlmann, ed., Advanced Digital Audio. Carmel, IN: SAMS, first ed., 1993.

J. W. Ratcliff, "Audio compression," Dr. Dobb's Journal, vol. 17, p. 32, July 1992.

J. W. Ratcliff, "Examining PC audio," Dr. Dobb's Journal, vol. 18, p. 78, March 1993.

J. Rothstein, MIDI: A Comprehensive Introduction. Madison, WI: A-R Editions, Inc., 1992.

A. Vollmer, "Minidisc, digital compact cassette vie for digital recording market," Electronics, vol. 66, p. 11, September 13, 1993.

J. Watkinson, An Introduction to Digital Audio. Jordan Hill, Oxford (GB): Focal Press, 1994.

T. Yoshida, "The rewritable minidisc system," Proceedings of the IEEE, vol. 82, p. 1492, October 1994.