Upload: geoffrey-holt

Post on 19-Jan-2016

Compression & Streaming

Dr Paul Vickers

School of Informatics, Engineering, & Technology

CM613 Multimedia Storage & Retrieval

Serving, shrinking, and otherwise messing about with perfectly good audio files

Loudness and power

• Loudness is related to the force with which a sound presses on your eardrum

• The more power, the louder the sound

• Power is proportional to the square of a sound's amplitude (or voltage)

Sampling error and noise

• CD audio uses 44.1 kHz at 16-bit resolution
  – Sampled voltages quantised to –32768…32767
  – Quantisation introduces error (through rounding)
  – Largest error is 0.5, which is 2^-16 times as loud as the loudest sample value
  – Power is related to the square of amplitude, so the error has power 2^-32 times that of the loudest signal
  – Ratio of signal to error (noise) is 2^32:1

• Or 96.3 dB (10 log10(2^32))

• = SNR of 96 dB
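The arithmetic on this slide can be checked with a short sketch. The function name `quantisation_snr_db` is made up for illustration; it simply follows the slide's reasoning (amplitude ratio 2^bits, power ratio the square of that).

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """SNR in dB for linear PCM, following the slide's reasoning.

    The loudest sample is 2**bits times the largest rounding error in
    amplitude; power goes as amplitude squared, so the power ratio is
    2**(2 * bits).
    """
    power_ratio = 2 ** (2 * bits)
    return 10 * math.log10(power_ratio)

print(round(quantisation_snr_db(16), 1))  # 96.3
print(round(quantisation_snr_db(8), 1))   # 48.2
```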

Signal to noise ratio

• So, CD audio has SNR of 96 dB

• 8-bit sampling has SNR of 48 dB

• Therefore, 1 bit of resolution adds approx. 6 dB to the dynamic range

• Threshold of pain is 120 dB so we need a 20-bit resolution to capture the dynamic range of the human auditory system

• Loud samples are rare, so noise is more noticeable than the theory would suggest
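As a rough check of the "6 dB per bit" rule and the 20-bit figure, here is a minimal sketch. The helper name `bits_for_dynamic_range` is hypothetical, and it assumes plain linear PCM.

```python
import math

# Each extra bit doubles the amplitude range: 20 * log10(2) is about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)

def bits_for_dynamic_range(range_db: float) -> int:
    """Smallest whole number of bits whose dynamic range covers range_db."""
    return math.ceil(range_db / DB_PER_BIT)

print(bits_for_dynamic_range(120))  # 20
print(bits_for_dynamic_range(96))   # 16
```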

Coding

• A standard .WAV file (no such thing) stores samples as 16-bit values.

• These values are codes representing the voltages (amplitudes) of the signal

• System called pulse code modulation (contrast with pulse amplitude modulation and pulse width modulation)

• WAV format actually supports nearly 100 different coding systems
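To make the idea of 16-bit PCM codes concrete, the sketch below writes a short 528 Hz tone as linear PCM into an in-memory WAV file using Python's standard `wave` module and reads the header back. The helper `sine_pcm16`, the tone length and the peak level are illustrative choices, not anything the slides specify.

```python
import io
import math
import struct
import wave

RATE = 44100  # CD sample rate in Hz

def sine_pcm16(freq_hz, n_samples, peak=16000):
    """Return n_samples of a sine tone as little-endian signed 16-bit PCM codes."""
    values = [round(peak * math.sin(2 * math.pi * freq_hz * i / RATE))
              for i in range(n_samples)]
    return struct.pack("<%dh" % n_samples, *values)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 2 bytes = 16 bits per sample
    w.setframerate(RATE)
    w.writeframes(sine_pcm16(528, 4410))

buf.seek(0)
with wave.open(buf, "rb") as r:
    params = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
print(params)  # (1, 2, 44100, 4410)
```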

Compression

• There’s compression and then there’s compression

• 2 types of audio compression
  – Compression of dynamic range
  – Compression of file size

• Studio engineers compress the dynamic range using a compressor. Radio stations also compress:
  – Shrinks differences in volume
  – Stops you having to reach for the volume knob

• compression ≠ compression !

Compression

• We’re going to look at compression rather than compression… ;-)

• That is, shrinking audio files, not squashing their dynamic range (though some shrinking will do this too)

Compression

• Lossless compression (e.g. LZW) does not work well on audio. Why not? There are very few repeating patterns

• Sampled audio tends to have random noise in the least significant bits making very few bytes identical.

Lossless compression examples

• WinZip hardly compresses audio files at all
  – Try girl2.wav and 528 Hz.wav. Why does the second file compress 2.33:1?

File         Size     Compressed file   Compressed size
Girl2.wav    594 KB   Girl2.zip         566 KB
528Hz.wav    862 KB   528Hz.zip         369 KB
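The same experiment can be approximated without the original files. The sketch below is only a stand-in: it uses `zlib` (a Deflate compressor, like the classic ZIP method) on a synthetic 528 Hz tone and on the same tone with random jitter added to its low-order bits. The amplitudes, the amount of jitter and the helper names are assumptions, so the ratios will not match the table.

```python
import math
import random
import struct
import zlib

RATE = 44100  # samples per second

def pack16(samples):
    """Pack integers as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)

def tone(freq_hz, n):
    """A pure sine tone, which repeats exactly after a whole number of cycles."""
    return [round(16000 * math.sin(2 * math.pi * freq_hz * i / RATE)) for i in range(n)]

def ratio(raw):
    """Uncompressed size divided by Deflate-compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))

rng = random.Random(0)
pure = tone(528, RATE)                             # one second of clean tone
noisy = [s + rng.randint(-64, 63) for s in pure]   # same tone, noisy low bits

print("pure tone ratio :", round(ratio(pack16(pure)), 2))
print("noisy tone ratio:", round(ratio(pack16(noisy)), 2))
```

The clean tone contains long repeated byte patterns for the compressor to find; the jittered version, like most recorded audio, does not.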

Other techniques

• Need some different compression techniques

• Popular ones are:
  – Differential PCM (DPCM)
  – Adaptive DPCM (ADPCM)
  – A-Law
  – µ-Law
  – Logarithmic & non-linear codings
  – Perceptual codings

Differential PCM

• Consider the differences in value between individual samples at rates of, say, 44.1 kHz
  – Usually fairly small
  – Small differences need fewer bits than the samples themselves
  – So, DPCM stores sample differences, hence the name

• Leads to some inaccuracy and requires look ahead to balance things out

DPCM example

• To reduce 8-bit sample values to 4-bit differences

• Consider three samples of 17, 28, 30
  – Differences: 11, 2
  – 4-bit system only allows values -8…+7 (1000…0111)
  – Thus 11 overflows, therefore clipped at 7
  – But decompressing would then give 17, 24, 26
  – But if we look at the difference between the decompressed sample and the next actual sample:
    • 28 - 17 = 11 -> 7. 17 + 7 = 24. Diff. 30 - 24 = 6
    • Gives 7, 6 which, when decompressed, gives 17, 24, 30
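The worked example above can be reproduced with a minimal sketch. It assumes the first sample is stored in full and that differences are clipped to the 4-bit range; the function names are illustrative.

```python
def clip4(d, lo=-8, hi=7):
    """Clip a difference to the signed 4-bit range."""
    return max(lo, min(hi, d))

def naive_diffs(samples):
    """Differences between neighbouring true samples (clipping errors accumulate)."""
    return [clip4(b - a) for a, b in zip(samples, samples[1:])]

def dpcm_encode(samples):
    """Take each difference against what the decoder will have reconstructed."""
    first, reconstructed, diffs = samples[0], samples[0], []
    for s in samples[1:]:
        d = clip4(s - reconstructed)
        diffs.append(d)
        reconstructed += d          # track the decoder's state
    return first, diffs

def dpcm_decode(first, diffs):
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

samples = [17, 28, 30]
print(dpcm_decode(17, naive_diffs(samples)))   # [17, 24, 26]
first, diffs = dpcm_encode(samples)
print(diffs)                                   # [7, 6]
print(dpcm_decode(first, diffs))               # [17, 24, 30]
```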

Predictor based compression

• Try to predict next sample on basis of previous samples

• If correct, no need to store sample as decompressor uses same rules and so can work it out too

• If prediction correct, output 1 else output 0 followed by actual sample
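A toy version of this scheme might look as follows. The predictor used here (linear extrapolation from the last two samples) is purely an assumption for illustration; the slides do not name one. Tokens are `(1,)` for a correct prediction and `(0, sample)` otherwise.

```python
def predict(history):
    """Hypothetical predictor: continue the straight line through the last two samples."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def encode(samples):
    tokens, history = [], []
    for s in samples:
        if predict(history) == s:
            tokens.append((1,))        # prediction correct: flag only
        else:
            tokens.append((0, s))      # prediction wrong: flag plus actual sample
        history.append(s)
    return tokens

def decode(tokens):
    history = []
    for t in tokens:
        # The decoder runs the same predictor on the same history.
        history.append(predict(history) if t[0] == 1 else t[1])
    return history

ramp = [0, 2, 4, 6, 8, 9, 10, 11]
tokens = encode(ramp)
print(tokens)
print(decode(tokens) == ramp)  # True
```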

Adaptive DPCM

• ADPCM uses prediction

• Outputs predicted differences. If the prediction is accurate, the difference between actual and predicted samples has lower variance than the actual samples and thus takes fewer bits

• Uses 4-bit codes representing predicted diff. between two 16-bit samples
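The sketch below shows the general flavour of an adaptive scheme with 4-bit codes. It is a deliberately simplified toy, not the IMA or any other standard ADPCM algorithm: the predictor is just the previous reconstructed value, and the step-size rule in `_adapt` is invented for illustration.

```python
def _adapt(step, code):
    """Hypothetical rule: widen the step after large codes, narrow it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step

def adpcm_encode(samples, step=16):
    codes, predicted = [], 0
    for s in samples:
        # Quantise the prediction error to a signed 4-bit code.
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step     # same reconstruction the decoder performs
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes, step=16):
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)    # decoder adapts in lock-step with the encoder
    return out

codes = adpcm_encode([1000] * 50)
decoded = adpcm_decode(codes)
print(codes[:6])
print(decoded[-1])  # 1000
```

Because both sides derive the step size from the codes already sent, no side information about the step needs to be transmitted.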

Sub-band coding

• Low frequencies have fewer cycles per second and thus lots of small differences

• High frequencies have larger differences

• Dividing signal into frequency bands allows low frequencies to be coded with fewer bits than high frequencies

• Bands to which ear is less sensitive can be less accurately stored

Speech vs music

• What’s a big difference between speech and music?

• How might we use this to our advantage when compressing speech audio?

Speech compression

• Musical sound has little silence

• Speech has many pauses and silences
  – These can be replaced by duration codes
  – Can reduce a signal by 50% by doing this
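Replacing silences by duration codes could be sketched like this. The threshold, the minimum run length and the `("silence", n)` token format are all assumptions chosen for the example.

```python
def squeeze_silence(samples, threshold=2, min_run=4):
    """Replace runs of near-silent samples with ('silence', length) tokens."""
    out, quiet = [], []

    def flush():
        if len(quiet) >= min_run:
            out.append(("silence", len(quiet)))   # duration code
        else:
            out.extend(quiet)                     # too short to be worth coding
        quiet.clear()

    for s in samples:
        if abs(s) <= threshold:
            quiet.append(s)
        else:
            flush()
            out.append(s)
    flush()
    return out

def expand_silence(tokens):
    """Rebuild the signal, filling each coded silence with zeros."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            out.extend([0] * t[1])
        else:
            out.append(t)
    return out

speech = [50, -40, 0, 1, 0, -1, 0, 0, 30, 1, 20]
tokens = squeeze_silence(speech)
print(tokens)                  # [50, -40, ('silence', 6), 30, 1, 20]
print(expand_silence(tokens))  # [50, -40, 0, 0, 0, 0, 0, 0, 30, 1, 20]
```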

Decompressing a stream

• You tune into your favourite internet radio station feed

• Are you joining at the start of the audio stream?

• How would this affect predictive compression/decompression?

Checkpointing

• Predictive techniques need knowledge of what has gone before

• If a stream (e.g. live radio feed) is opened in the middle, this state information is unavailable

• Therefore, insert checkpoints that contain
  – Uncompressed samples, or
  – Compressor state vector

• Checkpoints allow decompressor to reset itself
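Here is one way checkpointing might look for a simple difference-coded stream. The token names (`"key"`, `"diff"`), the checkpoint interval and the 4-bit clipping range are assumptions; a listener who joins mid-stream discards data until the first checkpoint arrives.

```python
def encode_stream(samples, interval=4, lo=-8, hi=7):
    """Difference-code a stream, inserting an uncompressed sample every `interval` samples."""
    tokens, recon = [], 0
    for i, s in enumerate(samples):
        if i % interval == 0:
            tokens.append(("key", s))          # checkpoint: uncompressed sample
            recon = s
        else:
            d = max(lo, min(hi, s - recon))
            tokens.append(("diff", d))
            recon += d
    return tokens

def decode_from(tokens, start):
    """Decode as a listener who joins the stream at token index `start`."""
    out, value = [], None
    for kind, v in tokens[start:]:
        if kind == "key":
            value = v                          # decoder resets itself here
        elif value is None:
            continue                           # no state yet: wait for a checkpoint
        else:
            value += v
        out.append(value)
    return out

samples = [10, 12, 15, 14, 20, 21, 19, 18, 30, 31]
tokens = encode_stream(samples)
print(decode_from(tokens, 0))  # [10, 12, 15, 14, 20, 21, 19, 18, 30, 31]
print(decode_from(tokens, 2))  # [20, 21, 19, 18, 30, 31]
```

Shorter intervals let a new listener start sooner but spend more bits on uncompressed samples.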

Non-linear coding

• High sample resolution (bit depth) gives wide dynamic range

• Reducing from 16 bits to 8 bits halves storage requirements, but reduces dynamic range by 63,000 times (96 dB down to 48 dB)

• Standard PCM is linear
  – Sample value 50 is twice the amplitude of 25
  – In an 8-bit system, sounds less than 1/256th of the loudest possible signal disappear
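A small sketch of what is lost. The "63,000 times" figure is the 48 dB drop expressed as a power ratio, and the hypothetical helper `linear_16_to_8` (which simply divides by 256 and truncates) shows quiet samples collapsing to zero.

```python
def linear_16_to_8(sample16: int) -> int:
    """Requantise a signed 16-bit sample to 8 bits by truncating toward zero."""
    return int(sample16 / 256)

lost_db = 96 - 48
print(round(10 ** (lost_db / 10)))  # 63096

print(linear_16_to_8(32767), linear_16_to_8(200), linear_16_to_8(-200))  # 127 0 0
```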

Non-linear coding

• Ear is quite insensitive to small changes in loud sounds but very sensitive to same small change in quieter sounds

• Linear coding is ideal for computational manipulation but wasteful

• Non-linear coding uses a logarithmic scale
  – Value of 1 may be much less than 1/50th of the intensity represented by a value of 50
  – More bits for quiet sounds and fewer bits for very loud sounds

µ-Law & A-Law

• µ-Law and A-Law use logarithmic compression to convert linear-coded PCM samples into 8-bit codes

• Provide greater accuracy for the small (quiet) samples that form bulk of an audio signal

• Human auditory system has (approx) logarithmic response so these techniques give highest accuracy where most audible

• Dynamic range is 14 bits & 13 bits respec. (84 dB and 78 dB)
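The logarithmic curve behind µ-Law can be sketched with the textbook continuous formula, with µ = 255. This is an illustration only: real telephony codecs (ITU-T G.711) use a segmented approximation of this curve, and the `to_code` scaling to ±127 is a simplification assumed here.

```python
import math

MU = 255  # value used in the standard 8-bit mu-law curve

def mu_compress(x: float) -> float:
    """Map a linear sample x in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def to_code(x: float) -> int:
    """Simplified 8-bit code: scale the companded value to -127..127."""
    return round(mu_compress(x) * 127)

quiet = 0.001                        # a very quiet sample (full scale = 1.0)
print(round(quiet * 127))            # linear 8-bit code: 0, the sound vanishes
print(to_code(quiet))                # mu-law code: 5, the sound survives
print(round(mu_expand(mu_compress(0.25)), 6))  # 0.25
```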

Perceptual coding

• DPCM, ADPCM, µ-Law & A-Law do not give high-enough compression for demanding multimedia and web applications

• Using psychoacoustic models of our auditory system we can take information out of the audio signal without changing its perceptual characteristics (well, sort of)

• Linear PCM captures sound as it is

• Perceptual coding captures audio as it sounds

Perceptual coding

• PC uses knowledge of the masking properties of the human auditory system and our sensitivity to different frequency bands

• PC introduces significant noise into the signal…

• … but in such a way that we don't hear it.

• MP3, ATRAC (MiniDisc), DCC use perceptual coding techniques

Masking

• Part of an audio signal can be inaudible
  – A loud sound can mask a simultaneous quiet sound
  – A quiet sound immediately following a very loud sound may also be inaudible

• E.g. you have to turn up the radio when your car goes faster

• E.g. A handclap (normally loud) heard straight after a gun shot would sound quiet

• PC assigns fewer bits to masked signals

MPEG audio

• MPEG audio layers 1, 2, & 3

• Most commonly use layer 3, hence MP3

• A standard for coding an audio stream into a bit stream at various bit rates

• The higher the bit rate, the more data

• At a bit rate of 96 kbps achieve bandwidth of about 15 kHz and compression of 16:1

• At 128 kbps, get closer to 20 kHz and compression of about 12:1
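The compression ratios follow from dividing the uncompressed bit rate by the coded bit rate. The slide does not say which source format it assumes; the sketch below assumes 16-bit stereo PCM and tries both 44.1 kHz (CD) and 48 kHz. On that assumption the quoted 16:1 and 12:1 correspond to a 48 kHz source, while a CD-rate source gives roughly 14.7:1 and 11:1.

```python
def pcm_kbps(rate_hz, bits=16, channels=2):
    """Bit rate of uncompressed linear PCM in kbps."""
    return rate_hz * bits * channels / 1000

def compression_ratio(source_kbps, coded_kbps):
    return source_kbps / coded_kbps

for rate in (44100, 48000):
    src = pcm_kbps(rate)
    print(rate,
          round(compression_ratio(src, 96), 1),
          round(compression_ratio(src, 128), 1))
# 44100 14.7 11.0
# 48000 16.0 12.0
```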

ATRAC

• MiniDisc uses adaptive transform acoustic coding

• Compression of 5:1

• Like MP3 uses perceptual coding and sub-band compression

• ATRAC uses three sub-bands, MP3 uses 32

Streaming

• Streaming is the process of sending an audio file as a continuous stream that can be played back the moment the stream starts

• Avoids having to download the file first
  – suitable for live situations, e.g. webcasts, internet radio, etc.

• Need to know about network capabilities of client
  – e.g. no point sending 128 kbps MP3 audio to a 56k modem client
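Matching the stream to the client could be sketched as picking the highest offered bit rate that fits the client's connection. The list of offered rates and the 80% headroom factor are assumptions made for this example, not values from the slides.

```python
def pick_bit_rate(available_kbps, offered_kbps=(128, 96, 64, 32, 16), headroom=0.8):
    """Return the highest offered bit rate that fits within the client's bandwidth.

    `headroom` leaves a safety margin for protocol overhead and congestion.
    Returns None if even the lowest rate does not fit.
    """
    budget = available_kbps * headroom
    fitting = [r for r in offered_kbps if r <= budget]
    return max(fitting) if fitting else None

print(pick_bit_rate(56))   # 32   (56k modem)
print(pick_bit_rate(512))  # 128  (broadband)
print(pick_bit_rate(10))   # None (nothing fits)
```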

Streaming

• Smooth signal heard where transmitter sends data at least as fast as client can decode it

• Low-bandwidth connections and network congestion lead to a low stream rate = either poorer quality audio, or glitches and pauses

• Popular formats are RealAudio, MS ASF, Apple QuickTime

Creating streamed content

• Very simple

• Connect a live feed to a streaming-enabled media producer

• Use tools such as Windows Media Encoder or Real's Helix Producer to turn audio files into streamable files. Even Sound Forge can save as .ASF and .RM

• Select required bit rate/bandwidth

• Some services provide multiple bit rates

Example

http://computing.unn.ac.uk/staff/cgpv1/music!.htm