an introduction to video compression...video interface • converts the camera’s pixel array data...

An In

trodu

ctio

n to

Vid

eo

Com

pres

sion

An Introduction to Video Compression

2

AGENDA

• Video Basics– Analog Video– Digital Video– Scanning Formats

• Video Compression– Intra-Frame Coding– Inter-Frame Coding

• Video Quality• Video Coding Standards• Audio Compression

3

Video Basics

Typical Camera

4

Image Sensor

Image Processor

Video Interface

Image Sensors (CCD/CMOS)

• Each cell is a monochrome detector

• A filter array is used to make them sensitive to different color frequencies

• More green cells are used than red and blue cells because human vision is more sensitive to green frequencies

• This is called a Bayer pattern

5

Image Processing

• Image processing internal to the camera converts the Bayer pattern to the desired camera output format

• Demosaicing and Interpolating algorithms are used to generate the output pixels

• Typical outputs include RGB and YCrCb

6

Video Interface

• Converts the camera’s pixel array data to an output signal

• Horizontal/Vertical scanning of image array• Parallel to Serial conversion of pixel data• Synchronization, timing and status insertion• Electrical signal generation

7

88

Analog Composite Video Waveform

99

Analog Video Image

10

Analog Composite Video Formats• National Television Standard Committee (NTSC)

– Interlaced: 2 fields/frame– 525 lines/frame, 262.5 lines/field– 29.97 frames/second, 59.94 fields/second– 63.5 uSeconds/line– Color burst: 3.58 MHz

• Phase Alternate Line (PAL)– Interlaced: 2 fields/frame– 625 lines/frame, 312.5 lines/field– 25 frames/second, 50 fields/second– 64 uSeconds/line– Color burst: 4.43 MHz

• Séquentiel couleur à mémoire (SECAM)

1111

Digitized Composite Video Formats• NTSC

– 720 horizontal x 480 vertical pixels– 2:1 Interlaced– 29.97 Frames per Second (59.94 Fields per Second)– 165.888 M Bits per Second

• PAL– 720 horizontal x 576 vertical pixels– 2:1 Interlaced– 25 Frames per Second (50 Fields per Second)– 165.888 M Bits per Second

1212

Other Analog Formats• S-Video

– Y – Luminance Signal– C – Chrominance Signal– Eliminates need for color separation filtering– Improved quality over Composite Video

• YPrPb Component– Y – Luminance Signal– Pr – Red-Green Color Difference Signal– PB – Blue-Green Color Difference Signal– Supports High Definition Resolutions

• RGB Component– Red, Green, and Blue Signals– Separate Sync signal– Sync on Green– Primarily a computer monitor format

Digital Broadcast Interfaces

• Serial Digital Interface– SMPTE 259 SDI (270Mbps)

• 480i30 (720x480)• 576i25 (720x576)

– SMPTE 292 HD-SDI (1.485Gbps, 1.485/1.001Gbps)• 720p60 (1080x720)• 1080i30 (1920x1080)• 1080p30 (1920x1080)

– SMPTE 424 3G-SDI (2.970Gbps, 2.970/1.001Gbps)• 1080p60 (1920x1080)

13

Serial Digital Interface Line

14

Serial Digital Interface Frame

15

New Digital Broadcast Interfaces

• New Serial Digital Interfaces– SMPTE 2081 6G-SDI (5.94Gbps)

• 4Kp30 (3840x2160)– SMPTE 2082 12G-SDI (11.88Gbps)

• 4Kp60 (3840x2160)– SMPTE 2083 24G-SDI (23.76Gbps)

• 8Kp30 (7680x4320)

16

Broadcast Interface• Fixed resolutions and

frame rates• Unidirectional interface• No control channel• Fire and forget• Format indicated within

the signal• Coax cable• No trigger capability• No camera power

Machine Vision Interface• Random resolutions and

frame rates• Bidirectional interface• Control channel• Often requires handshake• Format configured in

control channel• Most use complex cables• Triggers• Camera power

17

Machine Vision Interfaces

• GigE Vision– Uncompressed video over UDP on 1-Gigabit and 10-Gigabit

Ethernet networks– Cat5 cabling, Ethernet infrastructure– 125MB/s (1250MB/s) transfer rate

• USB3 Vision– Same API as GigE Vision over USB 3.0 connections

18

Machine Vision Interfaces (Cont)

• CoaXPress– 6.25 Gbps over coax cable– Group multiple cables (4 cables = 25 Gbps)– Future expansion to 10 and 12.5 Gbps

• CameraLink– 26-pin Mini-D ribbon connector– LVDS signal pairs, 2.04 Gbps– Group multiple cables (2 cables = 4.08 Gbps)– Video data, discrete signals, control channel

19

Machine Vision Interfaces (Cont)

• IEEE-1394 FireWire– Tree bus topology, bus master negotiations

and self identification– 400 and 800 Mbps (1600 and 3200 Mbps

defined but not widely supported).– 4, 6 and 9 conductor connectors

• DVI/HDMI– 19 pin connector– 48 Gbps data rate– I2C Control channel– Copy Protection negotiation

20

2121

Video Resolutions

22

Interlaced Scanning Format1

264

2

265

3

524

262

525

263 Even fieldscan line

Odd fieldscan line

Retrace

23

Progressive Scanning Format1

2

3

4

5

522

523

524

525

Scan line

Retrace

Interlace vs Progressive

24

Interlace vs Progressive

25

Color Spaces

• Monochrome• Red, Green, Blue (RGB)

– Separate sync, Sync on Green• YIQ (NTSC, PAL)• YUV• YPbPr (Analog), YCbCr (Digital)

– Pb/Cb = Blue - Luma– Pb/Cr = Red - Luma

• HSV, HSB, HSL• CMYK

26

Other Characteristics

• Pixel Bit Depth– 8, 10, 12, 14, 16 bits per component

• Chroma Sampling– 4:4:4, 4:2:2, 4:2:0, 4:0:0

27

2828

Video Compression

2929

The need for Video Compression• Digital Transmission.

– Need to match video data rate to digital communications system bandwidth.

• Digital Storage.– Need to match video data rate to digital storage

system bandwidth.– Need to reduce storage capacity or increase storage

time.• Multiplexing

– Send more programs or other data over the same channel.

Video Data Rates

• Uncompressed SD Video– 720x480, 30 fps, 16 bpp

• 166 Mbps

• Uncompressed HD Video– 1920x1080, 60 fps, 16 bpp

• 1,990 Mbps

• Uncompressed UHD Video– 7680x4320, 120 fps, 16 bpp

• 63,701 Mbps

30

3131

Digital Transmission Rates• Transmission System Data Rates

– Ethernet: 10/100 Mbps, 1/10 Gbps– OC12: 622 Mbps– OC3: 155 Mbps– DS3: 45 Mbps– T1: 1.544 Mbps– DSO: 64 Kbps– Modem: 33.6 Kbps– Cellular: 9600

3232

Digital Storage Capacity• Uncompressed SD Video: 166 Mbps

– 1 Min. Clip = 1.24 G Bytes– 30 Min. TV Show = 37.3 G Bytes– 2 Hour Movie = 149.3 G Bytes

• Uncompressed HD Video: 1,990 Mbps– 1 Min. Clip = 14.9 G Bytes– 30 Min. TV Show = 447.8 G Bytes– 2 Hour Movie = 1.79 T Bytes

• Uncompressed UHD Video: 63,701 Mbps– 1 Min. Clip = 477.8 G Bytes– 30 Min. TV Show = 14.3 T Bytes– 2 Hour Movie = 57.3 T Bytes

33

Digital Storage Devices

• CD– 185 to 870 M Bytes (various formats)

• DVD– 1.46 to 17.08 G Bytes (SS, SD, SL, DL)

• BluRay Disk– 25 G Bytes (50 GB for dual layer disks)

• Disk Drives– > 1 T Byte

3434

Types of Compression• Lossless

– Output image is numerically identical to the original image on a pixel-by-pixel basis.

– Only statistical redundancy is reduced– Compression ratio is usually low – 2:1 to 4:1– Reversible (infinite compress/decompress cycles)

• Lossy– Output image is numerically degraded relative to the

original.– Statistical and perceptual redundancy is reduced– High compression due to reduction of perceptual

redundancy– Can be visually lossless– Irreversible (compress/decompress degrades images)

3535

Video Compression Standards

• Functions Standardized– Bit stream syntax– How the decoder interprets the bit

stream

• Functions Not Standardized– Pre-processing

• Filtering, scaling, noise reduction– Encoding strategy

• Mode Selection• Quantizer Selection• Block Pattern Selection• Motion Estimation

– Post-filter• Scaling, block filtering, error

concealment

PRE-PROCESS- Noise Filter

ENCODE

POST-PROCESS

- SignalEnhancement

- Filter- Error

Concealment

DECODEBIT-

STREAM- Scaling

36

Preprocessing (1/5)

• Noise Reduction– Filter out high frequency information and

improves compression efficiency– Spatial Noise Reduction

• Reduces high frequency information within a picture

– Temporal Noise Reduction• Reduces high frequency information between

frames

Preprocessing (2/5)

• Resolution Reduction– Reduces the amount of

spatial information that needs to be compressed

– Scaling• Maintains field of view

37

Preprocessing (3/5)

• Resolution Reduction– Reduces the amount of

spatial information that needs to be compressed

– Cropping• Maintains pixel

resolution38

Preprocessing (4/5)

• Frame Rate Reduction– Reduces the amount of temporal information

that needs to be compressed

3939

1 2 3 4 5 6

Video Sequence – 30 Frames per Second

1 3 5X X X

Video Sequence – 15 Frames per Second

16 Bits/Pixel

Y Cr Cb

4:4:4 4:2:2 4:2:0

24 Bits/Pixel

12 Bits/Pixel

• Color representation– Reduces the amount of spatial information

that needs to be compressed

Preprocessing (5/5)

4040

4141

Intra-Frame Coding

• Removes the redundancies within a frame• Each frame is encoded separately without

regard to adjacent frames• Similar to JPEG

Encoding Stages

• Signal Analysis– Transform from spatial domain to another

domain that is easier to compress• Quantization

– Discard less important information• Variable Length Coding

– Last stage of efficient coding42

4343

Video Compression SystemSignal

Analysis

INTRAFRAME- Predictive+ 1D, 2D predictor+ Fixed, adaptive

- Block transform+ DCT+ Karhunen Loeve+ Hadamard+ Fourier+ Haar

- Sub-band filtering+ Unconstrained

rectangular blocks+ QMF, other filters+ Wavelets+ Pyramidal

- Block patternmatching (VQ)

- Fractals

INTERFRAME- Application of

intraframe techniques- Motion estimation- 3-D transform

Quantization

- Variable precision

- Visibility matrix

- Fixed, adaptive

- Lossy, lossless

- Scalar, vector

Variable length coding(entropy coding)

- Fixed+ Huffman+ B+ Comma+ 2 dimensional

(run length/COEFF)

- Adaptive+ Conditional+ Arithmetic+ One, two passes

- Lossless

- Predictionerror

- Transformcoefficients

- Filter energylevels

COMPRESSEDOUTPUT

Uncompressedinpute.g. 8 bits/pixel

4444

Signal Analysis Phase

SignalAnalysis

INTRAFRAME- Predictive

+ 1D, 2D predictor+ Fixed, adaptive





- Fractals



Quantization


- Visibility matrix

- Fixed, adaptive

- Lossy, lossless

- Scalar, vector



(run length/COEFF)


- Lossless

- Predictionerror



COMPRESSEDOUTPUT


4545

Discrete Cosine Transform• Converts spatial information to frequency

information• Important information is concentrated in

lower frequency bands• Operations typically performed on 8x8

blocks of pixels• Lossless and reversible

4646

Transform Coding by DiscreteCosine Transform (DCT)

DCT

Segmentation of a frameinto blocks of pixels

8x8 Pixels8x8

Coefficients

4747

DCT Basis Function

4848

Effect of DCT Coefficients

4949

More Effects

5050

Coefficient Levels

5151

Quantization Phase

SignalAnalysis

INTRAFRAME- Predictive+ 1D, 2D predictor+ Fixed, adaptive





- Fractals



Quantization


- Visibility matrix

- Fixed, adaptive

- Lossy, lossless

- Scalar, vector



(run length/COEFF)


- Lossless

- Predictionerror



COMPRESSEDOUTPUT


5252

Quantization

• Converts 12 bit DCT coefficients to 8 bit values

• Try to force less important information toward zero values

• Takes less bits to code zero• Most of loss occurs here

5353

Scalar Quantization

4

3

2

256 Input0

1

Output

-1

-2

-3

-4

512 768 1024

-1024 -768 -512 -256

803

Quantize 803 >> 3Inv Quant 3 >> 768Quantization Error 803 – 768 = 35

5454

Scanning Order in a Block

1 2 6 7 15 16 28 293 5 8 14 17 27 30 434 9 13 18 26 31 42 4410 12 19 25 32 41 45 5411 20 24 33 40 46 53 5521 23 34 39 47 52 56 6122 35 38 48 51 57 60 6236 37 49 50 58 59 63 64

5555

Entropy Coding Phase

SignalAnalysis

INTRAFRAME- Predictive

+ 1D, 2D predictor+ Fixed, adaptive





- Fractals



Quantization


- Visibility matrix

- Fixed, adaptive

- Lossy, lossless

- Scalar, vector



(run length/COEFF)


- Lossless

- Predictionerror



COMPRESSEDOUTPUT


5656

Variable Length Coding• A technique for assigning shorter codes to

high probability symbols, and longer codes to lower probability symbols

• Entropy Coding, Huffman Encoding, Arithmetic Coding

• Modest compression gain (2:1-4:1)• Lossless• Reversible

57

VLC Examples

5858

Huffman Variable Length Coding

Symbol Probability VLC Code Fixed Length Code

A ½ 0 00

B ¼ 10 01

C 1/8 110 10

D 1/8 111 11

For 128 symbols: Fixed Length = 256 bits

Variable Length = 224 bits (12.5% savings)

59

Sample Intra Block Coding

688 -21 0 0 0 0 0 0-39 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

e) INVERSE QUANTIZED COEFFICIENTS

76 76 77 79 80 81 82 8377 77 78 80 81 82 83 8479 79 80 81 83 84 85 8681 82 83 84 85 87 88 8884 84 85 87 88 89 90 9186 87 88 89 91 92 93 9388 89 90 91 92 94 95 9589 90 91 92 93 95 96 96

f) RECONSTITUTED BLOCK

684 -19 -1 -2 0 -1 0 -1-37 0 -1 0 0 0 0 -10 0 0 0 0 0 0 0-4 -1 -1 -1 -1 0 -1 -10 0 0 0 0 0 0 0-2 0 0 -1 0 -1 0 -10 0 0 0 -1 -1 -1 -1-1 -1 -1 0 -1 0 -1 0

b) TRANSFORM COEFFICIENTS

RUN LEVEL CODE86 01010110-3 001011-6 001000011

EOB 10TOTAL CODE LENGTH = 25

86 -3 0 0 0 0 0 0-6 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

c) QUANTIZED COEFFICIENTS d) VARIABLE LENGTH CODING

7775 76

78

a) ORIGINAL PIXELS

77 78 79 80 81 8279 80 81 82 83 84

79 80 81 82 83 84 85 8681 82 83 84 85 86 87 8883 84 85 86 87 88 89 9085 86 87 88 89 90 91 9287 88 89 90 91 92 93 9489 90 91 92 93 94 95 96DCT

Quantize

InverseDCT

InverseQuantize

Variable Length Code

60

Inter-Frame Coding• Removes the redundancies between

frames• Uses one or more previous or future

frames as a prediction of the current frame• Produces a difference between the current

frame and the prediction• Only encode the differences

Encoding Stages

• Signal Analysis– Transform from spatial domain to another domain that is easier

to compress• Quantization

– Discard less important information• Variable Length Coding

– Last stage of efficient coding• Prediction

– Create a prediction for the frame being encoded and only compress the difference

61

6262

Generic Inter-Frame Coding System

Predictor

DCT and Quantizer

Inverse DCTand Quantizer

+

-Prediction

Error

PredictedSignal

InputVideo

OutputData

63

Original Image Histograms

64

Difference Image

6565

Difference Image Histograms

6666

Pictures Types

• Intra Frame (I)– Full picture coded independent of other frames– Key frame

• Predicted Frame (P)– Inter coded frame predicted from a previous frame

• Bidirectional Frame (B)– Inter coded frame predicted from a future frame

• GOP– # of P/B frames between I frames– # of B frames between I/P frames– Open or Closed GOP

6767

Group of Pictures (GOP)

Transmission Order: IPBBPBB

6868

Picture Type Efficiency

• I Frames: 1 bit per pixel• P Frames: 0.25 bits per pixel• B Frames: 0.03 bits per pixel• All I Frames: 10Mbps• IP15: 6 Mbps• IPB15: 2.5 Mbps

6969

Improving Prediction

• Objects in a scene do not usually change instantaneously– Objects move– Camera pans and zooms

• Improve prediction by accounting for the motion of blocks

• Extremely computation intensive

Improved Prediction with Motion Vectors

70

7171

Motion Vector Example

7272

Block Sizes• H.261/H.263 use 16x16 block for motion

estimation• H.264 uses variable block sizes

0

Sub-macroblock partitions

0

1 0 1

0 1

2 3

0 0

1 0 1

0

2

1

3

1 macroblock partition of 16*16 luma samples and

associated chroma samples

Macroblock partitions

2 macroblock partitions of 16*8 luma samples and

associated chroma samples 4 sub-macroblocks of 8*8 luma samples and

associated chroma samples 2 macroblock partitions of

8*16 luma samples and associated chroma samples

1 sub-macroblock partition of 8*8 luma samples and

associated chroma samples 2 sub-macroblock partitions

of 8*4 luma samples and associated chroma samples

4 sub-macroblock partitions of 4*4 luma samples and

associated chroma samples 2 sub-macroblock partitions of 4*8 luma samples and

associated chroma samples

7373

Block Size Example

7474

Intra-Frame Prediction• 4x4 intra prediction modes

• Also 16x16 intra prediction modes

7575

Performance

76

Video Quality

77

Rate Control Algorithm (1/2)

• This is the encoder magic sauce• Determines how to code each macro-block• Rate Control Modes

– Constant Quality (CQ) – Attempt to product a consistent video quality regardless of the data rate required

– Constant Bit Rate (CBR) – Attempt to product the best quality video while holding the data rate constant. May use fill data to maintain the constant data rate. May still be some minor variability in data rate.

– Variable Bit Rate (VBR) – Attempt to produce the best quality video while not exceeding some maximum data rate, but allowing the data rate to drop when not needed

Rate Control Algorithm (2/2)

• For VBR and CBR, encoder attempts to use the finest quantizer (sharpest quality pictures)

• If the encoder is generating too much data for the selected data rate, the encoder increases the coarseness of the quantizer up to the maximum user selected value (picture quality degrades)

• If the encoder is still generating too much data for the selected data rate, the encoder begins dropping frames (reduced frame rate)

78

79

Encoder Trade-offs

• Multidimensional trade-off space– Data rate– Resolution– Frame Rate– Quality– Latency– Error resilience

80

Compression Ratio

• Old compression algorithms had a fixed bits per pixel and therefore had a defined Compression Ratio.

• New compression algorithms have a variable number of bits per pixel depending on a number of factors, most notable is picture quality.– 1000:1 compression is very possible, but picture quality will be

unusable– 100:1 is reasonable

• Compression efficiency and picture quality are also highly dependent on scene complexity.

8181

Typical Video Quality Problems

• Data rate too low to support desired quality– Communications link bandwidth– Codec efficiency– Resolution and frame rate

• Too many errors– Bit errors– Packet loss– Burst errors

82

Video Quality Effects• Video Frame Rate

– Reduced Frame Rate– Jerky Video (irregular frame rate)

• Video Artifacts– Multi-color blocks– Black or white stripes– Tearing or melting images– Ghosts and shadows behind moving objects– Block edges– Edge noise around sharp edges

• Decoder Issues– Loss of sync, cannot detect start codes– Video freeze– Decoder crash

8383

Compression Artifacts - Blockiness

8484

Compression Artifacts - Blockiness

8585

Compression Artifacts - Blurriness

8686

Compression Artifacts - Edge Noise

8787

Compression Artifacts - Edge Noise

8888

Error Artifacts - I-Frame Data

8989

Error Artifacts – P/B-Frame Data

9090

Error Artifacts – P/B Frame Data

9191

Packet Loss Artifacts

1% Packet Loss Rate 5% Packet Loss Rate

92

Compressed Data Error Sensitivity

• Sparse Start Codes – usually at the beginning of a picture

• Variable length codes – a bit error not only changes the value, but also the length of the code and the following values

• Inter-frame coding propagates an error over the rest of the GOP

93

Error Resilience Features

• GOP Structure – distance between I-Frames• Intra Refresh – Intra-coded macroblocks

distributed throughout long GOPs• Slices – partitioning image into sub images with

re-sync points• Flexible Macroblock Ordering, Arbitrary Slice

Ordering, Redundant Slices• Forward Error Correction Schemes

Error Concealment Features

• Decoder functions to reduce the effect of errors• Intra-frame concealment – use other pixels

within the frame to predict the value of the missing pixels

• Inter-frame concealment – use pixels from previous frames to predict the value of the missing pixels

• Estimation of missing motion vectors94

Error Resilience Schemes

• Flexible Macroblock Ordering (FBO)• Arbitrary Slice Ordering (ASO)• Data Partitioning (DP)• Redundant Slices (RS)• SP/SI Frames for bit rate switching• Reference Frame Selection• Intra-block refreshing95

Video Encoder Settings

• Video Resolution– Autodetect– Scaling/Cropping

• Frame Rate Decimation• GOP Size/Structure

– Infinite GOP=0 (only P-Frames)– GOP=1(only I-Frames)– Normal GOP=2 to N (I and P-frames)

• Quantizer Settings

96

Video Encoder Settings

• Slice Modes– Error Recovery– Full Frame– Macroblocks– Bytes

• Intra-Refresh– Force Intra Macroblocks during P-Frames– Sequential– Random

97

Video Encoder Setting

• Date Rate• Rate Control Mode

– Constant Quality– Constant Bit Rate– Variable Bit Rate

• Rate Control Parameters– Not common

98

999999

100100100

Contact Information

747 Dresher RoadHORSHAM, PA 19044

Ph: 215-657-5270FAX: 215-657-5273

www.deltadigitalvideo.com

100

Backup Slides

101

102102

High Definition Formats• 720p

– 1280 horizontal x 720 vertical pixels– Progressive– 60 Frames per Second– 884.736 M Bits per Second

• 1080i– 1920 horizontal x 1080 vertical pixels– 2:1 Interlaced– 30 Frames per Second (60 Fields per Second)– 995.328 M Bits per Second

• 1080p– 1920 horizontal x 1080 vertical pixels– Progressive– 60 Frames per Second– 1990.656 M Bits per Second

103103

Interlace Artifacts

Video Compression Standards

Standards Organizations

• In the beginning there was COST 211. They created the H.120 standard for interactive video communications.

• The ITU-T VCEG (Video Coding Expert Group) was formed to improve upon H.120.

• MPEG (Moving Pictures Experts Group) was formed to find a way to incorporate codecs for broadcast work.

• To this day, the two committees – MPEG and VCEG, work side-by-side. MPEG specializes in broadcast (television), while the ITU (International Telecommunications Union) focuses on telecommunications (phone, internet).

2

ITU-T H.120

• Adopted in 1984• 1544 kbit/s for NTSC and 2048 kbit/s for PAL• Used differential pulse-code modulation, scalar

quantization, variable-length coding • Showed that a better algorithm was needed to produce

usable quality at reasonable data rates

3

444

ITU-T H.261 Overview

• Adopted 1988• Motion Video Algorithm typically used for Video

Conferencing over ISDN networks• Target Data Rate

– 56 Kbps to 2048 Kbps• Available Resolutions

– CIF: 352x288– QCIF: 176x144

55

ISO/IEC JPEG Overview

• Adopted in 1993• ITU-T T.81, ISO/IEC 10918-1• Still Image Compression Algorithm• Based on Discrete Cosine Transform (DCT)• Motion video mode – sequence of still images• Lossless and lossy compression modes

66

ISO/IEC JPEG-2000 Overview

• Adopted in 2000• ITU-T T.800, ISO/IEC 15444-1• Still Image Compression Algorithm• Based on Discrete Wavelet Transform (DWT)• Motion video mode – sequence of still images• Lossless and lossy compression modes

77

JPEG vs JPEG2000

88

JPEG vs JPEG-2000(at same bit rate)

JPEG JPEG-2000

ISO/IEC MPEG-1

• Adopted in 1992• ISO/IEC 11172 • Motion Video Compression primarily for CD-ROM video

storage• Target Data Rate: 1.5 Mbps• Most common resolution was the Source Input Format

(SIF) – 352x240 (for NTSC)– 352x288 (for PAL)– 320x240 (for VHS)

9

ITU-T H.262 / ISO/IEC MPEG-2

• Adopted in 1996• ITU-T H.262, ISO/IEC 13818• Typical Data Rates

– 3-5 Mbps for broadcast quality standard definition video– 15-20 Mbps for high definition video

• Typical Resolutions– SIF: 360x240p– HHR: 360x480i– SD: 720x480i, 720x576i– ED: 720x480p– HD: 1280x720p, 1920x1080i

• Enabled digital broadcast, cable and satellite television10

111111

ITU-T H.263 Overview

• Adopted in 1999• ITU-T Recommendation H.263• Target Data Rate: 56 kbps• Target Resolution: Sub QCIF 128x96• Designed for video phone over analog telephone lines• Significant improvements in compression efficiency

121212

Algorithm Comparison

131313

ISO/IEC MPEG-4 Overview

• Adopted in 1999• ISO/IEC 14496-2 MPEG-4 Part 2• Based on H.263 as baseline• Optimized for internet streaming

applications• Same picture sizes as MPEG-2 but also

smaller sub-QCIF sizes

ITU-T H.264

• Adopted in 2003• ITU-T H.264• Also called ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video

Coding• H.264 is the most widely used codec on earth, even surpassing

broadcast MPEG-2, simply due to the power of the internet. It is used by Youtube, and every other video provider of note.

• It is also the codec that drives Blu-ray. Most cable, satellite and television channels have moved to H.264.

14

MPEG-2 vs H.264

15

1616


MPEG-2 MPEG-4 H.264

ITU-T H.265

• Initially adopted 2013 and revised several times.• ISO/IEC 23008–2 MPEG-H Part 2, High Efficiency Video Coding

(HEVC). • Up to 8K UHDTV (8192×4320 maximum) • Up to 12-bit color bit depth 4:4:4 and 4:2:2 chroma sub-sampling • Supports up to 300 fps • Data rates up to several GB/s • Special features to support high efficiency encoding of very high

resolution video

17


18

ITU/ISO/IEC New Work

• Formed Joint Video Exploration Team (JVET) to investigate next generation video compression standard

• Target release in 2020

19

Other Codecs

• Microsoft– WMV7, WMV8, WMV9

• Google– VC6, VP7, VP8, VP9

• DIRAC• Alliance of Open Media (AOMedia)

– AV120

212121

222222

Contact Information


Ph: 215-657-5270FAX: 215-657-5273


22

232323

Backup Slides

Overview of IRIG-210

The Horace Standard

252525

IRIG 210

• Developed in late 1980’s under SBIR program• Standardized by IRIG in 1993• Differential Pulse Code Modulation Algorithm (DPCM)• Monochrome, RS-170 video• Full, Fixed or variable frame rate

– Field or frame skip modes• Typical Resolution

– 640, 512, 450, 320, 256, 225, 160 or 128 pixels per line– 240 lines per field or 480 lines per frame

262626

Differential PCM (DPCM)• Video is encoded line by line

– Line Synchronization code– Format Information– Line pixel data

• PCM – 7 bit value for each pixel• DPCM – difference between the current pixel and the previous pixel

– Three DPCM modes: Normal, High, 2-Bit– Adaptive on a line by line basis

• Entropy Coding– Normal and High modes only– 1 to 8 bit codes based on previous and current DPCM code

272727

DPCM Modes

• Normal– 3 bits per pixel (8 difference values)

• 0, +3, -3, +8, -8, +20, -20, +/-40

• High Level (course)– 3 bits per pixel (8 difference values)

• 0, +4, -4, +10, -10, +25, -25, +/-50

• 2-Bit Mode– 2 bits per pixel (4 difference values)

• +4, -4, +26, -26

282828

Entropy Coding

292929

DPCM Example 1Input Pixel 50 45 48 43 40 44 47 51

Difference +50 -5 +3 -5 -3 +4 +3 +4

DPCM Code 8 2 2 3 3 2 2 2

DPCM Jump +40 +3 +3 -3 -3 +3 +3 +3

Output Pixel 40 43 46 43 40 43 46 49

Error 10 2 2 0 0 1 1 2

Output Bits 00000001 001 01 001 010001 01 01

26 Bits for 8 Pixels = 3.25 Bits/Pixel

303030

DPCM Example 2

Input Pixel 2 4 8 16 32 64 127 127

Difference +2 +2 +4 +8 +16 +32 +64 +0

DPCM Code 2 2 2 4 6 6 8 6

DPCM Jump +3 +3 +3 +8 +20 +20 +40 +20

Output Pixel 3 6 9 17 37 57 97 117

Error 1 2 1 1 5 7 30 10

Output Bits 001 01 01 0001 0001 1 000000010000001


313131

DPCM Example 3

Input Pixel 0 0 0 0 127 127 127 127

Difference +0 +0 +0 +0 +127 +0 +0 +0

DPCM Code 1 1 1 1 8 8 8 4

DPCM Jump +0 +0 +0 +0 +40 +40 +40 +8

Output Pixel 0 0 0 0 40 80 120 128

Error 0 0 0 0 87 47 7 1

Output Bits 0001 1 1 100000001

00000001

00000001

00001


323232

IRIG 210 Performance

• Best possible performance– Black screen– 1-bit per pixel– 512x480 @ 30 fps: 7.7Mbps– 256x480 @ 30 fps: 4.0 Mbps

• Typical performance– About 2.2-bits per pixel– 512x480 @ 30 fps: 16.5 Mbps– 256x480 @ 30 fps: 8.4 Mbps

Overview of H.261 and H.263

The original video conferencing codec

343434

ITU-T H.261 Format

• Picture Structure– Picture [CIF: 2x6 GOBs, QCIF: 1x3 GOBs]

• Group of Blocks (GOB) [11x3 macro blocks]– MacroBlock [ 4 Y, 1 Cr, 1 Cb blocks]

» Block [ 8x8 pixels]

• Forward Error Correction– BCH (511,493)– 512 bit frame with sync bit and fill indicator

353535

ITU-T H.261 Intra-frame Coding

• Full Intra-frame coded on startup• After that all frames Inter-frame coded• Intra Refresh

– Intra code individual blocks– X macroblocks per frame– To correct transmission and roundoff errors

• Full Intra-frame coded or on command– Usually requested by decoder

363636

ITU-T H.261 Inter-frame Coding

• Prediction based on previous frame with motion compensation

• Motion vector applies to macroblock• Full pixel motion vector resolution• +/- 15 pixel motion vector range

JPEG and JPEG2000

• JPEG (Joint Photographic Experts Group) became a popular codec of choice for still image compression

• Intra Frame compression algorithms – compress each frame independently without regard to prior or later frames

37

3838

ISO/IEC JPEG-2000 Features

• Superior compression performance compares to JPEG.• Multiple resolution representation.• Progressive transmission by quality and resolution

accuracy.• Random code-stream access and processing, also

referred as Region Of Interest (ROI).• Error resilience.• No blocking artifacts• Supports very large format images and tiling

Overview of MPEG-1, 2, 4

Entertainment quality video

404040

MPEG-1 Overview

• ISO/IEC 11172 MPEG-1 Standard (1992)• Target bit-rate about 1.5 Mbit/s• Typical image format SIF (352x240), no

interlace• Frame rate 24 ... 30 fps• Main application: video storage for

multimedia (e.g., on CD-ROM)

414141

MPEG Components

424242

Discrete Cosine Transform

• 4 x 4 Block size– DCT requires O(n^2) operations– 8x8: 4096 operations– 4x4: 256 operations (x4=1024)

• 2 x 2 block size for Chrominance• 4 x 4 block size for DC coefficients of 16x16 group• Integer implementation

– 16 bit integer math– No floating point math

• No multiplication or division– Only add, subtract, shift

• Integrated with quantizer– Scaling integrated with quantizer so it is only done once.

434343

More Motion Estimation

• Forward and Backward Reference Frames (Blocks)

• Multiple Reference Frames (Blocks)– Average of blocks in multiple different frames– Weighted factor for each reference frame

• 1/4 pixel accuracy/resolution

444444

Improved De-blocking Filter

• In-loop filtering, not post filtering• Improves the quality of the predicted image as

well as the resultant image• On edge level, filtering strength is dependent on

the code mode, motion vector, and values of residuals

• On sample level, quantizer dependent thresholds can turn off filtering for any individual sample

454545

Entropy Coding

• Exponential Golomb code– Header and other information

• Context-Adaptive Variable Length Coding (CAVLC)– Quantized transform coefficients– Run-length coding– Runs of zeros, runs of ones

• Context-Adaptive Binary Arithmetic Coding (CABAC)– Quantized transform coefficients– 5-15% improvement over CAVLC, but more complex

464646

Exponential Golomb Codes

Bit string form Range of Symbols (#)1 0 (1)

0 1 x0 1-2 (2)0 0 1 x1 x0 3-6 (4)

0 0 0 1 x2 x1 x0 7-14 (8)0 0 0 0 1 x3 x2 x1 x0 15-30 (16)

0 0 0 0 0 1 x4 x3 x2 x1 x0 31-62 (32)… …

474747

Slices

• Groups of one or more Macro Blocks (MB)– Horizontal, vertical, rectangular, dispersive– Independently decodable– Limits error propagation within a picture

• I Slice– Only intra coded MB

• P Slice– Forward predicted and intra coded MB

• B Slice– Backward, forward, multiple reference and intra coded MBs

• SI, SP Slice– For bitstream switching

• Redundant Slices– For error resilience

• Arbitrary Slice Order

Parts and Layers

• MPEG standards have sub-divisions, called Parts. • Traditionally, Part 1 is always for the ‘System’ (file or

stream format). Part 2 is for video, and Part 3 is for audio.

• Parts are further sub-divided into Layers. • Audio has three or more layers, called Layer I, Layer II

and Layer III and so on.• MPEG-1 Part 3 Layer III is called MP3.

48

Profiles and Levels

• Beginning with MPEG-2, is the concept of Profiles and Levels – in addition to Parts and Layers.

• Profiles are sets of capabilities within the specification to be used for specific applications.

• Levels specify a range of parameters used within a profile.

• The algorithms are so complex and address so many different applications that it does not make sense for an encoder or software to comply with all of it.

49

505050

ISO/IEC MPEG-2 Overview

• Part 1, Systems defined two major classifications: Program Stream and Transport Stream

• Part 2, Video is also called H.262. It had additional support for interlacing and 4:2:2

• Part 3, Audio incorporated 5.1 channels

515151

ISO/IEC MPEG-2 Features

• More input flexibility– Interlaced and progressive pictures– Field and frame coding– Wide range of picture resolutions

• Motion estimation– Half pixel motion vectors– 16x16 or 16x8 motion blocks– Estimation across fields (Dual prime)

• Non-linear quantization• Improved VLC • Various scalability tools

525252

Frame/Field Coding

535353

ISO/IEC MPEG-2 Profiles

• Simple– 4:2:0 sampling, I/P pictures only, no scalable coding

• Main– As above, plus B pictures

• SNR– As above, plus SNR scalability

• Spatial– As above, plus spatial scalablity

• High– As above, plus 4:2:2 sampling, 11 bit precision

• 422– As above, no scalability, higher data rate

545454

ISO/IEC MPEG-2 Levels

• Low– 352 x 288 luminance samples, 30 Hz

• Main– 720 x 576 luminance samples, 30 Hz

• High-1440– 1440 x 1152 luminance samples, 60 Hz

• High– 1920 x 1152 luminance samples, 60 Hz

555555

ISO/IEC MPEG-4 Features

• Adds object based coding• Natural and synthetic objects• Shape coding

– Identifies objects and background– Codes them separately

• Motion Coding– Accounts for object motion– 1/4 Pixel motion vectors, 8x8 motion blocks– Global motion vectors

• Texture Coding– Accounts for object color and texture

565656

ISO/IEC MPEG-4 Profiles and Levels

• Profiles– Version 1: Simple, Simple Scalable, Core, Main, N-bit, Scaleable

Texture, Simple Face Animation, Basic Animated Texture, and Hybrid.

– Version 2: Core Scalable, Advanced Core, Advanced Coding Efficiency, Advanced Real Time Simple, Advanced Scaleable Texture, and Simple FBA.

– 1st Extension: Simple Studio and Core Studio.– 2nd Extension: Advanced Simple and Fine Granularity Scalability.

• Levels– Various levels constraining profile parameters

575757

Network Abstraction Layer (NAL)

• Formats the compressed data and provides header information in a manner appropriate for conveyance on a variety of communication channels or storage media

• NAL Unit Stream– Groups data by slice– Used for packet transmission or storage applications

• Byte Stream– NAL Unit plus Start Code Prefix and fill Suffix– Used for transmission applications

585858

H.264 Levels

• Restricts range of parameter values• Bit rate• Frame size• Motion vector range• Slice size

595959

H.264 Summary

• 50% coding gain over MPEG-2 and H.263 Baseline profile

• 25% coding gain over H.263 CHC profile• Encoder complexity 3x that of older

algorithms• Decoder complexity 2x that of older

algorithms

ITU-T H.264 Features

• 40-50% the bit rate reduction at the same visual quality compared to MPEG-2

• Hybrid spatial-temporal prediction model• Flexible partition of Macro Block (MB) , sub MB for motion

estimation• Intra Prediction (extrapolate already decoded neighboring pixels for

prediction) with 9 directional modes • Entropy coding is CABAC and CAVLC• Support Up to 4K (4,096×2,304)• Supports up to 59.94 fps• 21 profiles ; 17 levels60

616161

Why is it so good?

• Intra-frame prediction• Integer-based DCT algorithm• Improved de-blocking filter• Flexible block sizes from 4x4 to 16x16 for

improved motion estimation• Improved entropy coding (CABAC)• Multiple Reference Frames

Overview of H.264

(a.k.a. MPEG-4 Part 10)(a.k.a. AVC)

636363

Key H.264 Profiles

• Baseline Profile– All inter and intra coding tools (I and P slices)– CAVLC– Error concealment tools

• Main Profile– Same as Baseline except no ASO, FMO, RS– CABAC– Multiple Reference frames, weighted prediction (B slices)– Macro Block Adaptive Frame/Field (MBAFF) coding– Adaptive Block-size Transform (ABT)

• Constrained Baseline Profile• High Profile (HiP)

– The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (for example, this is the profile adopted by the Blu-ray Disc storage format and the DVB HDTV broadcast service).

6464

More H.264 Profiles

• Extended Profile– Data Partitioning– Switching I and P slices

• High 10 Profile (Hi10P)– Going beyond typical mainstream consumer product capabilities, this profile

builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision.

• High 4:2:2 Profile (Hi422P)– Primarily targeting professional applications that use interlaced video, this profile

builds on top of the High 10 Profile, adding support for the 4:2:2 chromasubsampling format while using up to 10 bits per sample of decoded picture precision.

• High 4:4:4 Predictive Profile (Hi444PP)– This profile builds on top of the High 4:2:2 Profile, supporting up to 4:4:4 chroma

sampling, up to 14 bits per sample, and additionally supporting efficient lossless region coding and the coding of each picture as three separate color planes.

6565

Even More H.264 Profiles

• High 10 Intra Profile– The High 10 Profile constrained to all-Intra use.

• High 4:2:2 Intra Profile– The High 4:2:2 Profile constrained to all-Intra use.

• High 4:4:4 Intra Profile– The High 4:4:4 Profile constrained to all-Intra use.

• CAVLC 4:4:4 Intra Profile– The High 4:4:4 Profile constrained to all-Intra use and to CAVLC entropy coding (i.e., not supporting CABAC).

• Scalable Baseline Profile– Primarily targeting video conferencing, mobile, and surveillance applications, this profile builds on top of a constrained version of

the H.264/AVC Baseline profile to which the base layer (a subset of the bitstream) must conform. For the scalability tools, a subset of the available tools is enabled.

• Scalable High Profile– Primarily targeting broadcast and streaming applications, this profile builds on top of the H.264/AVC High Profile to which the base

layer must conform.• Scalable High Intra Profile

– Primarily targeting production applications, this profile is the Scalable High Profile constrained to all-Intra use.• Stereo High Profile

– This profile targets two-view stereoscopic 3D video and combines the tools of the High profile with the inter-view prediction capabilities of the MVC extension.

• Multiview High Profile– This profile supports two or more views using both inter-picture (temporal) and MVC inter-view prediction, but does not support

field pictures and macroblock-adaptive frame-field coding.

Overview of JPEG-2000

Wavelet based image compression

6767

JPEG-2000 System

6868

Discrete Wavelet Transform

• Captures both frequency and location information

• Effectively a series of filters and down sampling stages

• Reversible Transform for lossless compression• Irreversible Transform for lossy compression

6969

Discrete Wavelet Transform

7070

2D Discrete Wavelet Transform

7171

3-Stage DWT

Stage 1 Stage 2 Stage 3

7272

Quantization

73

Wavelet Encoding Methods

• Embedded Zero-Tree Wavelet (EZW)– The bitstream is embedded and the coefficients are ordered in significance and

precision, so that it can be truncated according to the bit-rate requirements of the channel.

– It efficiently utilizes the self-similarity between the subbands of similar orientation and achieves significant data reduction.

– However, the EZW algorithm is not exactly optimal.• Set Partitioning in Hierarchical Trees (SPIHT)

– Improved performance over EZW, because of its ability to exploit the grouping of insignificant coefficients.

• Embedded Block Coding with Optimized Truncation of bit-stream (EBCOT)– Offers both resolution scalability and SNR scalability. – Because of its advantages, the EBCOT algorithm has been accepted

incorporated within the most recent still image compression standard JPEG-2000.

7474

Embedded Block Coding

• Embedded Block Coding with Optimized Truncation (EBCOT)• Divided into two layers called Tiers.

– Tier 1 is responsible for source modeling and entropy coding– Tier 2 generates the output stream

• Wavelet sub-bands are partitioned into small code-blocks (to not be confused with tiles), typically 64x64

• Code blocks are in turn are separated into bit-planes. (i.e. coefficient bits of the same order are coded together)

• Bit-planes are further partitioned into three coding passes. Each coding pass constitutes an atomic code unit, called chunk.

• Bit-planes are coded using Arithmetic coding (MQ-coder).

7575

Embedded Block Coding

767676

MPEG Components

777777

Levels of Video Compression

• Uncompressed– 166 mbps

• True Lossless Compression– 45 mbps (4)

• Visually Lossless– 3-10 mbps (16-55)

• High Quality Videoconferencing– 384 Kbps to 1.5 mbps (110-432)

• Acceptable Quality Videoconferencing– 112 Kbps to 256 Kbps (648-1482)

• Videophone– 20 Kbps to 33.6 Kbps (4940-8300)

787878

Comparison of Video Coding Standards

H.261 H.263 MPEG1 MPEG2

General Standard Structure Narrow Profile Narrow Profile Generic Tool Kit

Picture Format176 x 144 (mandatory)

352 x 28 (optional)SQCIF, QCIF, CIF, 4CIF, 16CIF 360 x 240 Field or Frame

Quantization Precision of Motion Vectors One Pixel One Half Pixel One Half or One Pixel

B-Frames/PB Frame None PB FramesB Frames

An Available Tool

Intraframe Coding Usually Distributed Flexible Full Frame is Mandatory

Color Coding 4:2:0 4:2:0 4:4:4, 4:2:2, 4:2:0

Picture Structure Group of Blocks GOB Slice

Dual Prime (a special motion compensation mode)

No No No Yes

Nominal Bitrate 56 Kbps to 1,936 Kbps Low Bit Rate 1.5 mbps 4 – 20 mbps

Applications

Interactive Audiovisual Services

-Videophone-- Videoteleconferencing

VTC/Videophone via PSTN/Mobile Network - VCR

- Broadcast TV- + Contribution

+ Distribution- DBS- HDTV

797979

Video Coding StandardsITU ISO

H.261

S 1990S CIF 352 x 288 pixelsS VHS qualityS Videoconferencing/ Videophone

(VC/VPO) via N-ISDN; 56 Kbps- 1.920 mbps

MPEG1S 1993S SIF 352 x 240 pixelsS VHS qualityS Star application - CDROM - 1.5 bps

H.262 S 1995S VC/VP via B-ISDN; 2 - 20 mbps MPEG2S 1993S SIF 352 x 240 pixelsS VHS qualityS Star application - CDROM - 1.5 mbps

H.263S March 1996S SQCIF, QCIF, CIF, 4CIF, 16CIFS H.324, H.323, H.324MS Higher quality than H.261

MPEG4- Video

MPEG-4 will be interoperable with baseline H.263

H.263+S January 1998S 12 optional H.263 ExtensionS New FunctionalitiesS Improved quality, error robustness

Version 1 - 12/98Version 2 - 12/99 S Object based coding

H.263++ SFuture options frame-based

H.264S March 2003S Twice the compression of MPEG-2S More efficient, better tools

MPEG4 Part 10

-Integer DCT- 1/16 pixel motion estimation-Variable motion block shape

808080

Native Video ResolutionsTYPE NAME ACTIVE

PIXELSSTRUCTURE FRAME RATES

SD NTSC – 480i 720 x 480 Interlaced 30 Fps

PAL – 576i 720 x 576 Interlaced 25 Fps

ED NTSC – 480p 720 x 480 Progressive 30 Fps, 60 Fps

PAL – 576p 720 x 576 Progressive 25 Fps, 50 Fps

HD 720p 1280 x 720 Progressive 24, 25, 30, 48, 50, 60 FPS

1080i 1920 x 1080 Interlaced 24, 25, 30 FPS

1080p 1920 x 1080 Progressive 24, 25, 30, 48, 50, 60 FPS

818181

Common Scaled Video Resolutions

SubQCIF 128 x 96 single fieldQCIF 176 x 144 single fieldCIF 352 x 288 single field

2xCIF 704 x 288 single field4xCIF 704 x 576 pseudo progressive

16xCIF 1408 x 1152 pseudo progressiveSIF 360 x 240 single field

2xSIF 720 x 240 single field½ D1 360 x 480 interlaced

D1 720 x 480 interlaced

ITU-T H.265 Features

• 40-50% of the bit rate at the same visual quality compared to H.264• Enhanced Hybrid spatial-temporal prediction model• Flexible partitioning, introduces Coding Tree Units (Coding,

Prediction and Transform Units -CU, PU, TU) and larger block structure (64x64) with more variable sub partition structures

• 35 directional modes for intra prediction• Superior parallel processing architecture, enhancements in multi-

view coding extension• Up to 8K UHDTV (8192×4320)• 3 approved profiles, draft for additional 5 ; 13 levels

82

H.265

• Replaces 16x16 Macroblocks with 64x64 Coding Tree Units (CTU)

• Increases Intra Prediction modes from 9 to 35• Adds 16x16 and 32x32 to the existing 8x8 and

4x4 Integer DCT Transform• Decode independent Tiles for increased

parallelism

83

An Introduction to Audio Compression

AGENDA

• Audio Codecs• Types of Audio Codecs• Waveform Coders• Source Coders• Silence Compression• Echo Cancellation

2

Classes of Audio Codecs

• Speech Codecs– Optimized for human speech– Single channel– Low sample rates– Limited audio bandwidth

• Audio Codecs– Optimized for full fidelity audio (music, sound effects, etc)– Multiple channels– High sample rates– High audio bandwidth

3

Speech Codecs

AGC and

FilteringSampling CodingAnalog Voice

Coded Voice

4

Types of Speech Codecs

• Waveform Coding– Speech waveform is digitized, coded, and

transmitted.• Source Coding

– Speech source is modeled by an excitation and a filter. Characteristics of the excitation and filter are coded and transmitted.

5

Waveform Codecs

• G.711 A-Law and u-Law PCM• G.722 ADPCM

6

G.711 PCM

• A-Law or u-Law Companding• 8 Ksps = 3.5 KHz bandwidth• Converts 16-bit sample to 13-bits by

truncation.• Converts 13-bit sample to 8-bits by

logarithmic translation.• Preserves dynamic range• Compressed data rate: 64kbps7

A-Law Companding

-128

-112

-96

-80

-64

-48

-32

-160

16

32

48

64

80

96

112

128

-0,5x40960

0,5x4096

-4096 +4096

Compressed Samples

Linear Samples

8

G.722 SubBand ADPCM

• Adaptive Differential PCM• 16 Ksps = 7 KHz Bandwidth• Filters signal into a high band and a low

band.• Applies ADPCM to each band.• Higher bitrate allocated to low band• Compressed data rate: 64 Kbps9

ADPCM

• Previous samples form a prediction of the next samples.

• Take the difference between the prediction and the actual speech.

• Code the difference.• Adapt the prediction as the speech

changes.10

Block Diagram of the G.722 Encoder

11

Source Codecs

• G.728 LD-CELP• G.729 ACELP• G.723.1 MPE/MLQ and ACELP

12

Linear Prediction Coding

• Model human speech by using an excitation signal passed through a filter.

• Match the model to the input speech using analysis by synthesis.

• Only transmit info about the excitation signal and characteristics of the filter.

13

Linear Prediction Coding

14

Linear Prediction Diagram

15

G.728 LD-CELP

• Low Delay – Code Excited Linear Prediction

• 8 Ksps = 3.5 KHz Bandwidth• Find codebook excitation and filter that

best matches signal.• Code and transmit codebook index, gain,

and filter coefficients.• Compressed data rate: 16 Kbps16

G.729 CS-ACELP

• Conjugate-Structure Algebraic Code Excited Linear Predictor

• 8 Ksps = 3.5 KHz bandwidth• Uses Conjugate Structured codebook

which simplifies the signal match search• Compressed data rate: 8 Kbps

17

G.723.1 MP-MLQ/ACELP

• Multipulse – Maximum Likelihood Quantization

• Algebraic Code Excited Linear Prediction• 8 Ksps = 3.5 KHz bandwidth• MP-MLQ = 6.3 Kbps• ACELP = 5.3 Kbps

18

Silence Compression

• Used in G.723.1 and G.729• Reduces bitrate during silence periods• Voice Activity Detection (VAD)

– Spectral analysis: presence or absence of speech– Operates on 30 ms speech frames which are

encoded or filled with comfort noise• Comfort Noise Generator (CNG)

– Generates artificial background noise that matches the actual background noise

19

Acoustic Echo Cancellation

20

Types of Audio Codecs

• Sub-band Coding– Audio signal is applied to a series of filter banks which are then

entropy encoded.• Modified Discrete Cosine Transform (MDCT)

– DCT performed on overlapping blocks data where the last half of the previous block is the first half of the next block.

• Sub-band MDCT– Audio signal is prefiltered with a 32-band polyphase quadrature

filter (PQF) bank before the MDCT is applied.

21

Audio Codec

22

AGC and

FilteringSampling CodingAnalog Audio

Coded Audio

Frequency Domain

Conversion

Perceptual Quantization

Audio Channels

• Monaural– Single channel

• Stereo– Two channels

• Surround Sound– 5.1 (5 channels plus 1 Low Frequency

Enhancement)23

Perceptual Quantization

• Removes information that the human auditory system will not be able to easily perceive.

• To choose which information to remove, the audio signal is analyzed according to a psychoacoustic model, which takes into account the parameters of the human auditory system.

• Research into psychoacoustics has shown that if there is a strong signal on a certain frequency, then weaker signals at frequencies close to the strong signal's frequency cannot be perceived by the human auditory system. This is called frequency masking.

• By ignoring information at frequencies that are deemed to be imperceptible, more data can be allocated to the reproduction of perceptible frequencies.

24

MPEG-1/Layer 2 (MP2)

• Primarily used for Broadcast Audio• Sub-band

– Splits the input audio signal into 32 sub-bands using sub-band filter banks, and if the audio in a sub-band is deemed to be imperceptible then that sub-band is not transmitted.

• MPEG-1 Audio Layer II– Sampling rates: 32, 44.1 and 48 kHz– Bit rates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s– Up to two audio channels

• MPEG-2 Audio Layer II– Sampling rates: 16, 22.05 and 24 kHz– Bit rates: 8, 16, 24, 40 and 144 kbit/s– Multichannel support - up to 5 full range audio channels and an LFE-channel

(Low Frequency Enhancement channel)25

MPEG-1/Layer 3 (MP3)

• Primarily used for PC and Internet audio• MDCT/Hybrid Sub-band

– Transforms the input audio signal into 576 frequency components in the frequency domain using the MDCT allowing finer application of the perceptual filtering

• MPEG-1 Audio Layer III– Bitrate: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s– Sample Rate:32, 44.1 and 48 kHz.

• MPEG-2 Audio Layer III– Bitrate: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kbit/s – Sample Rate: 16, 22.05 and 24 kHz

• MPEG-2.5 Audio Layer III – Bitrate: 8, 16, 24, 32, 40, 48, 56 and 64 kbit/s – Sample Rate: 8, 11.025, and 12 kHz26

Advanced Audio Coding (AAC)

• Primarily used for YouTube, iPhone, iTunes, Nintendo• Improvement over MP3• MDCT/Hybrid Sub-band

– The signal is converted from time-domain to frequency-domain using forward modified discrete cosine transform (MDCT).

– The frequency domain signal is quantized based on a psychoacoustic model and encoded.

• Sample rates: from 8 to 96 kHz• Bit Rate: Arbitrary (2kbps to 320kbps)• Up to 48 channels

27

282828

292929

Contact Information


Ph: 215-657-5270FAX: 215-657-5273


29

303030

Backup Slides

Typical Audio Quality Problems

• Too many errors– Packet loss

• Too much delay– Congestion– Too many router hops– QOS parameters

31

Audio Quality• Packet Delay

– Time for voice packets to transit from the sender to the receiver– If too large, makes conversation difficult

• Packet Jitter– Variation in packet delay– If too large, requires larger buffering, increasing delay or causing buffer

underflow or overflow• Packet Loss

– Packets are dropped due to congestion, poor cabling, equipment malfunction, duplex mismatch, bit errors, CRC errors

– Since there is no retransmission, results in lost information– Clicks, chirps, hum in audio

32

Audio Quality• One Way Delay

– Good (0ms-150ms)– Acceptable (150ms-300ms)– Unacceptable (> 300ms)

• Jitter– Good (0-20ms)– Acceptable (20ms-50ms)– Unacceptable (>50ms)

• Packet Loss– Good(0%-0.5%)– Acceptable (0.5%-1.5%)– Unacceptable (>1.5%) 33

Blank Page

an introduction to video compression...video interface • converts the camera’s pixel array data...

Documents