an introduction to video compression...video interface • converts the camera’s pixel array data...

40
An Introduction to Video Compression

Upload: others

Post on 25-Jan-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

  • An In

    trodu

    ctio

    n to

    Vid

    eo

    Com

    pres

    sion

  • An Introduction to Video Compression

    2

    AGENDA

    • Video Basics– Analog Video– Digital Video– Scanning Formats

    • Video Compression– Intra-Frame Coding– Inter-Frame Coding

    • Video Quality• Video Coding Standards• Audio Compression

    3

    Video Basics

    Typical Camera

    4

    Image Sensor

    Image Processor

    Video Interface

    Image Sensors (CCD/CMOS)

    • Each cell is a monochrome detector

    • A filter array is used to make them sensitive to different color frequencies

    • More green cells are used than red and blue cells because human vision is more sensitive to green frequencies

    • This is called a Bayer pattern

    5

    Image Processing

    • Image processing internal to the camera converts the Bayer pattern to the desired camera output format

    • Demosaicing and Interpolating algorithms are used to generate the output pixels

    • Typical outputs include RGB and YCrCb

    6

  • Video Interface

    • Converts the camera’s pixel array data to an output signal

    • Horizontal/Vertical scanning of image array• Parallel to Serial conversion of pixel data• Synchronization, timing and status insertion• Electrical signal generation

    7

    88

    Analog Composite Video Waveform

    99

    Analog Video Image

    10

    Analog Composite Video Formats• National Television Standard Committee (NTSC)

    – Interlaced: 2 fields/frame– 525 lines/frame, 262.5 lines/field– 29.97 frames/second, 59.94 fields/second– 63.5 uSeconds/line– Color burst: 3.58 MHz

    • Phase Alternate Line (PAL)– Interlaced: 2 fields/frame– 625 lines/frame, 312.5 lines/field– 25 frames/second, 50 fields/second– 64 uSeconds/line– Color burst: 4.43 MHz

    • Séquentiel couleur à mémoire (SECAM)

    1111

    Digitized Composite Video Formats• NTSC

    – 720 horizontal x 480 vertical pixels– 2:1 Interlaced– 29.97 Frames per Second (59.94 Fields per Second)– 165.888 M Bits per Second

    • PAL– 720 horizontal x 576 vertical pixels– 2:1 Interlaced– 25 Frames per Second (50 Fields per Second)– 165.888 M Bits per Second

    1212

    Other Analog Formats• S-Video

    – Y – Luminance Signal– C – Chrominance Signal– Eliminates need for color separation filtering– Improved quality over Composite Video

    • YPrPb Component– Y – Luminance Signal– Pr – Red-Green Color Difference Signal– PB – Blue-Green Color Difference Signal– Supports High Definition Resolutions

    • RGB Component– Red, Green, and Blue Signals– Separate Sync signal– Sync on Green– Primarily a computer monitor format

  • Digital Broadcast Interfaces

    • Serial Digital Interface– SMPTE 259 SDI (270Mbps)

    • 480i30 (720x480)• 576i25 (720x576)

    – SMPTE 292 HD-SDI (1.485Gbps, 1.485/1.001Gbps)• 720p60 (1080x720)• 1080i30 (1920x1080)• 1080p30 (1920x1080)

    – SMPTE 424 3G-SDI (2.970Gbps, 2.970/1.001Gbps)• 1080p60 (1920x1080)

    13

    Serial Digital Interface Line

    14

    Serial Digital Interface Frame

    15

    New Digital Broadcast Interfaces

    • New Serial Digital Interfaces– SMPTE 2081 6G-SDI (5.94Gbps)

    • 4Kp30 (3840x2160)– SMPTE 2082 12G-SDI (11.88Gbps)

    • 4Kp60 (3840x2160)– SMPTE 2083 24G-SDI (23.76Gbps)

    • 8Kp30 (7680x4320)

    16

    Broadcast Interface• Fixed resolutions and

    frame rates• Unidirectional interface• No control channel• Fire and forget• Format indicated within

    the signal• Coax cable• No trigger capability• No camera power

    Machine Vision Interface• Random resolutions and

    frame rates• Bidirectional interface• Control channel• Often requires handshake• Format configured in

    control channel• Most use complex cables• Triggers• Camera power

    17

    Machine Vision Interfaces

    • GigE Vision– Uncompressed video over UDP on 1-Gigabit and 10-Gigabit

    Ethernet networks– Cat5 cabling, Ethernet infrastructure– 125MB/s (1250MB/s) transfer rate

    • USB3 Vision– Same API as GigE Vision over USB 3.0 connections

    18

  • Machine Vision Interfaces (Cont)

    • CoaXPress– 6.25 Gbps over coax cable– Group multiple cables (4 cables = 25 Gbps)– Future expansion to 10 and 12.5 Gbps

    • CameraLink– 26-pin Mini-D ribbon connector– LVDS signal pairs, 2.04 Gbps– Group multiple cables (2 cables = 4.08 Gbps)– Video data, discrete signals, control channel

    19

    Machine Vision Interfaces (Cont)

    • IEEE-1394 FireWire– Tree bus topology, bus master negotiations

    and self identification– 400 and 800 Mbps (1600 and 3200 Mbps

    defined but not widely supported).– 4, 6 and 9 conductor connectors

    • DVI/HDMI– 19 pin connector– 48 Gbps data rate– I2C Control channel– Copy Protection negotiation

    20

    2121

    Video Resolutions

    22

    Interlaced Scanning Format1

    264

    2

    265

    3

    524

    262

    525

    263 Even fieldscan line

    Odd fieldscan line

    Retrace

    23

    Progressive Scanning Format1

    2

    3

    4

    5

    522

    523

    524

    525

    Scan line

    Retrace

    Interlace vs Progressive

    24

  • Interlace vs Progressive

    25

    Color Spaces

    • Monochrome• Red, Green, Blue (RGB)

    – Separate sync, Sync on Green• YIQ (NTSC, PAL)• YUV• YPbPr (Analog), YCbCr (Digital)

    – Pb/Cb = Blue - Luma– Pb/Cr = Red - Luma

    • HSV, HSB, HSL• CMYK

    26

    Other Characteristics

    • Pixel Bit Depth– 8, 10, 12, 14, 16 bits per component

    • Chroma Sampling– 4:4:4, 4:2:2, 4:2:0, 4:0:0

    27

    2828

    Video Compression

    2929

    The need for Video Compression• Digital Transmission.

    – Need to match video data rate to digital communications system bandwidth.

    • Digital Storage.– Need to match video data rate to digital storage

    system bandwidth.– Need to reduce storage capacity or increase storage

    time.• Multiplexing

    – Send more programs or other data over the same channel.

    Video Data Rates

    • Uncompressed SD Video– 720x480, 30 fps, 16 bpp

    • 166 Mbps

    • Uncompressed HD Video– 1920x1080, 60 fps, 16 bpp

    • 1,990 Mbps

    • Uncompressed UHD Video– 7680x4320, 120 fps, 16 bpp

    • 63,701 Mbps

    30

  • 3131

    Digital Transmission Rates• Transmission System Data Rates

    – Ethernet: 10/100 Mbps, 1/10 Gbps– OC12: 622 Mbps– OC3: 155 Mbps– DS3: 45 Mbps– T1: 1.544 Mbps– DSO: 64 Kbps– Modem: 33.6 Kbps– Cellular: 9600

    3232

    Digital Storage Capacity• Uncompressed SD Video: 166 Mbps

    – 1 Min. Clip = 1.24 G Bytes– 30 Min. TV Show = 37.3 G Bytes– 2 Hour Movie = 149.3 G Bytes

    • Uncompressed HD Video: 1,990 Mbps– 1 Min. Clip = 14.9 G Bytes– 30 Min. TV Show = 447.8 G Bytes– 2 Hour Movie = 1.79 T Bytes

    • Uncompressed UHD Video: 63,701 Mbps– 1 Min. Clip = 477.8 G Bytes– 30 Min. TV Show = 14.3 T Bytes– 2 Hour Movie = 57.3 T Bytes

    33

    Digital Storage Devices

    • CD– 185 to 870 M Bytes (various formats)

    • DVD– 1.46 to 17.08 G Bytes (SS, SD, SL, DL)

    • BluRay Disk– 25 G Bytes (50 GB for dual layer disks)

    • Disk Drives– > 1 T Byte

    3434

    Types of Compression• Lossless

    – Output image is numerically identical to the original image on a pixel-by-pixel basis.

    – Only statistical redundancy is reduced– Compression ratio is usually low – 2:1 to 4:1– Reversible (infinite compress/decompress cycles)

    • Lossy– Output image is numerically degraded relative to the

    original.– Statistical and perceptual redundancy is reduced– High compression due to reduction of perceptual

    redundancy– Can be visually lossless– Irreversible (compress/decompress degrades images)

    3535

    Video Compression Standards

    • Functions Standardized– Bit stream syntax– How the decoder interprets the bit

    stream

    • Functions Not Standardized– Pre-processing

    • Filtering, scaling, noise reduction– Encoding strategy

    • Mode Selection• Quantizer Selection• Block Pattern Selection• Motion Estimation

    – Post-filter• Scaling, block filtering, error

    concealment

    PRE-PROCESS- Noise Filter

    ENCODE

    POST-PROCESS

    - SignalEnhancement

    - Filter- Error

    Concealment

    DECODEBIT-

    STREAM- Scaling

    36

    Preprocessing (1/5)

    • Noise Reduction– Filter out high frequency information and

    improves compression efficiency– Spatial Noise Reduction

    • Reduces high frequency information within a picture

    – Temporal Noise Reduction• Reduces high frequency information between

    frames

  • Preprocessing (2/5)

    • Resolution Reduction– Reduces the amount of

    spatial information that needs to be compressed

    – Scaling• Maintains field of view

    37

    Preprocessing (3/5)

    • Resolution Reduction– Reduces the amount of

    spatial information that needs to be compressed

    – Cropping• Maintains pixel

    resolution38

    Preprocessing (4/5)

    • Frame Rate Reduction– Reduces the amount of temporal information

    that needs to be compressed

    3939

    1 2 3 4 5 6

    Video Sequence – 30 Frames per Second

    1 3 5X X X

    Video Sequence – 15 Frames per Second

    16 Bits/Pixel

    Y Cr Cb

    4:4:4 4:2:2 4:2:0

    24 Bits/Pixel

    12 Bits/Pixel

    • Color representation– Reduces the amount of spatial information

    that needs to be compressed

    Preprocessing (5/5)

    4040

    4141

    Intra-Frame Coding

    • Removes the redundancies within a frame• Each frame is encoded separately without

    regard to adjacent frames• Similar to JPEG

    Encoding Stages

    • Signal Analysis– Transform from spatial domain to another

    domain that is easier to compress• Quantization

    – Discard less important information• Variable Length Coding

    – Last stage of efficient coding42

  • 4343

    Video Compression SystemSignal

    Analysis

    INTRAFRAME- Predictive+ 1D, 2D predictor+ Fixed, adaptive

    - Block transform+ DCT+ Karhunen Loeve+ Hadamard+ Fourier+ Haar

    - Sub-band filtering+ Unconstrained

    rectangular blocks+ QMF, other filters+ Wavelets+ Pyramidal

    - Block patternmatching (VQ)

    - Fractals

    INTERFRAME- Application of

    intraframe techniques- Motion estimation- 3-D transform

    Quantization

    - Variable precision

    - Visibility matrix

    - Fixed, adaptive

    - Lossy, lossless

    - Scalar, vector

    Variable length coding(entropy coding)

    - Fixed+ Huffman+ B+ Comma+ 2 dimensional

    (run length/COEFF)

    - Adaptive+ Conditional+ Arithmetic+ One, two passes

    - Lossless

    - Predictionerror

    - Transformcoefficients

    - Filter energylevels

    COMPRESSEDOUTPUT

    Uncompressedinpute.g. 8 bits/pixel

    4444

    Signal Analysis Phase

    SignalAnalysis

    INTRAFRAME- Predictive

    + 1D, 2D predictor+ Fixed, adaptive

    - Block transform+ DCT+ Karhunen Loeve+ Hadamard+ Fourier+ Haar

    - Sub-band filtering+ Unconstrained

    rectangular blocks+ QMF, other filters+ Wavelets+ Pyramidal

    - Block patternmatching (VQ)

    - Fractals

    INTERFRAME- Application of

    intraframe techniques- Motion estimation- 3-D transform

    Quantization

    - Variable precision

    - Visibility matrix

    - Fixed, adaptive

    - Lossy, lossless

    - Scalar, vector

    Variable length coding(entropy coding)

    - Fixed+ Huffman+ B+ Comma+ 2 dimensional

    (run length/COEFF)

    - Adaptive+ Conditional+ Arithmetic+ One, two passes

    - Lossless

    - Predictionerror

    - Transformcoefficients

    - Filter energylevels

    COMPRESSEDOUTPUT

    Uncompressedinpute.g. 8 bits/pixel

    4545

    Discrete Cosine Transform• Converts spatial information to frequency

    information• Important information is concentrated in

    lower frequency bands• Operations typically performed on 8x8

    blocks of pixels• Lossless and reversible

    4646

    Transform Coding by DiscreteCosine Transform (DCT)

    DCT

    Segmentation of a frameinto blocks of pixels

    8x8 Pixels8x8

    Coefficients

    4747

    DCT Basis Function

    4848

    Effect of DCT Coefficients

  • 4949

    More Effects

    5050

    Coefficient Levels

    5151

    Quantization Phase

    SignalAnalysis

    INTRAFRAME- Predictive+ 1D, 2D predictor+ Fixed, adaptive

    - Block transform+ DCT+ Karhunen Loeve+ Hadamard+ Fourier+ Haar

    - Sub-band filtering+ Unconstrained

    rectangular blocks+ QMF, other filters+ Wavelets+ Pyramidal

    - Block patternmatching (VQ)

    - Fractals

    INTERFRAME- Application of

    intraframe techniques- Motion estimation- 3-D transform

    Quantization

    - Variable precision

    - Visibility matrix

    - Fixed, adaptive

    - Lossy, lossless

    - Scalar, vector

    Variable length coding(entropy coding)

    - Fixed+ Huffman+ B+ Comma+ 2 dimensional

    (run length/COEFF)

    - Adaptive+ Conditional+ Arithmetic+ One, two passes

    - Lossless

    - Predictionerror

    - Transformcoefficients

    - Filter energylevels

    COMPRESSEDOUTPUT

    Uncompressedinpute.g. 8 bits/pixel

    5252

    Quantization

    • Converts 12 bit DCT coefficients to 8 bit values

    • Try to force less important information toward zero values

    • Takes less bits to code zero• Most of loss occurs here

    5353

    Scalar Quantization

    4

    3

    2

    256 Input0

    1

    Output

    -1

    -2

    -3

    -4

    512 768 1024

    -1024 -768 -512 -256

    803

    Quantize 803 >> 3Inv Quant 3 >> 768Quantization Error 803 – 768 = 35

    5454

    Scanning Order in a Block

    1 2 6 7 15 16 28 293 5 8 14 17 27 30 434 9 13 18 26 31 42 4410 12 19 25 32 41 45 5411 20 24 33 40 46 53 5521 23 34 39 47 52 56 6122 35 38 48 51 57 60 6236 37 49 50 58 59 63 64

  • 5555

    Entropy Coding Phase

    SignalAnalysis

    INTRAFRAME- Predictive

    + 1D, 2D predictor+ Fixed, adaptive

    - Block transform+ DCT+ Karhunen Loeve+ Hadamard+ Fourier+ Haar

    - Sub-band filtering+ Unconstrained

    rectangular blocks+ QMF, other filters+ Wavelets+ Pyramidal

    - Block patternmatching (VQ)

    - Fractals

    INTERFRAME- Application of

    intraframe techniques- Motion estimation- 3-D transform

    Quantization

    - Variable precision

    - Visibility matrix

    - Fixed, adaptive

    - Lossy, lossless

    - Scalar, vector

    Variable length coding(entropy coding)

    - Fixed+ Huffman+ B+ Comma+ 2 dimensional

    (run length/COEFF)

    - Adaptive+ Conditional+ Arithmetic+ One, two passes

    - Lossless

    - Predictionerror

    - Transformcoefficients

    - Filter energylevels

    COMPRESSEDOUTPUT

    Uncompressedinpute.g. 8 bits/pixel

    5656

    Variable Length Coding• A technique for assigning shorter codes to

    high probability symbols, and longer codes to lower probability symbols

    • Entropy Coding, Huffman Encoding, Arithmetic Coding

    • Modest compression gain (2:1-4:1)• Lossless• Reversible

    57

    VLC Examples

    5858

    Huffman Variable Length Coding

    Symbol Probability VLC Code Fixed Length Code

    A ½ 0 00

    B ¼ 10 01

    C 1/8 110 10

    D 1/8 111 11

    For 128 symbols: Fixed Length = 256 bits

    Variable Length = 224 bits (12.5% savings)

    59

    Sample Intra Block Coding

    688 -21 0 0 0 0 0 0-39 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

    e) INVERSE QUANTIZED COEFFICIENTS

    76 76 77 79 80 81 82 8377 77 78 80 81 82 83 8479 79 80 81 83 84 85 8681 82 83 84 85 87 88 8884 84 85 87 88 89 90 9186 87 88 89 91 92 93 9388 89 90 91 92 94 95 9589 90 91 92 93 95 96 96

    f) RECONSTITUTED BLOCK

    684 -19 -1 -2 0 -1 0 -1-37 0 -1 0 0 0 0 -10 0 0 0 0 0 0 0-4 -1 -1 -1 -1 0 -1 -10 0 0 0 0 0 0 0-2 0 0 -1 0 -1 0 -10 0 0 0 -1 -1 -1 -1-1 -1 -1 0 -1 0 -1 0

    b) TRANSFORM COEFFICIENTS

    RUN LEVEL CODE86 01010110-3 001011-6 001000011

    EOB 10TOTAL CODE LENGTH = 25

    86 -3 0 0 0 0 0 0-6 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

    c) QUANTIZED COEFFICIENTS d) VARIABLE LENGTH CODING

    7775 76

    78

    a) ORIGINAL PIXELS

    77 78 79 80 81 8279 80 81 82 83 84

    79 80 81 82 83 84 85 8681 82 83 84 85 86 87 8883 84 85 86 87 88 89 9085 86 87 88 89 90 91 9287 88 89 90 91 92 93 9489 90 91 92 93 94 95 96DCT

    Quantize

    InverseDCT

    InverseQuantize

    Variable Length Code

    60

    Inter-Frame Coding• Removes the redundancies between

    frames• Uses one or more previous or future

    frames as a prediction of the current frame• Produces a difference between the current

    frame and the prediction• Only encode the differences

  • Encoding Stages

    • Signal Analysis– Transform from spatial domain to another domain that is easier

    to compress• Quantization

    – Discard less important information• Variable Length Coding

    – Last stage of efficient coding• Prediction

    – Create a prediction for the frame being encoded and only compress the difference

    61

    6262

    Generic Inter-Frame Coding System

    Predictor

    DCT and Quantizer

    Inverse DCTand Quantizer

    +

    -Prediction

    Error

    PredictedSignal

    InputVideo

    OutputData

    63

    Original Image Histograms

    64

    Difference Image

    6565

    Difference Image Histograms

    6666

    Pictures Types

    • Intra Frame (I)– Full picture coded independent of other frames– Key frame

    • Predicted Frame (P)– Inter coded frame predicted from a previous frame

    • Bidirectional Frame (B)– Inter coded frame predicted from a future frame

    • GOP– # of P/B frames between I frames– # of B frames between I/P frames– Open or Closed GOP

  • 6767

    Group of Pictures (GOP)

    Transmission Order: IPBBPBB

    6868

    Picture Type Efficiency

    • I Frames: 1 bit per pixel• P Frames: 0.25 bits per pixel• B Frames: 0.03 bits per pixel• All I Frames: 10Mbps• IP15: 6 Mbps• IPB15: 2.5 Mbps

    6969

    Improving Prediction

    • Objects in a scene do not usually change instantaneously– Objects move– Camera pans and zooms

    • Improve prediction by accounting for the motion of blocks

    • Extremely computation intensive

    Improved Prediction with Motion Vectors

    70

    7171

    Motion Vector Example

    7272

    Block Sizes• H.261/H.263 use 16x16 block for motion

    estimation• H.264 uses variable block sizes

    0

    Sub-macroblock partitions

    0

    1 0 1

    0 1

    2 3

    0 0

    1 0 1

    0

    2

    1

    3

    1 macroblock partition of 16*16 luma samples and

    associated chroma samples

    Macroblock partitions

    2 macroblock partitions of 16*8 luma samples and

    associated chroma samples 4 sub-macroblocks of 8*8 luma samples and

    associated chroma samples 2 macroblock partitions of

    8*16 luma samples and associated chroma samples

    1 sub-macroblock partition of 8*8 luma samples and

    associated chroma samples 2 sub-macroblock partitions

    of 8*4 luma samples and associated chroma samples

    4 sub-macroblock partitions of 4*4 luma samples and

    associated chroma samples 2 sub-macroblock partitions of 4*8 luma samples and

    associated chroma samples

  • 7373

    Block Size Example

    7474

    Intra-Frame Prediction• 4x4 intra prediction modes

    • Also 16x16 intra prediction modes

    7575

    Performance

    76

    Video Quality

    77

    Rate Control Algorithm (1/2)

    • This is the encoder magic sauce• Determines how to code each macro-block• Rate Control Modes

    – Constant Quality (CQ) – Attempt to product a consistent video quality regardless of the data rate required

    – Constant Bit Rate (CBR) – Attempt to product the best quality video while holding the data rate constant. May use fill data to maintain the constant data rate. May still be some minor variability in data rate.

    – Variable Bit Rate (VBR) – Attempt to produce the best quality video while not exceeding some maximum data rate, but allowing the data rate to drop when not needed

    Rate Control Algorithm (2/2)

    • For VBR and CBR, encoder attempts to use the finest quantizer (sharpest quality pictures)

    • If the encoder is generating too much data for the selected data rate, the encoder increases the coarseness of the quantizer up to the maximum user selected value (picture quality degrades)

    • If the encoder is still generating too much data for the selected data rate, the encoder begins dropping frames (reduced frame rate)

    78

  • 79

    Encoder Trade-offs

    • Multidimensional trade-off space– Data rate– Resolution– Frame Rate– Quality– Latency– Error resilience

    80

    Compression Ratio

    • Old compression algorithms had a fixed bits per pixel and therefore had a defined Compression Ratio.

    • New compression algorithms have a variable number of bits per pixel depending on a number of factors, most notable is picture quality.– 1000:1 compression is very possible, but picture quality will be

    unusable– 100:1 is reasonable

    • Compression efficiency and picture quality are also highly dependent on scene complexity.

    8181

    Typical Video Quality Problems

    • Data rate too low to support desired quality– Communications link bandwidth– Codec efficiency– Resolution and frame rate

    • Too many errors– Bit errors– Packet loss– Burst errors

    82

    Video Quality Effects• Video Frame Rate

    – Reduced Frame Rate– Jerky Video (irregular frame rate)

    • Video Artifacts– Multi-color blocks– Black or white stripes– Tearing or melting images– Ghosts and shadows behind moving objects– Block edges– Edge noise around sharp edges

    • Decoder Issues– Loss of sync, cannot detect start codes– Video freeze– Decoder crash

    8383

    Compression Artifacts - Blockiness

    8484

    Compression Artifacts - Blockiness

  • 8585

    Compression Artifacts - Blurriness

    8686

    Compression Artifacts - Edge Noise

    8787

    Compression Artifacts - Edge Noise

    8888

    Error Artifacts - I-Frame Data

    8989

    Error Artifacts – P/B-Frame Data

    9090

    Error Artifacts – P/B Frame Data

  • 9191

    Packet Loss Artifacts

    1% Packet Loss Rate 5% Packet Loss Rate

    92

    Compressed Data Error Sensitivity

    • Sparse Start Codes – usually at the beginning of a picture

    • Variable length codes – a bit error not only changes the value, but also the length of the code and the following values

    • Inter-frame coding propagates an error over the rest of the GOP

    93

    Error Resilience Features

    • GOP Structure – distance between I-Frames• Intra Refresh – Intra-coded macroblocks

    distributed throughout long GOPs• Slices – partitioning image into sub images with

    re-sync points• Flexible Macroblock Ordering, Arbitrary Slice

    Ordering, Redundant Slices• Forward Error Correction Schemes

    Error Concealment Features

    • Decoder functions to reduce the effect of errors• Intra-frame concealment – use other pixels

    within the frame to predict the value of the missing pixels

    • Inter-frame concealment – use pixels from previous frames to predict the value of the missing pixels

    • Estimation of missing motion vectors94

    Error Resilience Schemes

    • Flexible Macroblock Ordering (FBO)• Arbitrary Slice Ordering (ASO)• Data Partitioning (DP)• Redundant Slices (RS)• SP/SI Frames for bit rate switching• Reference Frame Selection• Intra-block refreshing95

    Video Encoder Settings

    • Video Resolution– Autodetect– Scaling/Cropping

    • Frame Rate Decimation• GOP Size/Structure

    – Infinite GOP=0 (only P-Frames)– GOP=1(only I-Frames)– Normal GOP=2 to N (I and P-frames)

    • Quantizer Settings

    96

  • Video Encoder Settings

    • Slice Modes– Error Recovery– Full Frame– Macroblocks– Bytes

    • Intra-Refresh– Force Intra Macroblocks during P-Frames– Sequential– Random

    97

    Video Encoder Setting

    • Date Rate• Rate Control Mode

    – Constant Quality– Constant Bit Rate– Variable Bit Rate

    • Rate Control Parameters– Not common

    98

    999999

    100100100

    Contact Information

    747 Dresher RoadHORSHAM, PA 19044

    Ph: 215-657-5270FAX: 215-657-5273

    www.deltadigitalvideo.com

    100

    Backup Slides

    101

    102102

    High Definition Formats• 720p

    – 1280 horizontal x 720 vertical pixels– Progressive– 60 Frames per Second– 884.736 M Bits per Second

    • 1080i– 1920 horizontal x 1080 vertical pixels– 2:1 Interlaced– 30 Frames per Second (60 Fields per Second)– 995.328 M Bits per Second

    • 1080p– 1920 horizontal x 1080 vertical pixels– Progressive– 60 Frames per Second– 1990.656 M Bits per Second

  • 103103

    Interlace Artifacts

  • Video Compression Standards

    Standards Organizations

    • In the beginning there was COST 211. They created the H.120 standard for interactive video communications.

    • The ITU-T VCEG (Video Coding Expert Group) was formed to improve upon H.120.

    • MPEG (Moving Pictures Experts Group) was formed to find a way to incorporate codecs for broadcast work.

    • To this day, the two committees – MPEG and VCEG, work side-by-side. MPEG specializes in broadcast (television), while the ITU (International Telecommunications Union) focuses on telecommunications (phone, internet).

    2

    ITU-T H.120

    • Adopted in 1984• 1544 kbit/s for NTSC and 2048 kbit/s for PAL• Used differential pulse-code modulation, scalar

    quantization, variable-length coding • Showed that a better algorithm was needed to produce

    usable quality at reasonable data rates

    3

    444

    ITU-T H.261 Overview

    • Adopted 1988• Motion Video Algorithm typically used for Video

    Conferencing over ISDN networks• Target Data Rate

    – 56 Kbps to 2048 Kbps• Available Resolutions

    – CIF: 352x288– QCIF: 176x144

    55

    ISO/IEC JPEG Overview

    • Adopted in 1993• ITU-T T.81, ISO/IEC 10918-1• Still Image Compression Algorithm• Based on Discrete Cosine Transform (DCT)• Motion video mode – sequence of still images• Lossless and lossy compression modes

    66

    ISO/IEC JPEG-2000 Overview

    • Adopted in 2000• ITU-T T.800, ISO/IEC 15444-1• Still Image Compression Algorithm• Based on Discrete Wavelet Transform (DWT)• Motion video mode – sequence of still images• Lossless and lossy compression modes

  • 77

    JPEG vs JPEG2000

    88

    JPEG vs JPEG-2000(at same bit rate)

    JPEG JPEG-2000

    ISO/IEC MPEG-1

    • Adopted in 1992• ISO/IEC 11172 • Motion Video Compression primarily for CD-ROM video

    storage• Target Data Rate: 1.5 Mbps• Most common resolution was the Source Input Format

    (SIF) – 352x240 (for NTSC)– 352x288 (for PAL)– 320x240 (for VHS)

    9

    ITU-T H.262 / ISO/IEC MPEG-2

    • Adopted in 1996• ITU-T H.262, ISO/IEC 13818• Typical Data Rates

    – 3-5 Mbps for broadcast quality standard definition video– 15-20 Mbps for high definition video

    • Typical Resolutions– SIF: 360x240p– HHR: 360x480i– SD: 720x480i, 720x576i– ED: 720x480p– HD: 1280x720p, 1920x1080i

    • Enabled digital broadcast, cable and satellite television10

    111111

    ITU-T H.263 Overview

    • Adopted in 1999• ITU-T Recommendation H.263• Target Data Rate: 56 kbps• Target Resolution: Sub QCIF 128x96• Designed for video phone over analog telephone lines• Significant improvements in compression efficiency

    121212

    Algorithm Comparison

  • 131313

    ISO/IEC MPEG-4 Overview

    • Adopted in 1999• ISO/IEC 14496-2 MPEG-4 Part 2• Based on H.263 as baseline• Optimized for internet streaming

    applications• Same picture sizes as MPEG-2 but also

    smaller sub-QCIF sizes

    ITU-T H.264

    • Adopted in 2003• ITU-T H.264• Also called ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video

    Coding• H.264 is the most widely used codec on earth, even surpassing

    broadcast MPEG-2, simply due to the power of the internet. It is used by Youtube, and every other video provider of note.

    • It is also the codec that drives Blu-ray. Most cable, satellite and television channels have moved to H.264.

    14

    MPEG-2 vs H.264

    15

    1616

    Algorithm Comparison

    MPEG-2 MPEG-4 H.264

    ITU-T H.265

    • Initially adopted 2013 and revised several times.• ISO/IEC 23008–2 MPEG-H Part 2, High Efficiency Video Coding

    (HEVC). • Up to 8K UHDTV (8192×4320 maximum) • Up to 12-bit color bit depth 4:4:4 and 4:2:2 chroma sub-sampling • Supports up to 300 fps • Data rates up to several GB/s • Special features to support high efficiency encoding of very high

    resolution video

    17

    Algorithm Comparison

    18

  • ITU/ISO/IEC New Work

    • Formed Joint Video Exploration Team (JVET) to investigate next generation video compression standard

    • Target release in 2020

    19

    Other Codecs

    • Microsoft– WMV7, WMV8, WMV9

    • Google– VC6, VP7, VP8, VP9

    • DIRAC• Alliance of Open Media (AOMedia)

    – AV120

    212121

    222222

    Contact Information

    747 Dresher RoadHORSHAM, PA 19044

    Ph: 215-657-5270FAX: 215-657-5273

    www.deltadigitalvideo.com

    22

    232323

    Backup Slides

    Overview of IRIG-210

    The Horace Standard

  • 252525

    IRIG 210

    • Developed in late 1980’s under SBIR program• Standardized by IRIG in 1993• Differential Pulse Code Modulation Algorithm (DPCM)• Monochrome, RS-170 video• Full, Fixed or variable frame rate

    – Field or frame skip modes• Typical Resolution

    – 640, 512, 450, 320, 256, 225, 160 or 128 pixels per line– 240 lines per field or 480 lines per frame

    262626

    Differential PCM (DPCM)• Video is encoded line by line

    – Line Synchronization code– Format Information– Line pixel data

    • PCM – 7 bit value for each pixel• DPCM – difference between the current pixel and the previous pixel

    – Three DPCM modes: Normal, High, 2-Bit– Adaptive on a line by line basis

    • Entropy Coding– Normal and High modes only– 1 to 8 bit codes based on previous and current DPCM code

    272727

    DPCM Modes

    • Normal– 3 bits per pixel (8 difference values)

    • 0, +3, -3, +8, -8, +20, -20, +/-40

    • High Level (course)– 3 bits per pixel (8 difference values)

    • 0, +4, -4, +10, -10, +25, -25, +/-50

    • 2-Bit Mode– 2 bits per pixel (4 difference values)

    • +4, -4, +26, -26

    282828

    Entropy Coding

    292929

    DPCM Example 1Input Pixel 50 45 48 43 40 44 47 51

    Difference +50 -5 +3 -5 -3 +4 +3 +4

    DPCM Code 8 2 2 3 3 2 2 2

    DPCM Jump +40 +3 +3 -3 -3 +3 +3 +3

    Output Pixel 40 43 46 43 40 43 46 49

    Error 10 2 2 0 0 1 1 2

    Output Bits 00000001 001 01 001 010001 01 01

    26 Bits for 8 Pixels = 3.25 Bits/Pixel

    303030

    DPCM Example 2

    Input Pixel 2 4 8 16 32 64 127 127

    Difference +2 +2 +4 +8 +16 +32 +64 +0

    DPCM Code 2 2 2 4 6 6 8 6

    DPCM Jump +3 +3 +3 +8 +20 +20 +40 +20

    Output Pixel 3 6 9 17 37 57 97 117

    Error 1 2 1 1 5 7 30 10

    Output Bits 001 01 01 0001 0001 1 000000010000001

    31 Bits for 8 Pixels = 3.875 Bits/Pixel

  • 313131

    DPCM Example 3

    Input Pixel 0 0 0 0 127 127 127 127

    Difference +0 +0 +0 +0 +127 +0 +0 +0

    DPCM Code 1 1 1 1 8 8 8 4

    DPCM Jump +0 +0 +0 +0 +40 +40 +40 +8

    Output Pixel 0 0 0 0 40 80 120 128

    Error 0 0 0 0 87 47 7 1

    Output Bits 0001 1 1 100000001

    00000001

    00000001

    00001

    36 Bits for 8 Pixels = 4.5 Bits/Pixel

    323232

    IRIG 210 Performance

    • Best possible performance– Black screen– 1-bit per pixel– 512x480 @ 30 fps: 7.7Mbps– 256x480 @ 30 fps: 4.0 Mbps

    • Typical performance– About 2.2-bits per pixel– 512x480 @ 30 fps: 16.5 Mbps– 256x480 @ 30 fps: 8.4 Mbps

    Overview of H.261 and H.263

    The original video conferencing codec

    343434

    ITU-T H.261 Format

    • Picture Structure– Picture [CIF: 2x6 GOBs, QCIF: 1x3 GOBs]

    • Group of Blocks (GOB) [11x3 macro blocks]– MacroBlock [ 4 Y, 1 Cr, 1 Cb blocks]

    » Block [ 8x8 pixels]

    • Forward Error Correction– BCH (511,493)– 512 bit frame with sync bit and fill indicator

    353535

    ITU-T H.261 Intra-frame Coding

    • Full Intra-frame coded on startup• After that all frames Inter-frame coded• Intra Refresh

    – Intra code individual blocks– X macroblocks per frame– To correct transmission and roundoff errors

    • Full Intra-frame coded or on command– Usually requested by decoder

    363636

    ITU-T H.261 Inter-frame Coding

    • Prediction based on previous frame with motion compensation

    • Motion vector applies to macroblock• Full pixel motion vector resolution• +/- 15 pixel motion vector range

  • JPEG and JPEG2000

    • JPEG (Joint Photographic Experts Group) became a popular codec of choice for still image compression

    • Intra Frame compression algorithms – compress each frame independently without regard to prior or later frames

    37

    3838

    ISO/IEC JPEG-2000 Features

    • Superior compression performance compares to JPEG.• Multiple resolution representation.• Progressive transmission by quality and resolution

    accuracy.• Random code-stream access and processing, also

    referred as Region Of Interest (ROI).• Error resilience.• No blocking artifacts• Supports very large format images and tiling

    Overview of MPEG-1, 2, 4

    Entertainment quality video

    404040

    MPEG-1 Overview

    • ISO/IEC 11172 MPEG-1 Standard (1992)• Target bit-rate about 1.5 Mbit/s• Typical image format SIF (352x240), no

    interlace• Frame rate 24 ... 30 fps• Main application: video storage for

    multimedia (e.g., on CD-ROM)

    414141

    MPEG Components

    424242

    Discrete Cosine Transform

    • 4 x 4 Block size– DCT requires O(n^2) operations– 8x8: 4096 operations– 4x4: 256 operations (x4=1024)

    • 2 x 2 block size for Chrominance• 4 x 4 block size for DC coefficients of 16x16 group• Integer implementation

    – 16 bit integer math– No floating point math

    • No multiplication or division– Only add, subtract, shift

    • Integrated with quantizer– Scaling integrated with quantizer so it is only done once.

  • 434343

    More Motion Estimation

    • Forward and Backward Reference Frames (Blocks)

    • Multiple Reference Frames (Blocks)– Average of blocks in multiple different frames– Weighted factor for each reference frame

    • 1/4 pixel accuracy/resolution

    444444

    Improved De-blocking Filter

    • In-loop filtering, not post filtering• Improves the quality of the predicted image as

    well as the resultant image• On edge level, filtering strength is dependent on

    the code mode, motion vector, and values of residuals

    • On sample level, quantizer dependent thresholds can turn off filtering for any individual sample

    454545

    Entropy Coding

    • Exponential Golomb code– Header and other information

    • Context-Adaptive Variable Length Coding (CAVLC)– Quantized transform coefficients– Run-length coding– Runs of zeros, runs of ones

    • Context-Adaptive Binary Arithmetic Coding (CABAC)– Quantized transform coefficients– 5-15% improvement over CAVLC, but more complex

    464646

    Exponential Golomb Codes

    Bit string form Range of Symbols (#)1 0 (1)

    0 1 x0 1-2 (2)0 0 1 x1 x0 3-6 (4)

    0 0 0 1 x2 x1 x0 7-14 (8)0 0 0 0 1 x3 x2 x1 x0 15-30 (16)

    0 0 0 0 0 1 x4 x3 x2 x1 x0 31-62 (32)… …

    474747

    Slices

    • Groups of one or more Macro Blocks (MB)– Horizontal, vertical, rectangular, dispersive– Independently decodable– Limits error propagation within a picture

    • I Slice– Only intra coded MB

    • P Slice– Forward predicted and intra coded MB

    • B Slice– Backward, forward, multiple reference and intra coded MBs

    • SI, SP Slice– For bitstream switching

    • Redundant Slices– For error resilience

    • Arbitrary Slice Order

    Parts and Layers

    • MPEG standards have sub-divisions, called Parts. • Traditionally, Part 1 is always for the ‘System’ (file or

    stream format). Part 2 is for video, and Part 3 is for audio.

    • Parts are further sub-divided into Layers. • Audio has three or more layers, called Layer I, Layer II

    and Layer III and so on.• MPEG-1 Part 3 Layer III is called MP3.

    48

  • Profiles and Levels

    • Beginning with MPEG-2, is the concept of Profiles and Levels – in addition to Parts and Layers.

    • Profiles are sets of capabilities within the specification to be used for specific applications.

    • Levels specify a range of parameters used within a profile.

    • The algorithms are so complex and address so many different applications that it does not make sense for an encoder or software to comply with all of it.

    49

    505050

    ISO/IEC MPEG-2 Overview

    • Part 1, Systems defined two major classifications: Program Stream and Transport Stream

    • Part 2, Video is also called H.262. It had additional support for interlacing and 4:2:2

    • Part 3, Audio incorporated 5.1 channels

    515151

    ISO/IEC MPEG-2 Features

    • More input flexibility– Interlaced and progressive pictures– Field and frame coding– Wide range of picture resolutions

    • Motion estimation– Half pixel motion vectors– 16x16 or 16x8 motion blocks– Estimation across fields (Dual prime)

    • Non-linear quantization• Improved VLC • Various scalability tools

    525252

    Frame/Field Coding

    535353

    ISO/IEC MPEG-2 Profiles

    • Simple– 4:2:0 sampling, I/P pictures only, no scalable coding

    • Main– As above, plus B pictures

    • SNR– As above, plus SNR scalability

    • Spatial– As above, plus spatial scalablity

    • High– As above, plus 4:2:2 sampling, 11 bit precision

    • 422– As above, no scalability, higher data rate

    545454

    ISO/IEC MPEG-2 Levels

    • Low– 352 x 288 luminance samples, 30 Hz

    • Main– 720 x 576 luminance samples, 30 Hz

    • High-1440– 1440 x 1152 luminance samples, 60 Hz

    • High– 1920 x 1152 luminance samples, 60 Hz

  • 555555

    ISO/IEC MPEG-4 Features

    • Adds object based coding• Natural and synthetic objects• Shape coding

    – Identifies objects and background– Codes them separately

    • Motion Coding– Accounts for object motion– 1/4 Pixel motion vectors, 8x8 motion blocks– Global motion vectors

    • Texture Coding– Accounts for object color and texture

    565656

    ISO/IEC MPEG-4 Profiles and Levels

    • Profiles– Version 1: Simple, Simple Scalable, Core, Main, N-bit, Scaleable

    Texture, Simple Face Animation, Basic Animated Texture, and Hybrid.

    – Version 2: Core Scalable, Advanced Core, Advanced Coding Efficiency, Advanced Real Time Simple, Advanced Scaleable Texture, and Simple FBA.

    – 1st Extension: Simple Studio and Core Studio.– 2nd Extension: Advanced Simple and Fine Granularity Scalability.

    • Levels– Various levels constraining profile parameters

    575757

    Network Abstraction Layer (NAL)

    • Formats the compressed data and provides header information in a manner appropriate for conveyance on a variety of communication channels or storage media

    • NAL Unit Stream– Groups data by slice– Used for packet transmission or storage applications

    • Byte Stream– NAL Unit plus Start Code Prefix and fill Suffix– Used for transmission applications

    585858

    H.264 Levels

    • Restricts range of parameter values• Bit rate• Frame size• Motion vector range• Slice size

    595959

    H.264 Summary

    • 50% coding gain over MPEG-2 and H.263 Baseline profile

    • 25% coding gain over H.263 CHC profile• Encoder complexity 3x that of older

    algorithms• Decoder complexity 2x that of older

    algorithms

    ITU-T H.264 Features

    • 40-50% the bit rate reduction at the same visual quality compared to MPEG-2

    • Hybrid spatial-temporal prediction model• Flexible partition of Macro Block (MB) , sub MB for motion

    estimation• Intra Prediction (extrapolate already decoded neighboring pixels for

    prediction) with 9 directional modes • Entropy coding is CABAC and CAVLC• Support Up to 4K (4,096×2,304)• Supports up to 59.94 fps• 21 profiles ; 17 levels60

  • 616161

    Why is it so good?

    • Intra-frame prediction• Integer-based DCT algorithm• Improved de-blocking filter• Flexible block sizes from 4x4 to 16x16 for

    improved motion estimation• Improved entropy coding (CABAC)• Multiple Reference Frames

    Overview of H.264

    (a.k.a. MPEG-4 Part 10)(a.k.a. AVC)

    636363

    Key H.264 Profiles

    • Baseline Profile– All inter and intra coding tools (I and P slices)– CAVLC– Error concealment tools

    • Main Profile– Same as Baseline except no ASO, FMO, RS– CABAC– Multiple Reference frames, weighted prediction (B slices)– Macro Block Adaptive Frame/Field (MBAFF) coding– Adaptive Block-size Transform (ABT)

    • Constrained Baseline Profile• High Profile (HiP)

    – The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (for example, this is the profile adopted by the Blu-ray Disc storage format and the DVB HDTV broadcast service).

    6464

    More H.264 Profiles

    • Extended Profile– Data Partitioning– Switching I and P slices

    • High 10 Profile (Hi10P)– Going beyond typical mainstream consumer product capabilities, this profile

    builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision.

    • High 4:2:2 Profile (Hi422P)– Primarily targeting professional applications that use interlaced video, this profile

    builds on top of the High 10 Profile, adding support for the 4:2:2 chromasubsampling format while using up to 10 bits per sample of decoded picture precision.

    • High 4:4:4 Predictive Profile (Hi444PP)– This profile builds on top of the High 4:2:2 Profile, supporting up to 4:4:4 chroma

    sampling, up to 14 bits per sample, and additionally supporting efficient lossless region coding and the coding of each picture as three separate color planes.

    6565

    Even More H.264 Profiles

    • High 10 Intra Profile– The High 10 Profile constrained to all-Intra use.

    • High 4:2:2 Intra Profile– The High 4:2:2 Profile constrained to all-Intra use.

    • High 4:4:4 Intra Profile– The High 4:4:4 Profile constrained to all-Intra use.

    • CAVLC 4:4:4 Intra Profile– The High 4:4:4 Profile constrained to all-Intra use and to CAVLC entropy coding (i.e., not supporting CABAC).

    • Scalable Baseline Profile– Primarily targeting video conferencing, mobile, and surveillance applications, this profile builds on top of a constrained version of

    the H.264/AVC Baseline profile to which the base layer (a subset of the bitstream) must conform. For the scalability tools, a subset of the available tools is enabled.

    • Scalable High Profile– Primarily targeting broadcast and streaming applications, this profile builds on top of the H.264/AVC High Profile to which the base

    layer must conform.• Scalable High Intra Profile

    – Primarily targeting production applications, this profile is the Scalable High Profile constrained to all-Intra use.• Stereo High Profile

    – This profile targets two-view stereoscopic 3D video and combines the tools of the High profile with the inter-view prediction capabilities of the MVC extension.

    • Multiview High Profile– This profile supports two or more views using both inter-picture (temporal) and MVC inter-view prediction, but does not support

    field pictures and macroblock-adaptive frame-field coding.

    Overview of JPEG-2000

    Wavelet based image compression

  • 6767

    JPEG-2000 System

    6868

    Discrete Wavelet Transform

    • Captures both frequency and location information

    • Effectively a series of filters and down sampling stages

    • Reversible Transform for lossless compression• Irreversible Transform for lossy compression

    6969

    Discrete Wavelet Transform

    7070

    2D Discrete Wavelet Transform

    7171

    3-Stage DWT

    Stage 1 Stage 2 Stage 3

    7272

    Quantization

  • 73

    Wavelet Encoding Methods

    • Embedded Zero-Tree Wavelet (EZW)– The bitstream is embedded and the coefficients are ordered in significance and

    precision, so that it can be truncated according to the bit-rate requirements of the channel.

    – It efficiently utilizes the self-similarity between the subbands of similar orientation and achieves significant data reduction.

    – However, the EZW algorithm is not exactly optimal.• Set Partitioning in Hierarchical Trees (SPIHT)

    – Improved performance over EZW, because of its ability to exploit the grouping of insignificant coefficients.

    • Embedded Block Coding with Optimized Truncation of bit-stream (EBCOT)– Offers both resolution scalability and SNR scalability. – Because of its advantages, the EBCOT algorithm has been accepted

    incorporated within the most recent still image compression standard JPEG-2000.

    7474

    Embedded Block Coding

    • Embedded Block Coding with Optimized Truncation (EBCOT)• Divided into two layers called Tiers.

    – Tier 1 is responsible for source modeling and entropy coding– Tier 2 generates the output stream

    • Wavelet sub-bands are partitioned into small code-blocks (to not be confused with tiles), typically 64x64

    • Code blocks are in turn are separated into bit-planes. (i.e. coefficient bits of the same order are coded together)

    • Bit-planes are further partitioned into three coding passes. Each coding pass constitutes an atomic code unit, called chunk.

    • Bit-planes are coded using Arithmetic coding (MQ-coder).

    7575

    Embedded Block Coding

    767676

    MPEG Components

    777777

    Levels of Video Compression

    • Uncompressed– 166 mbps

    • True Lossless Compression– 45 mbps (4)

    • Visually Lossless– 3-10 mbps (16-55)

    • High Quality Videoconferencing– 384 Kbps to 1.5 mbps (110-432)

    • Acceptable Quality Videoconferencing– 112 Kbps to 256 Kbps (648-1482)

    • Videophone– 20 Kbps to 33.6 Kbps (4940-8300)

    787878

    Comparison of Video Coding Standards

    H.261 H.263 MPEG1 MPEG2

    General Standard Structure Narrow Profile Narrow Profile Generic Tool Kit

    Picture Format176 x 144 (mandatory)

    352 x 28 (optional)SQCIF, QCIF, CIF, 4CIF, 16CIF 360 x 240 Field or Frame

    Quantization Precision of Motion Vectors One Pixel One Half Pixel One Half or One Pixel

    B-Frames/PB Frame None PB FramesB Frames

    An Available Tool

    Intraframe Coding Usually Distributed Flexible Full Frame is Mandatory

    Color Coding 4:2:0 4:2:0 4:4:4, 4:2:2, 4:2:0

    Picture Structure Group of Blocks GOB Slice

    Dual Prime (a special motion compensation mode)

    No No No Yes

    Nominal Bitrate 56 Kbps to 1,936 Kbps Low Bit Rate 1.5 mbps 4 – 20 mbps

    Applications

    Interactive Audiovisual Services

    -Videophone-- Videoteleconferencing

    VTC/Videophone via PSTN/Mobile Network - VCR

    - Broadcast TV- + Contribution

    + Distribution- DBS- HDTV

  • 797979

    Video Coding StandardsITU ISO

    H.261

    S 1990S CIF 352 x 288 pixelsS VHS qualityS Videoconferencing/ Videophone

    (VC/VPO) via N-ISDN; 56 Kbps- 1.920 mbps

    MPEG1S 1993S SIF 352 x 240 pixelsS VHS qualityS Star application - CDROM - 1.5 bps

    H.262 S 1995S VC/VP via B-ISDN; 2 - 20 mbps MPEG2S 1993S SIF 352 x 240 pixelsS VHS qualityS Star application - CDROM - 1.5 mbps

    H.263S March 1996S SQCIF, QCIF, CIF, 4CIF, 16CIFS H.324, H.323, H.324MS Higher quality than H.261

    MPEG4- Video

    MPEG-4 will be interoperable with baseline H.263

    H.263+S January 1998S 12 optional H.263 ExtensionS New FunctionalitiesS Improved quality, error robustness

    Version 1 - 12/98Version 2 - 12/99 S Object based coding

    H.263++ SFuture options frame-based

    H.264S March 2003S Twice the compression of MPEG-2S More efficient, better tools

    MPEG4 Part 10

    -Integer DCT- 1/16 pixel motion estimation-Variable motion block shape

    808080

    Native Video ResolutionsTYPE NAME ACTIVE

    PIXELSSTRUCTURE FRAME RATES

    SD NTSC – 480i 720 x 480 Interlaced 30 Fps

    PAL – 576i 720 x 576 Interlaced 25 Fps

    ED NTSC – 480p 720 x 480 Progressive 30 Fps, 60 Fps

    PAL – 576p 720 x 576 Progressive 25 Fps, 50 Fps

    HD 720p 1280 x 720 Progressive 24, 25, 30, 48, 50, 60 FPS

    1080i 1920 x 1080 Interlaced 24, 25, 30 FPS

    1080p 1920 x 1080 Progressive 24, 25, 30, 48, 50, 60 FPS

    818181

    Common Scaled Video Resolutions

    SubQCIF 128 x 96 single fieldQCIF 176 x 144 single fieldCIF 352 x 288 single field

    2xCIF 704 x 288 single field4xCIF 704 x 576 pseudo progressive

    16xCIF 1408 x 1152 pseudo progressiveSIF 360 x 240 single field

    2xSIF 720 x 240 single field½ D1 360 x 480 interlaced

    D1 720 x 480 interlaced

    ITU-T H.265 Features

    • 40-50% of the bit rate at the same visual quality compared to H.264• Enhanced Hybrid spatial-temporal prediction model• Flexible partitioning, introduces Coding Tree Units (Coding,

    Prediction and Transform Units -CU, PU, TU) and larger block structure (64x64) with more variable sub partition structures

    • 35 directional modes for intra prediction• Superior parallel processing architecture, enhancements in multi-

    view coding extension• Up to 8K UHDTV (8192×4320)• 3 approved profiles, draft for additional 5 ; 13 levels

    82

    H.265

    • Replaces 16x16 Macroblocks with 64x64 Coding Tree Units (CTU)

    • Increases Intra Prediction modes from 9 to 35• Adds 16x16 and 32x32 to the existing 8x8 and

    4x4 Integer DCT Transform• Decode independent Tiles for increased

    parallelism

    83

  • An Introduction to Audio Compression

    AGENDA

    • Audio Codecs• Types of Audio Codecs• Waveform Coders• Source Coders• Silence Compression• Echo Cancellation

    2

    Classes of Audio Codecs

    • Speech Codecs– Optimized for human speech– Single channel– Low sample rates– Limited audio bandwidth

    • Audio Codecs– Optimized for full fidelity audio (music, sound effects, etc)– Multiple channels– High sample rates– High audio bandwidth

    3

    Speech Codecs

    AGC and

    FilteringSampling CodingAnalog Voice

    Coded Voice

    4

    Types of Speech Codecs

    • Waveform Coding– Speech waveform is digitized, coded, and

    transmitted.• Source Coding

    – Speech source is modeled by an excitation and a filter. Characteristics of the excitation and filter are coded and transmitted.

    5

    Waveform Codecs

    • G.711 A-Law and u-Law PCM• G.722 ADPCM

    6

  • G.711 PCM

    • A-Law or u-Law Companding• 8 Ksps = 3.5 KHz bandwidth• Converts 16-bit sample to 13-bits by

    truncation.• Converts 13-bit sample to 8-bits by

    logarithmic translation.• Preserves dynamic range• Compressed data rate: 64kbps7

    A-Law Companding

    -128

    -112

    -96

    -80

    -64

    -48

    -32

    -160

    16

    32

    48

    64

    80

    96

    112

    128

    -0,5x40960

    0,5x4096

    -4096 +4096

    Compressed Samples

    Linear Samples

    8

    G.722 SubBand ADPCM

    • Adaptive Differential PCM• 16 Ksps = 7 KHz Bandwidth• Filters signal into a high band and a low

    band.• Applies ADPCM to each band.• Higher bitrate allocated to low band• Compressed data rate: 64 Kbps9

    ADPCM

    • Previous samples form a prediction of the next samples.

    • Take the difference between the prediction and the actual speech.

    • Code the difference.• Adapt the prediction as the speech

    changes.10

    Block Diagram of the G.722 Encoder

    11

    Source Codecs

    • G.728 LD-CELP• G.729 ACELP• G.723.1 MPE/MLQ and ACELP

    12

  • Linear Prediction Coding

    • Model human speech by using an excitation signal passed through a filter.

    • Match the model to the input speech using analysis by synthesis.

    • Only transmit info about the excitation signal and characteristics of the filter.

    13

    Linear Prediction Coding

    14

    Linear Prediction Diagram

    15

    G.728 LD-CELP

    • Low Delay – Code Excited Linear Prediction

    • 8 Ksps = 3.5 KHz Bandwidth• Find codebook excitation and filter that

    best matches signal.• Code and transmit codebook index, gain,

    and filter coefficients.• Compressed data rate: 16 Kbps16

    G.729 CS-ACELP

    • Conjugate-Structure Algebraic Code Excited Linear Predictor

    • 8 Ksps = 3.5 KHz bandwidth• Uses Conjugate Structured codebook

    which simplifies the signal match search• Compressed data rate: 8 Kbps

    17

    G.723.1 MP-MLQ/ACELP

    • Multipulse – Maximum Likelihood Quantization

    • Algebraic Code Excited Linear Prediction• 8 Ksps = 3.5 KHz bandwidth• MP-MLQ = 6.3 Kbps• ACELP = 5.3 Kbps

    18

  • Silence Compression

    • Used in G.723.1 and G.729• Reduces bitrate during silence periods• Voice Activity Detection (VAD)

    – Spectral analysis: presence or absence of speech– Operates on 30 ms speech frames which are

    encoded or filled with comfort noise• Comfort Noise Generator (CNG)

    – Generates artificial background noise that matches the actual background noise

    19

    Acoustic Echo Cancellation

    20

    Types of Audio Codecs

    • Sub-band Coding– Audio signal is applied to a series of filter banks which are then

    entropy encoded.• Modified Discrete Cosine Transform (MDCT)

    – DCT performed on overlapping blocks data where the last half of the previous block is the first half of the next block.

    • Sub-band MDCT– Audio signal is prefiltered with a 32-band polyphase quadrature

    filter (PQF) bank before the MDCT is applied.

    21

    Audio Codec

    22

    AGC and

    FilteringSampling CodingAnalog Audio

    Coded Audio

    Frequency Domain

    Conversion

    Perceptual Quantization

    Audio Channels

    • Monaural– Single channel

    • Stereo– Two channels

    • Surround Sound– 5.1 (5 channels plus 1 Low Frequency

    Enhancement)23

    Perceptual Quantization

    • Removes information that the human auditory system will not be able to easily perceive.

    • To choose which information to remove, the audio signal is analyzed according to a psychoacoustic model, which takes into account the parameters of the human auditory system.

    • Research into psychoacoustics has shown that if there is a strong signal on a certain frequency, then weaker signals at frequencies close to the strong signal's frequency cannot be perceived by the human auditory system. This is called frequency masking.

    • By ignoring information at frequencies that are deemed to be imperceptible, more data can be allocated to the reproduction of perceptible frequencies.

    24

  • MPEG-1/Layer 2 (MP2)

    • Primarily used for Broadcast Audio• Sub-band

    – Splits the input audio signal into 32 sub-bands using sub-band filter banks, and if the audio in a sub-band is deemed to be imperceptible then that sub-band is not transmitted.

    • MPEG-1 Audio Layer II– Sampling rates: 32, 44.1 and 48 kHz– Bit rates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s– Up to two audio channels

    • MPEG-2 Audio Layer II– Sampling rates: 16, 22.05 and 24 kHz– Bit rates: 8, 16, 24, 40 and 144 kbit/s– Multichannel support - up to 5 full range audio channels and an LFE-channel

    (Low Frequency Enhancement channel)25

    MPEG-1/Layer 3 (MP3)

    • Primarily used for PC and Internet audio• MDCT/Hybrid Sub-band

    – Transforms the input audio signal into 576 frequency components in the frequency domain using the MDCT allowing finer application of the perceptual filtering

    • MPEG-1 Audio Layer III– Bitrate: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s– Sample Rate:32, 44.1 and 48 kHz.

    • MPEG-2 Audio Layer III– Bitrate: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kbit/s – Sample Rate: 16, 22.05 and 24 kHz

    • MPEG-2.5 Audio Layer III – Bitrate: 8, 16, 24, 32, 40, 48, 56 and 64 kbit/s – Sample Rate: 8, 11.025, and 12 kHz26

    Advanced Audio Coding (AAC)

    • Primarily used for YouTube, iPhone, iTunes, Nintendo• Improvement over MP3• MDCT/Hybrid Sub-band

    – The signal is converted from time-domain to frequency-domain using forward modified discrete cosine transform (MDCT).

    – The frequency domain signal is quantized based on a psychoacoustic model and encoded.

    • Sample rates: from 8 to 96 kHz• Bit Rate: Arbitrary (2kbps to 320kbps)• Up to 48 channels

    27

    282828

    292929

    Contact Information

    747 Dresher RoadHORSHAM, PA 19044

    Ph: 215-657-5270FAX: 215-657-5273

    www.deltadigitalvideo.com

    29

    303030

    Backup Slides

  • Typical Audio Quality Problems

    • Too many errors– Packet loss

    • Too much delay– Congestion– Too many router hops– QOS parameters

    31

    Audio Quality• Packet Delay

    – Time for voice packets to transit from the sender to the receiver– If too large, makes conversation difficult

    • Packet Jitter– Variation in packet delay– If too large, requires larger buffering, increasing delay or causing buffer

    underflow or overflow• Packet Loss

    – Packets are dropped due to congestion, poor cabling, equipment malfunction, duplex mismatch, bit errors, CRC errors

    – Since there is no retransmission, results in lost information– Clicks, chirps, hum in audio

    32

    Audio Quality• One Way Delay

    – Good (0ms-150ms)– Acceptable (150ms-300ms)– Unacceptable (> 300ms)

    • Jitter– Good (0-20ms)– Acceptable (20ms-50ms)– Unacceptable (>50ms)

    • Packet Loss– Good(0%-0.5%)– Acceptable (0.5%-1.5%)– Unacceptable (>1.5%) 33

    Blank Page