cse 489-02 & cse 589-02 multimedia processing lecture 11 video coding spring 2009 new mexico...

CSE 489-02 & CSE 589-02 Multimedia Processing

Lecture 11 Video Coding

Spring 2009

New Mexico Tech

04/19/23 1

History

H.264/AVC


MC-DCT Coding Framework

• Motion estimation/compensation based on previously decoded frames
• Block-translation motion model
• Inter-coding: DCT-based coding of the prediction error (residue)
• Intra-coding: if motion estimation fails or synchronization is desired, the macro-block is encoded in intra mode
• Most international video coding standards are based on this coding framework
  • Video teleconferencing: H.261, H.263, H.263++, H.264
  • Video archive & play-back: MPEG-1, MPEG-2 (in DVDs), MPEG-4

Hybrid MC-DCT Encoder

[Figure: block diagram of the hybrid MC-DCT encoder. Blocks: Motion Estimation; Motion Comp. Predictor; Transform, Quantization, Entropy Coding; embedded Decoder (Entropy Decoding, Inverse Q, Inverse Transform); Frame Buffer (Delay). Signals: Input Macro-Block; Motion Compensated Prediction; Encoded Residual (to channel); Decoded Input Macro-Block (to display); Motion Vector and Block Mode Data (side info, to channel).]

Inter and Intra Coding

• Intra
  • MB is encoded as is, without motion compensation
  • DCT followed by Q, zig-zag, run-length, Huffman
• Inter
  • Block-matching motion estimation
  • Predictive motion residue from the best-match block is DCT encoded (similarly to intra mode)
  • Motion vector is differentially encoded

Intra-Coding Mode

[Figure: intra-coding block diagrams. Encoder: input MB, DCT, Q, E, to bit-stream, with a Q⁻¹ and IDCT branch to the motion compensated frame. Decoder: bit-stream, E⁻¹, Q⁻¹, IDCT, to display frame.]

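Inter-Coding Mode

[Figure: inter-coding encoder block diagram. Blocks: ME, MC, D, DCT, Q, E, Q⁻¹, IDCT. Signal labels: input MB x_n, residual r_n, reconstructed residual r̂_n, prediction MC(x̂_{n-1}), reconstruction x̂_n, reference frame x̂_{n-1}; output to bit-stream.]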

Video Sequence and Picture

• Intra Picture (I-Picture)
  • Encoded without referencing others
  • All MBs are intra coded
• Inter Picture (P-Picture, B-Picture)
  • Encoded by referencing other pictures
  • Some MBs are intra coded, and some are inter coded

[Figure: picture sequence Intra 0, Inter 1, Inter 2, Inter 3, Inter 4, Inter 5]

Group of Pictures

Group of Pictures (GOP)

Picture type:    I  B  B  P  B  B  P  …  B  B  I  B  B  P  …
Frame order:     0  1  2  3  4  5  6
Encoding order:  0  2  3  1  5  6  4

Video stream: GOP GOP GOP…
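The frame reordering on the GOP slide (frames 0 to 6 coded at positions 0 2 3 1 5 6 4) can be reproduced with a short sketch. It assumes the usual rule that each B-picture is coded after the next I- or P-picture it depends on; the function name `coding_positions` is hypothetical.

```python
def coding_positions(types):
    """Return (coding order, coding position of each frame) for a GOP pattern.

    types is a string of picture types in display order, e.g. "IBBPBBP".
    Assumption: every B-picture waits for the next I/P anchor to be coded.
    """
    order = []        # display indices, listed in the order they are coded
    pending_b = []    # B-pictures waiting for their future anchor
    for index, kind in enumerate(types):
        if kind == "B":
            pending_b.append(index)
        else:
            # I or P anchor: code it first, then the B-pictures before it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no anchor in this list

    positions = [0] * len(types)
    for position, frame in enumerate(order):
        positions[frame] = position
    return order, positions


order, positions = coding_positions("IBBPBBP")
print(order)      # frames in the order they are coded
print(positions)  # coding position of frame 0, 1, 2, ...
```

For the pattern I B B P B B P this gives the coding positions listed on the slide.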

Coding of I-Slice

[Figure: I-slice coding pipeline. Original block, DCT, transformed block, quantization (quantization matrix), zig-zag scan (15 0 -2 -1 -1 -1 0 …), entropy coding, bit-stream.]
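The I-slice pipeline (DCT, quantization, zig-zag scan) can be sketched in plain Python as follows. This is a minimal illustration, not a standard-conformant coder: the flat quantization matrix with step 16 and all function names are assumptions, and the entropy-coding stage is left out.

```python
import math

N = 8

def dct2(block):
    """Orthonormal 8x8 2-D DCT-II of a list-of-lists block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, qmatrix):
    """Divide each coefficient by its step size and round to an integer."""
    return [[int(round(coeffs[u][v] / qmatrix[u][v])) for v in range(N)]
            for u in range(N)]

def zigzag(block):
    """Scan a block along anti-diagonals, alternating direction."""
    out = []
    for s in range(2 * N - 1):
        cells = [(i, s - i) for i in range(N) if 0 <= s - i < N]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        out.extend(block[i][j] for i, j in cells)
    return out

# Assumed inputs: a flat block and a flat quantization matrix (step 16).
flat_block = [[100] * N for _ in range(N)]
flat_q = [[16] * N for _ in range(N)]
scanned = zigzag(quantize(dct2(flat_block), flat_q))
print(scanned[:8])
```

A flat block puts all of its energy in the DC coefficient, so the scan starts with one nonzero value followed by zeros, which is the kind of sequence run-length and Huffman coding handle well.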

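Coding of P-Slice

[Figure: P-slice coding. Motion estimation between the original current frame and the reconstructed reference frame (from the frame buffer) produces motion vectors; motion compensation forms the prediction; the prediction is subtracted from the original current frame to give the residual. Coded output = motion vectors + residual.]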

Motion Estimation in H.261

• Macro-block
  • Luminance: 16x16, four 8x8 blocks
  • Chrominance: two 8x8 blocks
  • Motion estimation only performed for luminance component
• Motion vector range [-15, 15]

[Figure: macro-block layout (four 8x8 Y blocks, one Cr block, one Cb block) and the search area in the reference frame, extending 15 pixels on each side of the MB.]
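A full-search block matcher over the [-15, 15] range can be sketched as below. The slide does not say which matching criterion is used; the sum of absolute differences (SAD) here is an assumption, as are the function names and the synthetic test frames.

```python
import random

def sad(cur, ref, bx, by, dx, dy, n=16):
    """SAD between the MB at (bx, by) in cur and the candidate block in ref
    displaced by (dx, dy). Frames are indexed as frame[y][x]."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, n=16, rng=15):
    """Test every integer displacement in [-rng, rng]^2 that stays in the frame."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width and
                      0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic data: the current frame is the reference displaced by (-3, 2).
gen = random.Random(0)
ref = [[gen.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[(y + 2) % 48][(x - 3) % 48] for x in range(48)] for y in range(48)]
print(full_search(cur, ref, 16, 16))
```

Only the luminance samples would be searched this way; the search costs (2·15+1)² block comparisons per macro-block, which is why faster search patterns are common in practice.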

Coding of Motion Vectors

• MV has range [-15, 15]
• Integer pixel ME search only
• Motion vectors are differentially & separably encoded

$MVD_x = MV_x[n] - MV_x[n-1]$
$MVD_y = MV_y[n] - MV_y[n-1]$

• 11-bit VLC for MVD
• Example
  • MV = 2 2 3 5 3 1 -1…
  • MVD = 0 1 2 -2 -2 -2…
  • Binary: 1 010 0010 0011 0011 0011…
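The slide's example can be replayed with a few lines of code. The VLC table below is deliberately partial: it contains only the four codewords that appear in the example, and the names `MVD_VLC`, `differences` and `encode` are hypothetical.

```python
# Partial VLC table: only the codewords visible in the slide's example.
MVD_VLC = {0: "1", 1: "010", 2: "0010", -2: "0011"}

def differences(mv):
    """MVD[n] = MV[n] - MV[n-1] for one vector component."""
    return [current - previous for previous, current in zip(mv, mv[1:])]

def encode(mvd):
    """Look up each difference in the (partial) VLC table."""
    return " ".join(MVD_VLC[d] for d in mvd)

mv = [2, 2, 3, 5, 3, 1, -1]
mvd = differences(mv)
print(mvd)
print(encode(mvd))
```

Because neighbouring blocks tend to move together, the differences cluster around zero, and the shortest codewords are assigned to the smallest differences.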

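Inter/Intra Switching

• Based on energy of prediction error
  • High energy: scene change, occlusions, uncovered areas… use intra mode
  • Low energy: stationary background, translational motion… use inter mode

$MSE = \frac{1}{256} \sum_{(x,y) \in MB} \left( c[x,y] - r[x+dx,\, y+dy] \right)^2$

$VAR = \frac{1}{256} \sum_{(x,y) \in MB} \left( c[x,y] - \bar{c} \right)^2$

[Figure: INTRA and INTER decision regions in the plane spanned by MSE and VAR, with the value 64 marked on both axes.]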
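The two energy measures can be computed per macro-block as sketched below. The decision rule in `choose_mode` is an assumed simplification (intra only when the prediction error exceeds both the threshold 64 and the block's own variance); the exact region boundaries of the slide's figure are not recoverable from the text, and all names are hypothetical.

```python
def mse_var(cur, ref, bx, by, dx, dy, n=16):
    """Return (MSE, VAR) for the n x n MB at (bx, by); frames are frame[y][x].

    MSE: mean squared error against the block displaced by (dx, dy) in ref.
    VAR: variance of the current MB around its own mean.
    """
    count = n * n
    pixels = [cur[by + y][bx + x] for y in range(n) for x in range(n)]
    preds = [ref[by + dy + y][bx + dx + x] for y in range(n) for x in range(n)]
    mean = sum(pixels) / count
    mse = sum((c - r) ** 2 for c, r in zip(pixels, preds)) / count
    var = sum((c - mean) ** 2 for c in pixels) / count
    return mse, var

def choose_mode(mse, var, threshold=64):
    """Assumed rule, not taken from the slide's figure."""
    return "intra" if (mse > threshold and var < mse) else "inter"

flat_100 = [[100] * 16 for _ in range(16)]
flat_120 = [[120] * 16 for _ in range(16)]
print(choose_mode(*mse_var(flat_100, flat_100, 0, 0, 0, 0)))  # good prediction
print(choose_mode(*mse_var(flat_100, flat_120, 0, 0, 0, 0)))  # poor prediction
```

The intuition is that intra coding pays roughly for the block's variance while inter coding pays for the prediction error, so intra wins when the prediction is worse than the block's own variation.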

Loop Filter

• Optional
• Can be turned on or off for each block, usually goes together with MC
• Advantage
  • Decreases prediction error by smoothing the prediction frame
  • Reduces high-frequency artifacts like mosquito effects
• Disadvantage
  • Increases complexity & overhead

Quantization

• Uniform mid-rise quantizer for intra DC coefficients
• Uniform mid-tread quantizer with double dead zone for inter DC and all AC coefficients

[Figure: two quantizer input-output characteristics (input X, output Y = X̂), with input axis marks at ±Q and ±2Q and output marks at 0, ±1, ±2. Left panel: for intra DC. Right panel: for inter DC and all AC.]
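The difference between the two quantizer types can be illustrated with a small sketch. These are common textbook forms chosen as assumptions, not the exact H.261 arithmetic: the mid-rise quantizer has a decision threshold at zero and no zero output level, while the dead-zone quantizer maps every input with |x| < Q to zero, so its zero bin is 2Q wide. The mid-bin reconstruction points are also an assumption.

```python
import math

def midrise_index(x, q):
    """Uniform mid-rise: thresholds at multiples of q, including 0."""
    return math.floor(x / q)

def midrise_reconstruct(index, q):
    """Reconstruct at the middle of the bin (assumed reconstruction point)."""
    return (index + 0.5) * q

def deadzone_index(x, q):
    """Mid-tread with a dead zone: truncation toward zero, so |x| < q gives 0."""
    return int(x / q)

def deadzone_reconstruct(index, q):
    """Zero stays zero; other bins reconstruct at their middle (assumed)."""
    if index == 0:
        return 0.0
    return math.copysign((abs(index) + 0.5) * q, index)

for value in (-17, -7.9, 0.3, 7.9, 8):
    print(value, midrise_index(value, 8), deadzone_index(value, 8))
```

The dead zone is useful for prediction residuals and AC coefficients, where many values are small: they collapse to zero and cost very few bits after run-length coding.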

H.263

• Standardization effort started Nov 1993
• Aim
  • Low bit-rate video communications, less than 64 kbps
  • Target PSTN and mobile networks: 10-32 kbps
• Near-term
  • H.263 and H.263+: established late 1997
• Long-term
  • H.26L, H.264: still under investigation
• Main properties
  • H.261 with many MPEG features, optimized for low bit rates
  • Performance: 3-4 dB improvement over H.261 at less than 64 kbps; 30% bit-rate saving over MPEG-1

MPEG

• Coding and communication of moving pictures and associated audio for digital storage and archival
• MPEG: Moving Picture Experts Group
• MPEG family
  • MPEG-1, Nov 1992
  • MPEG-2, Nov 1994
  • MPEG-4, Oct 1998
  • MPEG-7, ongoing work
• Main features of the MPEG video family
  • Bi-directional MEMC
  • I-frame, P-frame, B-frame
  • Structure: Group of Pictures (GOP), picture, slice, macro-block
  • Coding decisions

MPEG Goals and Applications

• MPEG-1
  • Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (for example, CD-ROM)
  • Target 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality
  • Does not support interlaced sources
  • Main target source: SIF YCrCb 4:2:0, 360 x 240 x 30 fps
  • VCD
• MPEG-2
  • The most commercially successful international coding standard
  • Wide range of bit rates: 4 – 80 Mbps; optimized for 4 Mbps
  • Target high-resolution, high-quality video broadcast & playback
  • DVD, Digital TV: DirecTV, HDTV…

Requirements

• Coding of generic video at around 1.5 Mbps at reasonable quality (VHS)
• Random access capability, frequent access points
• Fast forward and fast rewind capability
• Audio-video synchronization during play and access
• Simple decoder
• Flexibility of data format
• Certain degree of robustness to communication errors
• Real-time encoder possibility

From H.261 to MPEG-1

• There are a few new features in MPEG-1 compared to the pioneering H.261 codec
  • Flexible data sizes and frame rates
  • More flexible slice structure to replace the fixed GOB structure
  • Data structure: introducing Group of Pictures (GOP), allowing frequent access points
  • Bi-directional motion compensation, B-frames
  • Half-pixel motion compensation
  • More finely tuned VLCs for different purposes
  • Quantization table (like JPEG) replaces single Q step size

Bidirectional MC Properties

• Advantage
  • Higher coding efficiency, frame rate can be increased significantly with few bits
  • More accurate motion estimation & compensation
  • No error propagation
• Disadvantage
  • More memory buffer for frame storage (minimum of 3)
  • More end-to-end delay

H.264/AVC History

• In the early 1990s, the first video compression standards were introduced:
  • H.261 (1990) and H.263 (1995) from ITU
  • MPEG-1 (1993) and MPEG-2 (1996) from ISO
• Since then, the technology has advanced rapidly
  • H.263 was followed by H.263+, H.263++, H.26L
  • MPEG-1/2 were followed by MPEG-4 Visual
  • But industry and research coders are still way ahead
• H.264/AVC is a joint project of ITU and ISO, to create an up-to-date standard.

Scope and Context

• Aimed at providing high-quality compression for various services:
  • IP streaming media (50-1500 kbps)
  • SDTV and HDTV broadcast and video-on-demand (1 - 8+ Mbps)
  • DVD
  • Conversational services (<1 Mbps, low latency)
• Standard defines:
  • Decoder functionality (but not encoder)
  • File and stream structure
• Final results:
  • 2-fold improvement in compression
  • Same fidelity, half the size
  • Compared to H.263 and MPEG-2

Video Compression

• Motion compensation / prediction
  • Describe the current frame based on the previous frame
  • Output description + residual image
  • Predicted frames are called “inter-frames”.
  • Some frames (intra-frames) are encoded without prediction, as natural images.
• Image transform
  • Concentrate image energy in relatively few numeric coefficients
• Lossy coding
  • Compress coefficient values in a lossy manner
  • Try to keep the most important information

The H.263 Standard Coder

[Figure: original video, Motion Compensation, Image Transform, Lossy Coding, compressed video]


H.263 Motion Compensation

• Image is divided into 16x16 macroblocks,
• Each macroblock is matched against nearby blocks in the previous frame (called the reference frame),
  • “Nearby” = within 15-pixel horizontal/vertical range
  • Half-pixel accuracy (with bilinear pixel interpolation)
• Best match is used to predict the macroblock,
• The relative displacement, or motion vector, is encoded and transmitted to the decoder
• Prediction error for all blocks constitutes the residual.
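Half-pixel accuracy with bilinear interpolation means that a candidate block may sit halfway between integer pixel positions, with the in-between samples obtained by averaging neighbours. A minimal sketch follows, assuming non-negative coordinates given in half-pixel units and floating-point results (a real codec specifies integer rounding); the function name is hypothetical.

```python
def half_pel_sample(frame, x2, y2):
    """Sample frame (indexed frame[y][x]) at position (x2 / 2, y2 / 2).

    Integer positions return the pixel itself; half positions average the
    two or four surrounding integer-position pixels.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)   # same column when x2 is even
    y1 = y0 + (y2 % 2)   # same row when y2 is even
    total = frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]
    return total / 4.0

frame = [[10, 20],
         [30, 40]]
print(half_pel_sample(frame, 0, 0))  # integer position
print(half_pel_sample(frame, 1, 0))  # halfway between 10 and 20
print(half_pel_sample(frame, 1, 1))  # centre of the four pixels
```

A half-pixel motion search would evaluate the matching cost on blocks built from such samples around the best integer-pixel match.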

Motion Compensation Example

[Figure: frames T=1 (reference) and T=2 (current)]


H.263 Image Transform

• Residual is divided into 8x8 blocks,
• 8x8 2-D Discrete Cosine Transform (DCT) is applied to each block independently
• DCT coefficients describe spatial frequencies in the block:
  • High frequencies correspond to small features and texture
  • Low frequencies correspond to larger features
  • Lowest frequency coefficient, called DC, corresponds to the average intensity of the block

8x8 DCT Example

[Figures: three slides of 8x8 DCT examples]


H.263 Lossy Coding

• Transform coefficients are quantized:
  • Some less-significant bits are dropped
  • Only the remaining bits are encoded
• For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more.
• For intra-frames, lower-frequency coefficients get more bits
  • To preserve larger features better
• The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy
• Finally, bits are encoded using an entropy (lossless) code
  • Traditionally a Huffman-style code

Changes in Motion Compensation

• Quarter-pixel accuracy
  • A gain of 1.5-2 dB across the board over ½-pixel
• Variable block-size:
  • Every 16x16 macroblock can be subdivided
  • Each sub-block gets predicted separately
• Multiple and arbitrary reference frames
  • Vs. only previous (H.263) or previous and next (MPEG).
• Anti-aliasing sub-pixel interpolation
  • Removes some common artifacts in residual

Variable Block-Size MC

• Motivation: size of moving/stationary objects is variable
  • Many small blocks may take too many bits to encode
  • Few large blocks give lousy prediction
• In H.264, each 16x16 macroblock may be:
  • Kept whole,
  • Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16)
  • Divided into 4 sub-blocks
  • In the last case, the 4 sub-blocks may be divided once more into 2 or 4 smaller blocks.

H.264 Variable Block Sizes

[Figure: H.264 variable block sizes]

Motion Scale Example

[Figures: three slides, each showing frames T=1 and T=2]

H.264 VBS Example

[Figure: frames T=1 and T=2]

Arbitrary Reference Frames

• In H.263, the reference frame for prediction is always the previous frame
• In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction)
• In H.264, any one frame may be used as reference:
  • Encoder and decoder maintain synchronized buffers of available frames (previously decoded)
  • Reference frame is specified as index into this buffer
• In bi-predictive mode, each macroblock may be:
  • Predicted from one of the two references
  • Predicted from both, using weighted mean of predictors
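The "weighted mean of predictors" option can be sketched in a few lines. The equal default weights and the function name `bi_predict` are assumptions for illustration; the two inputs stand for motion-compensated blocks fetched from two reference frames in the buffer.

```python
def bi_predict(pred0, pred1, w0=0.5, w1=0.5):
    """Combine two motion-compensated predictions pixel by pixel."""
    return [[w0 * a + w1 * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]

# Toy 1x2 "blocks" taken from two different reference frames.
from_past = [[10, 20]]
from_future = [[30, 40]]
print(bi_predict(from_past, from_future))
print(bi_predict(from_past, from_future, w0=0.75, w1=0.25))
```

Unequal weights help with fades, where the current frame is closer in brightness to one reference than to the other.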

Intra Prediction

• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
  • Implemented to some extent in H.263++ and MPEG-4, but in transform domain
• Macroblocks in intra-coded frames are predicted based on previously-coded ones
  • Above and/or to the left of the current block
  • The macroblock may be divided into 16 4x4 sub-blocks which are predicted in cascading fashion
• An encoded parameter specifies which neighbors should be used to predict, and how

Intra-Prediction Example

[Figures: intra-prediction examples for the Vertical, Horizontal, and Main Diagonal modes]
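Two of the directional modes named in the examples, vertical and horizontal, are simple enough to sketch for a 4x4 block. The mode choice by smallest SAD and all function names are assumptions for illustration; a real encoder considers more modes and usually a rate-aware cost.

```python
def predict_vertical(above):
    """Copy the 4 pixels above the block down every row."""
    return [list(above) for _ in range(4)]

def predict_horizontal(left):
    """Copy the 4 pixels left of the block across every column."""
    return [[pixel] * 4 for pixel in left]

def sad(block_a, block_b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_mode(block, above, left):
    """Pick the candidate prediction with the smallest SAD (assumed criterion)."""
    candidates = {
        "vertical": predict_vertical(above),
        "horizontal": predict_horizontal(left),
    }
    return min(candidates, key=lambda mode: sad(block, candidates[mode]))

# A block made of vertical stripes is predicted exactly by the row above it.
stripes = [[1, 2, 3, 4] for _ in range(4)]
print(best_mode(stripes, above=[1, 2, 3, 4], left=[9, 9, 9, 9]))
```

Only the chosen mode and the (small) residual need to be coded, and the decoder can rebuild the same prediction because it uses previously decoded neighbours.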

H.264 Image Transform

• Motivation:
  • DCT requires real-number operations, which may cause inaccuracies in inversion
• H.264 uses a very simple integer 4x4 transform
  • A (pretty crude) approximation to 4x4 DCT
  • Transform matrix contains only +/-1 and +/-2
  • Can be computed with only additions, subtractions, and shifts
• Results show negligible loss in quality (~0.02 dB)
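The 4x4 integer core transform can be written out directly. The sketch below applies the forward transform Y = C·X·Cᵀ with plain matrix products rather than the add-and-shift butterflies a real codec would use, and it omits the scaling that H.264 folds into quantization; the helper names are hypothetical.

```python
# Forward core transform matrix: entries are only +/-1 and +/-2.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(column) for column in zip(*m)]

def forward_transform(block):
    """Y = CF * X * CF^T, computed entirely in integer arithmetic."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[3] * 4 for _ in range(4)]
for row in forward_transform(flat):
    print(row)
```

Because every operation is on integers, encoder and decoder get bit-identical results, which avoids the mismatch that floating-point DCT implementations can introduce.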

Deblocking Filter

[Figure: non-deblocked image vs. deblocked image. Courtesy: images from http://compression.ru/video/deblocking/]

Entropy Coding

• Motivation: traditional coders use fixed, variable-length codes
  • Essentially Huffman-style codes
  • Non-adaptive
  • Can’t encode symbols with probability > 0.5 efficiently, since at least one bit is required per symbol
• H.263 Annex E defines an arithmetic coder
  • Still non-adaptive
  • Uses multiple non-binary alphabets, which results in high computational complexity
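The cost of the "at least one bit per symbol" limit can be quantified with the binary entropy function, which gives the average number of bits per symbol an ideal coder (such as an arithmetic coder) could approach. The probability 0.9 below is an arbitrary illustrative value.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary source whose likelier symbol has probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p = 0.9
print(f"ideal cost: {binary_entropy(p):.3f} bits/symbol")
print("Huffman-style cost on single symbols: 1 bit/symbol")
```

For a skewed binary source, a code that spends a whole bit on every symbol uses roughly twice the bits that are needed, which is the gap arithmetic coding closes.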

Entropy Coding: CABAC

• Context-adaptive binary arithmetic coding (CABAC) framework designed specifically for H.264
• Binarization: all syntax symbols are translated to bit-strings
• 399 predefined context models, used in groups
  • E.g. models 14-20 used to code macroblock type for inter-frames
• The model to use next is selected based on previously coded information (the context)

Comparison to MPEG-2, H.263, MPEG-4p2

[Figure: rate-distortion curves for the Tempete sequence, CIF, 30 Hz. Horizontal axis: Bit-rate [kbit/s], 0 to 3500. Vertical axis: Quality Y-PSNR [dB], 25 to 38. Curves: MPEG-2, H.263, MPEG-4 Visual, JVT/H.264/AVC.]
