CSE 489-02 & CSE 589-02 Multimedia Processing
Lecture 11 Video Coding
Spring 2009
New Mexico Tech
04/19/23 1
MC-DCT Coding Framework
• Motion estimation/compensation based on previously decoded frames
  • Block-translation motion model
• Inter-coding: DCT-based coding of the prediction error (residue)
• Intra-coding: if motion estimation fails or synchronization is desired, the macro-block is encoded in intra mode
• Most international video coding standards are based on this coding framework
  • Video teleconferencing: H.261, H.263, H.263++, H.264
  • Video archive & play-back: MPEG-1, MPEG-2 (in DVDs), MPEG-4
Hybrid MC-DCT Encoder
[Figure: block diagram of the hybrid MC-DCT encoder with its embedded decoder. Blocks: Transform, Quantization, Entropy Coding; Entropy Decoding, Inverse Q, Inverse Transform; Motion Estimation; Motion Comp. Predictor; Frame Buffer (Delay). Signals: Input Macro-Block; Motion Compensated Prediction; Encoded Residual (to channel); Motion Vector and Block Mode Data (side info, to channel); Decoded Input Macro-Block (to display).]
Inter and Intra Coding
• Intra
  • MB is encoded as is, without motion compensation
  • DCT followed by Q, zig-zag, run-length, Huffman
• Inter
  • Block-matching motion estimation
  • Predictive motion residue from the best-match block is DCT encoded (similarly to intra mode)
  • Motion vector is differentially encoded
Intra-Coding Mode
[Figure: intra-coding mode. Encoder: input MB → DCT → Q → E → to bit-stream, with Q⁻¹ → IDCT → to motion compensated frame. Decoder: bit-stream → E⁻¹ → Q⁻¹ → IDCT → to display frame.]
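Inter-Coding Mode
[Figure: inter-coding mode encoder. Blocks: DCT, Q, E (to bit-stream), Q⁻¹, IDCT, D (delay), ME, MC. Signals: input MB x_n, residual r_n, reconstructed residual r̂_n, reconstructed frame x̂_n, reference frame x̂_{n-1}, motion-compensated prediction x̂^{MC}_{n-1}.]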
Video Sequence and Picture
• Intra Picture (I-Picture)
  • Encoded without referencing others
  • All MBs are intra coded
• Inter Picture (P-Picture, B-Picture)
  • Encoded by referencing other pictures
  • Some MBs are intra coded, and some are inter coded
[Figure: picture sequence labeled Intra 0, Inter 1, Inter 2, Inter 3, Inter 4, Inter 5]
Group of Pictures
• Group of Pictures (GOP): I B B P B B P … B B I B B P …
• Video stream: GOP GOP GOP …

Picture type:    I  B  B  P  B  B  P
Frame order:     0  1  2  3  4  5  6
Encoding order:  0  2  3  1  5  6  4
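The pairing of frame order and encoding order on the GOP slide can be reproduced with a short sketch. It assumes that each B picture is predicted from the nearest I or P picture on either side, so the following anchor has to be coded first; the function name `coding_order` is made up for this illustration.

```python
def coding_order(types):
    """Return display indices in the order the pictures are encoded.

    Assumption: a B picture needs the next anchor (I or P), so that
    anchor is coded before the B pictures that precede it in display order.
    """
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)       # wait until the next anchor is coded
        else:
            order.append(i)           # anchor first
            order.extend(pending_b)   # then the B pictures displayed before it
            pending_b = []
    order.extend(pending_b)           # trailing Bs with no later anchor in this list
    return order


types = list("IBBPBBP")
order = coding_order(types)
# position[i] is the slot in which display frame i is encoded
position = [order.index(i) for i in range(len(types))]
print(order)     # [0, 3, 1, 2, 6, 4, 5]
print(position)  # [0, 2, 3, 1, 5, 6, 4]
```

The second printed list matches the encoding-order row given for frames 0 to 6.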
Coding of I-Slice
[Figure: coding of an I-slice. Original block → DCT → transformed block → quantization (quantization matrix) → zig-zag scan → 15 0 -2 -1 -1 -1 0 … → entropy coding → bit-stream]
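The zig-zag scan and the run-length step that feeds the entropy coder can be sketched as follows. The quantized block is hypothetical, chosen only so that the scan begins with the values shown on the slide, and the (run, level) pairing with an end-of-block marker is one common convention rather than the exact syntax of any particular standard.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag scan order."""
    # Walk the anti-diagonals; odd diagonals go top-right to bottom-left,
    # even diagonals go bottom-left to top-right.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level(seq):
    """(zero_run, level) pairs for nonzero values; trailing zeros become 'EOB'."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


# Hypothetical quantized 8x8 block (all other coefficients are zero).
block = [[0] * 8 for _ in range(8)]
for (r, c), v in {(0, 0): 15, (1, 0): -2, (2, 0): -1, (1, 1): -1, (0, 2): -1}.items():
    block[r][c] = v

scan = [block[r][c] for r, c in zigzag_indices()]
print(scan[:7])         # [15, 0, -2, -1, -1, -1, 0]
print(run_level(scan))  # [(0, 15), (1, -2), (0, -1), (0, -1), (0, -1), 'EOB']
```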
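Coding of P-Slice
[Figure: coding of a P-slice. Motion estimation between the original current frame and the reconstructed reference frame (from the frame buffer) gives the motion vectors; motion compensation gives the prediction; original current frame - prediction = residual. The P-slice is coded as residual + motion vectors.]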
Motion Estimation in H.261
• Macro-block
  • Luminance: 16x16, four 8x8 blocks
  • Chrominance: two 8x8 blocks
  • Motion estimation is performed only for the luminance component
• Motion vector range: [-15, 15]
[Figure: macro-block layout (four 8x8 Y blocks, one 8x8 Cr block, one 8x8 Cb block) and the search area in the reference frame, extending 15 pixels on each side of the MB]
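A minimal full-search block-matching sketch over the [-15, 15] range is given here. The slide does not name the matching criterion, so the sum of absolute differences (SAD) is an assumption, as are the function names and the synthetic test frames.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=16, search=15):
    """Integer-pixel full search; returns (dx, dy, cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic luminance frames: the current frame is the reference content
# displaced so that the true motion vector is (dx, dy) = (3, -2).
rng = random.Random(0)
H = W = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y - 2) % H][(x + 3) % W] for x in range(W)] for y in range(H)]

best = full_search(cur, ref, bx=16, by=16)
print(best)  # (3, -2, 0)
```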
Coding of Motion Vectors
• MV has range [-15, 15]
• Integer-pixel ME search only
• Motion vectors are differentially & separably encoded:
$$\mathrm{MVD}_x = \mathrm{MV}_x[n] - \mathrm{MV}_x[n-1]$$
$$\mathrm{MVD}_y = \mathrm{MV}_y[n] - \mathrm{MV}_y[n-1]$$
• VLC of up to 11 bits for MVD
• Example
  MV = 2 2 3 5 3 1 -1 …
  MVD = 0 1 2 -2 -2 -2 …
  Binary: 1 010 0010 0011 0011 0011 …
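The differential coding in the example can be checked with a short sketch. The code table holds only the four codewords that appear in the slide's example and is not the full MVD table; the function names are illustrative.

```python
def mv_differences(mv):
    """MVD[n] = MV[n] - MV[n-1] for one vector component."""
    return [cur - prev for prev, cur in zip(mv, mv[1:])]


def mv_reconstruct(first, mvd):
    """Decoder side: rebuild the component by accumulating the differences."""
    out = [first]
    for d in mvd:
        out.append(out[-1] + d)
    return out


# Only the codewords shown in the slide's example (partial table).
VLC_SUBSET = {0: "1", 1: "010", 2: "0010", -2: "0011"}

mv = [2, 2, 3, 5, 3, 1, -1]
mvd = mv_differences(mv)
bits = " ".join(VLC_SUBSET[d] for d in mvd)

print(mvd)   # [0, 1, 2, -2, -2, -2]
print(bits)  # 1 010 0010 0011 0011 0011
```

Because small differences get the shortest codewords, smooth motion fields cost few bits.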
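Inter/Intra Switching
• Based on energy of the prediction error
  • High energy: scene change, occlusions, uncovered areas … use intra mode
  • Low energy: stationary background, translational motion … use inter mode
$$\mathrm{MSE} = \frac{1}{256} \sum_{(x,y) \in \mathrm{MB}} \left( c[x,y] - r[x+dx,\, y+dy] \right)^2$$
$$\mathrm{VAR} = \frac{1}{256} \sum_{(x,y) \in \mathrm{MB}} \left( c[x,y] - \bar{c} \right)^2$$
Here c is the current macro-block, r the reference frame, (dx, dy) the motion vector, and \bar{c} the mean of c over the macro-block.
[Figure: INTRA and INTER decision regions in the MSE-VAR plane, with the value 64 marked on both axes]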
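The two measures can be computed directly from a 16x16 macro-block and its motion-compensated prediction. In this sketch the decision rule in `choose_mode` is a hypothetical reading of the plot (only the threshold 64 comes from the slide), so treat it as an illustration rather than the rule used by any reference encoder.

```python
def mse(cur, pred):
    """Mean squared prediction error between a block and its prediction."""
    n = len(cur) * len(cur[0])
    return sum((c - p) ** 2 for rc, rp in zip(cur, pred) for c, p in zip(rc, rp)) / n


def var(cur):
    """Variance of the current block around its own mean."""
    n = len(cur) * len(cur[0])
    mean = sum(map(sum, cur)) / n
    return sum((c - mean) ** 2 for row in cur for c in row) / n


def choose_mode(mse_val, var_val, threshold=64):
    # Hypothetical rule (assumption): keep inter mode when the prediction
    # error is small, or no larger than the block's own variance.
    if mse_val <= threshold or mse_val <= var_val:
        return "inter"
    return "intra"


flat = [[100] * 16 for _ in range(16)]
bad_pred = [[130] * 16 for _ in range(16)]          # e.g. after a scene change
textured = [[10] * 8 + [30] * 8 for _ in range(16)]

print(mse(flat, bad_pred), var(flat))                         # 900.0 0.0
print(choose_mode(mse(flat, bad_pred), var(flat)))            # intra
print(mse(textured, textured), var(textured))                 # 0.0 100.0
print(choose_mode(mse(textured, textured), var(textured)))    # inter
```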
Loop Filter
• Optional
  • Can be turned on or off for each block; usually used together with MC
• Advantage
  • Decreases prediction error by smoothing the prediction frame
  • Reduces high-frequency artifacts such as mosquito effects
• Disadvantage
  • Increases complexity & overhead
Quantization
• Uniform mid-rise quantizer for intra DC coefficients
• Uniform mid-tread quantizer with double dead zone for inter DC and all AC coefficients
[Figure: two quantizer characteristics plotting output level Y and reconstruction X̂ against input X, with the X axis marked at -2Q, -Q, Q, 2Q and the Y axis at -2, -1, 0, 1, 2. One plot is for intra DC, the other for inter DC and all AC.]
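The dead-zone quantizer for inter DC and AC coefficients can be sketched as below. The reconstruction rule follows the H.261 formula for levels (Q(2|L|+1), minus 1 when Q is even); the truncating forward quantizer is an encoder-side choice the standard does not fix, so it is an assumption here, and clipping of reconstructed values is omitted.

```python
def quantize(x, q):
    """Forward quantizer with a dead zone: every |x| < 2q maps to level 0.
    Truncating division is an assumed encoder choice."""
    sign = -1 if x < 0 else 1
    return sign * (abs(x) // (2 * q))


def reconstruct(level, q):
    """Reconstruction of a level for step size q; the zero level stays zero."""
    if level == 0:
        return 0
    sign = -1 if level < 0 else 1
    mag = q * (2 * abs(level) + 1)
    if q % 2 == 0:
        mag -= 1
    return sign * mag


q = 5
for x in (-23, -9, 0, 9, 10, 23):
    level = quantize(x, q)
    print(x, level, reconstruct(level, q))
# -23 -2 -25
# -9 0 0
# 0 0 0
# 9 0 0
# 10 1 15
# 23 2 25
```

The zero bin spans (-2Q, 2Q), twice the width of every other bin, which is what pushes small residual coefficients to zero.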
H.263
• Standardization effort started Nov 1993
• Aim
  • Low bit-rate video communications, less than 64 kbps
  • Target PSTN and mobile networks: 10-32 kbps
• Near-term: H.263 and H.263+, established late 1997
• Long-term: H.26L, H.264, still under investigation
• Main properties
  • H.261 with many MPEG features, optimized for low bit rates
  • Performance: 3-4 dB improvement over H.261 at less than 64 kbps; 30% bit-rate saving over MPEG-1
MPEG
• Coding and communications of moving pictures and associated audio for digital storage and archival
• MPEG: Moving Picture Experts Group
• MPEG family
  • MPEG-1, Nov 1992
  • MPEG-2, Nov 1994
  • MPEG-4, Oct 1998
  • MPEG-7, ongoing work
• Main features of the MPEG video family
  • Bi-directional MEMC
  • I-frame, P-frame, B-frame
  • Structure: Group of Pictures (GOP), picture, slice, macro-block
  • Coding decisions
MPEG Goals and Applications
• MPEG-1
  • Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (for example, CD-ROM)
  • Target 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality
  • Does not support interlaced sources
  • Main target source: SIF YCrCb 4:2:0, 360 x 240 x 30 fps
  • VCD
• MPEG-2
  • The most commercially successful international coding standard
  • Wide range of bit rates: 4-80 Mbps; optimized for 4 Mbps
  • Target high-resolution, high-quality video broadcast & playback
  • DVD, Digital TV: DirecTV, HDTV…
Requirements
• Coding of generic video at around 1.5 Mbps at reasonable quality (VHS)
• Random access capability, frequent access points
• Fast forward and fast rewind capability
• Audio-video synchronization during play and access
• Simple decoder
• Flexibility of data format
• Certain degree of robustness to communication errors
• Real-time encoder possibility
From H.261 to MPEG-1
• There are a few new features in MPEG-1 compared to the pioneering H.261 codec
  • Flexible data sizes and frame rates
  • More flexible slice structure to replace the fixed GOB structure
  • Data structure: introduces the Group of Pictures (GOP), allowing frequent access points
  • Bi-directional motion compensation, B-frames
  • Half-pixel motion compensation
  • More finely tuned VLCs for different purposes
  • Quantization table (like JPEG) replaces the single Q step size
Bidirectional MC Properties
• Advantage
  • Higher coding efficiency; frame rate can be increased significantly with few bits
  • More accurate motion estimation & compensation
  • No error propagation (B-frames are not used as references)
• Disadvantage
  • More memory buffer for frame storage (minimum of 3)
  • More end-to-end delay
H.264/AVC History
• In the early 1990s, the first video compression standards were introduced:
  • H.261 (1990) and H.263 (1995) from ITU
  • MPEG-1 (1993) and MPEG-2 (1996) from ISO
• Since then, the technology has advanced rapidly
  • H.263 was followed by H.263+, H.263++, H.26L
  • MPEG-1/2 were followed by MPEG-4 Visual
  • But industry and research coders are still way ahead
• H.264/AVC is a joint project of ITU and ISO to create an up-to-date standard
Scope and Context
• Aimed at providing high-quality compression for various services:
  • IP streaming media (50-1500 kbps)
  • SDTV and HDTV broadcast and video-on-demand (1-8+ Mbps)
  • DVD
  • Conversational services (<1 Mbps, low latency)
• Standard defines:
  • Decoder functionality (but not encoder)
  • File and stream structure
• Final results: 2-fold improvement in compression
  • Same fidelity, half the size, compared to H.263 and MPEG-2
Video Compression
• Motion compensation / prediction
  • Describe the current frame based on the previous frame
  • Output: description + residual image
  • Predicted frames are called "inter-frames"
  • Some frames (intra-frames) are encoded without prediction, as natural images
• Image transform
  • Concentrate image energy in relatively few numeric coefficients
• Lossy coding
  • Compress coefficient values in a lossy manner
  • Try to keep the most important information
The H.263 Standard Coder
[Figure: original video → Motion Compensation → Image Transform → Lossy Coding → compressed video]
H.263 Motion Compensation
• Image is divided into 16x16 macroblocks
• Each macroblock is matched against nearby blocks in the previous frame (called the reference frame)
• "Nearby" = within a 15-pixel horizontal/vertical range
• Half-pixel accuracy (with bilinear pixel interpolation)
• The best match is used to predict the macroblock
• The relative displacement, or motion vector, is encoded and transmitted to the decoder
• Prediction errors for all blocks constitute the residual
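Half-pixel accuracy with bilinear interpolation can be sketched as follows. Coordinates are given in half-pixel units, and the rounding (add half the divisor, then integer-divide) is the basic bilinear rule; the function name and the tiny test frame are illustrative assumptions.

```python
def half_pel_sample(ref, x2, y2):
    """Sample `ref` at (x2 / 2, y2 / 2), coordinates in half-pixel units.

    Integer positions return the pixel itself; horizontal or vertical
    half positions average two neighbours; diagonal half positions
    average four, all with rounding.
    """
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2          # 1 when the position is between pixels
    a = ref[y][x]
    b = ref[y][x + fx]
    c = ref[y + fy][x]
    d = ref[y + fy][x + fx]
    # One formula covers all four cases: when fx (or fy) is 0 the
    # neighbours collapse onto the same pixel.
    return (a + b + c + d + 2) // 4


ref = [[10, 20],
       [30, 41]]
print(half_pel_sample(ref, 0, 0))  # 10  (integer position)
print(half_pel_sample(ref, 1, 0))  # 15  (between 10 and 20)
print(half_pel_sample(ref, 0, 1))  # 20  (between 10 and 30)
print(half_pel_sample(ref, 1, 1))  # 25  (centre of all four)
```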
H.263 Image Transform
• Residual is divided into 8x8 blocks
• An 8x8 2-D Discrete Cosine Transform (DCT) is applied to each block independently
• DCT coefficients describe spatial frequencies in the block:
  • High frequencies correspond to small features and texture
  • Low frequencies correspond to larger features
  • The lowest-frequency coefficient, called DC, corresponds to the average intensity of the block
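A small sketch of the 8x8 2-D DCT shows the link between the DC coefficient and the block average. It assumes the orthonormal DCT-II scaling, under which the DC coefficient equals 8 times the mean of the block; a flat block therefore has a single nonzero coefficient.

```python
import math

N = 8
# Orthonormal DCT-II basis: row k is the k-th cosine basis vector.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)]
     for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Separable 2-D DCT: Y = C X C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coef):
    """Inverse transform: X = C^T Y C."""
    return matmul(matmul(transpose(C), coef), C)


flat = [[100] * N for _ in range(N)]
coef = dct2(flat)
print(round(coef[0][0], 6))  # 800.0, i.e. 8 x the block mean
print(max(abs(coef[u][v]) for u in range(N) for v in range(N) if (u, v) != (0, 0)) < 1e-9)  # True
```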
H.263 Lossy Coding
• Transform coefficients are quantized:
  • Some less-significant bits are dropped
  • Only the remaining bits are encoded
• For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more
• For intra-frames, lower-frequency coefficients get more bits
  • To preserve larger features better
• The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy
• Finally, bits are encoded using an entropy (lossless) code
  • Traditionally a Huffman-style code
Changes in Motion Compensation
• Quarter-pixel accuracy
  • A gain of 1.5-2 dB across the board over ½-pixel
• Variable block-size:
  • Every 16x16 macroblock can be subdivided
  • Each sub-block gets predicted separately
• Multiple and arbitrary reference frames
  • Vs. only previous (H.263) or previous and next (MPEG)
• Anti-aliasing sub-pixel interpolation
  • Removes some common artifacts in the residual
Variable Block-Size MC
• Motivation: size of moving/stationary objects is variable
  • Many small blocks may take too many bits to encode
  • Few large blocks give lousy prediction
• In H.264, each 16x16 macroblock may be:
  • Kept whole
  • Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16)
  • Divided into 4 sub-blocks (8x8)
  • In the last case, the 4 sub-blocks may be divided once more into 2 or 4 smaller blocks
Arbitrary Reference Frames
• In H.263, the reference frame for prediction is always the previous frame
• In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction)
• In H.264, any one frame may be used as reference:
  • Encoder and decoder maintain synchronized buffers of available frames (previously decoded)
  • Reference frame is specified as an index into this buffer
• In bi-predictive mode, each macroblock may be:
  • Predicted from one of the two references
  • Predicted from both, using a weighted mean of the predictors
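The weighted mean of two predictors can be sketched as below. With equal weights it reduces to the rounded average (p0 + p1 + 1) >> 1; the general integer-weight form is a simplified illustration and not the standard's explicit weighted-prediction formula, which also carries offsets and a shift parameter.

```python
def bi_predict(p0, p1, w0=1, w1=1):
    """Weighted mean of two prediction blocks with integer weights,
    rounded to the nearest integer (ties round up)."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(p0, p1)
    ]


p0 = [[10, 20]]   # predictor from the first reference
p1 = [[20, 21]]   # predictor from the second reference
print(bi_predict(p0, p1))              # [[15, 21]]
print(bi_predict(p0, p1, w0=3, w1=1))  # [[13, 20]]
```

Unequal weights are useful when one reference is closer in time or when the scene fades.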
Intra Prediction
• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
  • Implemented to some extent in H.263++ and MPEG-4, but in the transform domain
• Macroblocks in intra-coded frames are predicted based on previously coded ones
  • Above and/or to the left of the current block
  • The macroblock may be divided into 16 4x4 sub-blocks, which are predicted in cascading fashion
• An encoded parameter specifies which neighbors should be used to predict, and how
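Three of the 4x4 intra prediction modes (vertical, horizontal, DC) are enough to illustrate the idea. In this sketch the encoder picks the mode with the smallest sum of absolute differences; that selection criterion, like the function names, is an assumption, since the standard only defines how the decoder forms the prediction.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 block from the 4 pixels above (`top`) and the
    4 pixels to the left (`left`), both already decoded."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]          # copy the row above downwards
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]      # copy the left column rightwards
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3          # rounded mean of the 8 neighbours
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(mode)


def best_mode(block, top, left, modes=("vertical", "horizontal", "dc")):
    """Pick the mode with the smallest SAD (assumed encoder criterion)."""
    def cost(mode):
        pred = predict_4x4(mode, top, left)
        return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))
    return min(modes, key=cost)


top, left = [1, 2, 3, 4], [5, 6, 7, 8]
block = [[1, 2, 3, 4] for _ in range(4)]   # vertical stripes continue from above
print(predict_4x4("dc", top, left)[0])     # [5, 5, 5, 5]
print(best_mode(block, top, left))         # vertical
```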
H.264 Image Transform
• Motivation:
  • DCT requires real-number operations, which may cause inaccuracies in inversion
• H.264 uses a very simple integer 4x4 transform
  • A (pretty crude) approximation to the 4x4 DCT
  • Transform matrix contains only +/-1 and +/-2
  • Can be computed with only additions, subtractions, and shifts
• Results show negligible loss in quality (~0.02 dB)
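The 4x4 core transform matrix contains only ±1 and ±2, so the forward transform stays in integers. The sketch below applies it and then inverts it exactly with rational arithmetic, purely to show that nothing is lost; in the actual codec the scaling is folded into quantization and the inverse uses its own integer matrix, which this sketch does not reproduce.

```python
from fractions import Fraction

# Forward core transform matrix of the 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward(x):
    """W = CF X CF^T, integer-only."""
    return matmul(matmul(CF, x), transpose(CF))


# The rows of CF are orthogonal: CF CF^T = diag(4, 10, 4, 10).
D = [[Fraction(1, d) if i == j else Fraction(0) for j in range(4)]
     for i, d in enumerate([4, 10, 4, 10])]


def inverse(w):
    """Exact inverse for illustration: X = (CF^T D) W (D CF)."""
    return matmul(matmul(matmul(transpose(CF), D), w), matmul(D, CF))


x = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]
w = forward(x)
print(w[0][0])          # 140, the sum of all 16 samples
print(inverse(w) == x)  # True
```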
Deblocking Filter
[Figure: non-deblocked image vs. deblocked image. Courtesy: images from http://compression.ru/video/deblocking/]
Entropy Coding
• Motivation: traditional coders use fixed variable-length codes
  • Essentially Huffman-style codes
  • Non-adaptive
  • Can't encode symbols with probability > 0.5 efficiently, since at least one bit is required per symbol
• H.263 Annex E defines an arithmetic coder
  • Still non-adaptive
  • Uses multiple non-binary alphabets, which results in high computational complexity
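The cost of the one-bit minimum can be made concrete by comparing it with the entropy of a skewed binary source. The probability 0.95 is an arbitrary illustrative value.

```python
import math


def binary_entropy(p):
    """Entropy in bits per symbol of a binary source with P(symbol) = p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


p = 0.95
print(round(binary_entropy(p), 3))   # 0.286 bits/symbol is the lower bound
print(round(-math.log2(p), 3))       # 0.074 bits is the ideal cost of the likely symbol
# A Huffman-style code spends at least 1 bit on every symbol, so it
# uses roughly 3.5 times the entropy of this source.
print(round(1 / binary_entropy(p), 1))  # 3.5
```

An arithmetic coder can approach the entropy because it does not need a whole number of bits per symbol.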
Entropy Coding: CABAC
• Context-adaptive binary arithmetic coding (CABAC) framework designed specifically for H.264
• Binarization: all syntax symbols are translated to bit-strings
• 399 predefined context models, used in groups
  • E.g. models 14-20 are used to code the macroblock type for inter-frames
• The model to use next is selected based on previously coded information (the context)