video coding standard
CMPT365 Multimedia Systems 1
Media Compression- Video Coding Standards
Fall 2005
CMPT 365 Multimedia Systems
Video Coding Standards
H.264/AVC
Coding Rate and Standards
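[Figure: bit-rate axis marked at 8, 16, 64, 384 kbit/s and 1.5, 5, 20 Mbit/s, divided into very low, low, medium, and high bitrate ranges. Applications shown: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV. Standards shown: H.263, H.261, MPEG-4, MPEG-1, MPEG-2.]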
Standardization Organizations
ITU-T VCEG (Video Coding Experts Group)
  • Standards for advanced moving image coding methods appropriate for conversational and non-conversational audio/visual applications
ISO/IEC MPEG (Moving Picture Experts Group)
  • Standards for compression and coding, decompression, processing, and coded representation of moving pictures, audio, and their combination
Relation:
  • ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and Associated Audio
  • ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4 Visual, whose short-header mode is compatible with baseline H.263)
WG: work group; SG: sub group
  • ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures (JPEG)
  • ISO/IEC JTC 1/SC 29/WG 11: Coding of Moving Pictures and Audio (MPEG)
Introduction
H.261 MPEG-1 MPEG-2 H.263 MPEG-4 H.264
H.261
Earliest DCT-based video standard: 1990
ITU Recommendation for videoconferencing and videophones over ISDN
Targeted bit rate: p x 64 kbps (p = 1, ..., 30)
  • Videophone: low rate, e.g., 64 kbps (p = 1)
  • Videoconferencing: high rate, e.g., 384 kbps (p = 6)
  • Max: 1.92 Mbps (p = 30)
Picture format:
  • CIF (Common Intermediate Format): 352 x 288
  • QCIF (Quarter CIF): 176 x 144
Max delay: 150 ms (for bidirectional interactivity)
Sequential search
Amenable to low-cost VLSI implementation
No B mode
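Layered Structure for Video Data
Video multiplex arrangement: picture layer, GOB layer, MB layer, block layer
Group of Blocks (GOB): 3 rows of 11 macroblocks (MBs) (Y: 176 x 48, UV: 88 x 24)
  • QCIF: 3 GOBs
  • CIF: 12 GOBs
MB: 16 x 16 luma (four 8x8 blocks Y1 to Y4) plus one 8x8 Cb block and one 8x8 Cr block
[Figure: GOB layout for QCIF (176 x 144) and CIF (352 x 288); one GOB highlighted; one MB made of the 8x8 blocks Y1, Y2, Y3, Y4, Cb, Cr]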
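The GOB counts above follow directly from the picture size. A minimal sketch of that arithmetic, assuming pictures whose dimensions are multiples of 16 (the function name `gob_count` is only illustrative):

```python
MB_SIZE = 16                 # a macroblock covers 16 x 16 luma samples
GOB_ROWS, GOB_COLS = 3, 11   # one GOB = 3 rows of 11 macroblocks

def gob_count(width, height):
    """Number of GOBs in a picture of the given luma size."""
    macroblocks = (width // MB_SIZE) * (height // MB_SIZE)
    return macroblocks // (GOB_ROWS * GOB_COLS)

print(gob_count(176, 144))   # QCIF
print(gob_count(352, 288))   # CIF
```

QCIF has 11 x 9 = 99 macroblocks and CIF has 22 x 18 = 396, which gives the 3 and 12 GOBs quoted on the slide.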
Entropy coding
Similar to JPEG Zigzag scan (Run, Level) coding EOB
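The zigzag scan and (run, level) coding with an end-of-block marker can be sketched as follows. This is a simplified illustration: it treats the DC coefficient like any other coefficient and stops at symbol generation, without the variable-length code tables. The function names are assumptions made for this example.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in zigzag scan order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level_eob(block):
    """Turn a quantized block into (run, level) pairs followed by 'EOB'."""
    symbols, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1          # count zeros preceding the next non-zero level
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")     # trailing zeros are implied by the end-of-block marker
    return symbols

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 3, -1
print(run_level_eob(block))
```

Because the trailing zeros are never transmitted, a block with only a few low-frequency coefficients costs only a few symbols.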
MPEG-1
Committee formed in 1988
Finalized in 1991
Used for VCD
Random access, fast forward/reverse search
Delay: 1 sec (for unidirectional video access)
1/2-pixel ME/MC
No deblocking filter
B frames
Software-only decoding is possible
MPEG-1 Audio coding: 3 layers of encoding
  • Layer 1: 4:1 compression ratio with CD quality
  • Layer 2: 6:1 to 8:1
  • Layer 3 (MP3): 10:1 to 12:1
MPEG-1 Video
Progressive video only
Layered structure: Sequence, Group of Pictures (GOP), Picture, Slice, Macroblock, Block
[Figure: a sequence of GOPs, each with the frame pattern I B B P ... B B P]
Quantization and Entropy Coding
Step size varies by frequency for I blocks
  • Similar to JPEG
  • Scaling is adjusted on a MB basis
Default intra quantization matrix:
   8 16 19 22 26 27 29 34
  16 16 22 24 27 29 34 37
  19 22 26 27 29 34 34 38
  22 22 26 27 29 34 37 40
  22 26 27 29 32 35 40 48
  26 27 29 32 35 40 48 58
  26 27 29 34 38 46 56 69
  27 29 35 38 46 56 69 83
Entropy coding: similar to JPEG and H.261
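A rough sketch of how a frequency-dependent matrix and a per-macroblock scale combine into a step size. It assumes the simplified rule step = W[i][j] * qscale / 8 for every coefficient and ignores the separate handling of the intra DC coefficient; `quantize_intra` and `qscale` are names chosen for this example, not taken from the slides.

```python
INTRA_MATRIX = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]

def quantize_intra(coeffs, qscale):
    """Quantize an 8x8 block of DCT coefficients (simplified, DC not special-cased)."""
    return [
        [int(round(coeffs[i][j] * 8 / (INTRA_MATRIX[i][j] * qscale)))
         for j in range(8)]
        for i in range(8)
    ]

# The same coefficient value is quantized more coarsely at high frequencies.
coeffs = [[160] * 8 for _ in range(8)]
levels = quantize_intra(coeffs, qscale=2)
print(levels[0][1], levels[7][7])
```

Raising `qscale` for a macroblock scales every step size at once, which is the per-MB adjustment mentioned above.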
B frames
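Temporal prediction for B pictures:
$$\hat{b} = \alpha_1 \hat{c}_1 + \alpha_2 \hat{c}_2, \qquad \alpha_1, \alpha_2 \in \{0,\ 0.5,\ 1\}, \qquad \alpha_1 + \alpha_2 = 1$$
where $b$ is the block in frame $k$, $c_1$ is the reference block in frame $k-1$, and $c_2$ is the reference block in frame $k+1$.
  • $\alpha_1 = 1,\ \alpha_2 = 0$: forward prediction
  • $\alpha_1 = 0,\ \alpha_2 = 1$: backward prediction
  • $\alpha_1 = \alpha_2 = 0.5$: bidirectional prediction
[Figure: block b in frame k predicted from block C1 in frame k-1 and block C2 in frame k+1]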
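The three B-picture prediction modes differ only in the pair of weights applied to the two reference blocks. A minimal sketch, assuming the reference blocks have already been motion compensated and are given as lists of rows (`predict_b` and `MODES` are illustrative names):

```python
MODES = {
    "forward": (1.0, 0.0),        # use only the block from frame k-1
    "backward": (0.0, 1.0),       # use only the block from frame k+1
    "bidirectional": (0.5, 0.5),  # average the two blocks
}

def predict_b(c1, c2, mode):
    """Weighted prediction b_hat = a1 * c1 + a2 * c2 for one block."""
    a1, a2 = MODES[mode]
    return [[a1 * p + a2 * q for p, q in zip(row1, row2)]
            for row1, row2 in zip(c1, c2)]

past = [[10, 20], [30, 40]]
future = [[30, 40], [50, 60]]
print(predict_b(past, future, "bidirectional"))
```

An encoder would typically evaluate all three modes for a macroblock and keep the one with the smallest prediction residual.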
MPEG-2
Completed in 1994
Extension of MPEG-1
Standard for DVD, SDTV, HDTV
Supports interlaced inputs
Supports scalable coding
Flexible frame size
Low delay
Supports a wide range of applications
Source format: 4:4:4, 4:2:2, 4:2:0
1/2-pixel ME/MC (bilinear interpolation)
B frames
MPEG-2 Audio:
  • Supports 5.1 channels
  • AAC: 30% fewer bits than MP3
Profiles and Levels
Defined to manage the large number of coding tools and the broad range of formats and bit rates supported
Profiles and levels define a set of conformance points, each targeting a class of applications
  • Maximize interoperability while limiting complexity
Profile: a subset of the entire bit stream syntax
Level: a specified set of constraints imposed on the values of the syntax elements in the bit stream (maximum bit rate, buffer size, picture resolution)
MPEG-2 Levels
Level       Max pixels/line   Max lines   Max frames/s
Low         352               288         30
Main        720               576         30
High 1440   1440              1152        60
High        1920              1152        60
H.263
Derived from H.261
Intended for very low bit-rate applications
  • Better quality at 18-24 kbps than H.261 at 64 kbps
  • Used in MS NetMeeting, Messenger ...
Can handle high resolution (up to 16CIF: 1408 x 1152)
No loop filter
1/2-pixel ME/MC
Optional coding modes (defined in 8 Annexes):
  • Unrestricted motion vector (Annex D): an MV can point outside the picture boundary by extrapolating the boundary pixels (repeat padding is usually used); MV range: [-31.5, 31.5]
  • Arithmetic coding
  • Advanced prediction (Annex F): overlapped block motion compensation; 4MV: one MV for each 8x8 block
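Repeat padding for unrestricted motion vectors amounts to clamping every reference coordinate to the picture area, so that positions outside the picture reuse the nearest boundary pixel. A minimal sketch for integer-pixel vectors only (half-pixel interpolation is left out; `fetch_block` is an illustrative name):

```python
def fetch_block(ref, x, y, mvx, mvy, size):
    """Read a size x size prediction block from ref at (x + mvx, y + mvy).

    Coordinates that fall outside the picture are clamped to the border,
    which is equivalent to padding the picture by repeating edge pixels.
    """
    height, width = len(ref), len(ref[0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    return [
        [ref[clamp(y + mvy + j, height - 1)][clamp(x + mvx + i, width - 1)]
         for i in range(size)]
        for j in range(size)
    ]

ref = [[1, 2],
       [3, 4]]
print(fetch_block(ref, 0, 0, -1, -1, 2))  # block lying partly above-left of the picture
```

Clamping on the fly avoids storing an enlarged padded frame, at the cost of a bounds check per sample.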
Advanced Prediction (4MV)
Each 8x8 block in an MB can have its own MV
Suitable when there is complicated motion in the MB
Needs more bits to encode the MVs
Need to compare the performance of 1 MV and 4MV
[Figure: candidate predictors MV1, MV2, MV3 around the current MV, shown for each of the four 8x8 blocks]
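One common way to compare the 1MV and 4MV modes is a Lagrangian cost that trades prediction error against the bits spent on motion vectors. The sketch below assumes that approach; the cost model, the multiplier `lam`, and the bit counts are hypothetical and are not specified on the slide.

```python
def choose_mv_mode(sad_1mv, bits_1mv, sad_4mv, bits_4mv, lam=4.0):
    """Pick '1MV' or '4MV' by comparing cost = SAD + lam * motion-vector bits."""
    cost_1mv = sad_1mv + lam * bits_1mv
    cost_4mv = sad_4mv + lam * bits_4mv
    return "4MV" if cost_4mv < cost_1mv else "1MV"

# Small gain in prediction error: the extra MV bits are not worth it.
print(choose_mv_mode(sad_1mv=1000, bits_1mv=10, sad_4mv=900, bits_4mv=40))
# Large gain (complicated motion inside the MB): 4MV wins.
print(choose_mv_mode(sad_1mv=1000, bits_1mv=10, sad_4mv=600, bits_4mv=40))
```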
Run-Level-Last Entropy Coding
3-D VLC: (LAST, RUN, LEVEL)
  • LAST: 1 for the last non-zero coefficient of a block, 0 otherwise
  • RUN: number of zeros before the current coefficient
  • LEVEL: value of the current non-zero coefficient
No EOB as in JPEG
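A minimal sketch of how the (LAST, RUN, LEVEL) events are formed from coefficients that are already in scan order. It produces only the symbols, not the variable-length codewords; `run_level_last` is an illustrative name.

```python
def run_level_last(scanned):
    """Convert scan-ordered coefficients into (LAST, RUN, LEVEL) triplets."""
    nonzero = [i for i, v in enumerate(scanned) if v != 0]
    triplets, previous = [], -1
    for k, i in enumerate(nonzero):
        last = 1 if k == len(nonzero) - 1 else 0   # flag the final non-zero coefficient
        run = i - previous - 1                      # zeros since the previous non-zero
        triplets.append((last, run, scanned[i]))
        previous = i
    return triplets

print(run_level_last([5, 0, 0, -2, 1, 0, 0, 0]))
```

Folding the end-of-block information into the LAST flag saves the separate EOB symbol used in JPEG and H.261.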
H.263+ and H.263++
H.263+: second version of H.263
  • Some further optional features: Annexes I to T
  • Annex J: in-loop deblocking filter
H.263++: three more optional modes (2000)
  • Annex V: data partitioned slice mode, for enhanced resilience to transmission errors
MPEG-4
Based on H.263
A new concept rather than an improved algorithm
Deals with a variety of multimedia content: audio, visual, image, graphic
Part 2: Visual
  • Based on H.263
  • Object-based coding
  • Coding of animated objects
  • Scalability: Fine Granular Scalability (FGS)
  • Texture coding: wavelet-based
Part 10: Advanced Video Coding (H.264)
Video Objects (VO)
MPEG-4 treats a video sequence as a collection of video objects
  • Each scene is decomposed into multiple objects
  • The segmentation method is not part of the standard
Each object is specified by shape, motion, and texture.
Natural visual objects:
  • Image, video, sprite (background)
Synthetic visual objects:
  • Face and body
  • 2-D mesh
  • 3-D mesh
The decoder can compose different scenes by using different numbers of decoded objects
Scene Composition
The decoder can compose different scenes by using different numbers of decoded objects
MPEG-4 Structure
[Figure: the bitstream enters a MUX block that feeds several A/V object decoders; their outputs go to a compositor, which produces the audio/video scene]
More MPEG-4 Example
Instead of "frames": Video Object Planes (VOPs)
Shape-adaptive DCT (SA-DCT)
[Figure: a video frame decomposed into a background VOP and two further VOPs; alpha map and SA-DCT]
Example
[Figure: a scene segmented into Object 1, Object 2, Object 3, and Object 4]
Problems, comments?
Status
Microsoft, RealVideo, QuickTime, ...
  • But only rectangular frame-based coding
H.264 = MPEG-4 Part 10 (2003)
Summary of Standards

Standard           Digitisation format   Compressed rate      Example applications
H.261              CIF/QCIF              p x 64 kbps          Video conferencing over LANs
H.263              S-QCIF/QCIF           < 64 kbps            Video conferencing over low bit rate channels
MPEG-1             SIF                   < 1.5 Mbps           VHS quality video storage
MPEG-2 Low         SIF                   < 4 Mbps             VHS quality video recording
MPEG-2 Main        4:2:0                 < 15 Mbps            Digital video broadcasting
                   4:2:2                 < 20 Mbps
MPEG-2 High 1440   4:2:0                 < 60 Mbps            High definition TV (4/3)
                   4:2:2                 < 80 Mbps
MPEG-2 High        4:2:0                 < 80 Mbps            High definition TV (16/9)
                   4:2:0                 < 100 Mbps
MPEG-4             Various               5 kbps - tens Mbps   Versatile multimedia coding standard
H.264              Various               Various              Various

SIF: Standard Interchange Format, 352 x 240 pixels at 30 Hz.
What’s Next ? - H.264
1998: Call for proposals for H.26L issued by ITU-T VCEG (Video Coding Experts Group)
Objectives:
  • 50% bit rate savings compared to MPEG-2
  • High quality video at both low and high bit rates
  • More error resilience tools
Oct. 1999: First draft design
Dec. 2001: VCEG and MPEG formed the Joint Video Team (JVT)
Approved in 2003: ITU-T H.264 and ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC)
Applications
Bit rate: 64 kbps to 240 Mbps
Broadcast over cable, satellite, DSL ...
Interactive/serial storage on optical/magnetic devices, DVD ...
Conversational services over network
Video on demand, streaming media over network
Multimedia messaging service over network
Three profiles: Baseline, Main, and Extended
15 levels
Four new profiles in the Fidelity Range Extensions (FRExt): High, High 10, High 4:2:2, High 4:4:4
Two-Layer Structure
Video Coding Layer (VCL)
  • Efficiently represents the video content
Network Abstraction Layer (NAL)
  • Enables simple and effective customization of the VCL
  • Allows H.264 to be transported over different networks
[Figure: the Video Coding Layer produces coded macroblocks; data partitioning produces coded slices/partitions; the Network Abstraction Layer maps them to H.320, MP4FF, H.323/IP, MPEG-2, etc.]
Block Diagram
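[Figure: H.264 encoder block diagram. The input video signal is split into macroblocks of 16x16 pixels. Blocks shown: coder control, transform/scaling/quantization, scaling and inverse transform, de-blocking filter, intra-frame prediction, motion compensation, motion estimation, intra/inter switch, entropy coding. The embedded decoder produces the output video signal. Control data, quantized transform coefficients, and motion data are passed to entropy coding.]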
Video Coding Layer: Slice coding
[Figure: a picture divided into Slice 1, Slice 2, and Slice 3]
Slices can have different shapes and sizes
Each slice is self-contained
  • Can be decoded without knowing data from other slices
Useful for:
  • Error resilience and concealment
  • Parallel processing
Intra-Picture Prediction
Performed in the spatial domain instead of the transform domain
Two basic prediction modes:
  • Intra 4x4: for areas with details
  • Intra 16x16: for smooth areas
I_PCM: no prediction; raw samples are sent directly
  • Limits the maximum number of bits for each block
Intra-Picture Prediction
Intra_4x4 prediction (9 modes)
  • Predict each 4 x 4 block
  • Suitable for details
[Figure: the eight prediction directions, labeled modes 0, 1, 3, 4, 5, 6, 7, 8 (mode 2 is DC prediction); the current 4x4 block and the neighbors used for prediction; examples for Mode 0, Mode 3, and Mode 4]
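Three of the nine Intra_4x4 modes are simple enough to sketch directly: mode 0 (vertical), mode 1 (horizontal), and mode 2 (DC). The example assumes that both the four samples above and the four samples to the left are available; the diagonal modes and the handling of missing neighbors are left out, and `intra4x4_predict` is an illustrative name.

```python
def intra4x4_predict(mode, above, left):
    """Build a 4x4 prediction block from reconstructed neighbor samples.

    above: the 4 samples directly above the block
    left:  the 4 samples directly to the left of the block
    """
    if mode == 0:   # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 1:   # horizontal: extend each left sample across its row
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:   # DC: rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1, and 2 are sketched here")

above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
print(intra4x4_predict(2, above, left))
```

The encoder subtracts the chosen prediction from the original block and transforms only the residual.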
Intra-Picture Prediction cont’d
Intra_16x16 prediction (4 modes)
  • Predict the entire 16 x 16 luma block
  • Suitable for smooth areas
Inter-Picture Prediction
P macroblocks can be partitioned into smaller regions:
  • Macroblock partitions: 16 x 16, 16 x 8, 8 x 16, 8 x 8
  • Sub-partitions of each 8 x 8 region: 8 x 4, 4 x 8, 4 x 4
Up to 16 MVs per macroblock
MVs are differentially encoded.
Deciding the best mode requires considerable optimization effort.
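Differential MV coding means that only the difference between a motion vector and a prediction formed from neighboring vectors is transmitted. The sketch below assumes the component-wise median of three neighboring MVs (left, above, above-right) as the predictor, which is the usual case in H.264; the special cases for 16x8 and 8x16 partitions and for unavailable neighbors are omitted, and the function names are illustrative.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return tuple(sorted(components)[1] for components in zip(a, b, c))

def mv_difference(mv, a, b, c):
    """Motion vector difference that would be entropy coded."""
    px, py = median_mv(a, b, c)
    return (mv[0] - px, mv[1] - py)

left, above, above_right = (1, 2), (3, -1), (2, 5)
print(median_mv(left, above, above_right))
print(mv_difference((4, 3), left, above, above_right))
```

With up to 16 MVs per macroblock, neighboring vectors are often similar, so the differences are small and cheap to code.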
Multiple Reference Pictures
More than one previously decoded picture can be used as a reference
4x4 Integer Transform
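$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
Fast implementation
Smaller size leads to less noise around edges.
Inverse transform:
$$H^{-1} = H^{T}\,\mathrm{diag}\!\left(\tfrac{1}{4}, \tfrac{1}{10}, \tfrac{1}{4}, \tfrac{1}{10}\right) = \begin{bmatrix} 1/4 & 2/10 & 1/4 & 1/10 \\ 1/4 & 1/10 & -1/4 & -2/10 \\ 1/4 & -1/10 & -1/4 & 2/10 \\ 1/4 & -2/10 & 1/4 & -1/10 \end{bmatrix}$$
Hierarchical transform (for further decorrelation):
  • Apply a 4x4 WHT to the luma DC coefficients (16 x 16 luma)
  • Apply a 2x2 WHT to the chroma DC coefficients (8x8 chroma)
$$H_2 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$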
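A short sketch of the 4x4 integer core transform, Y = H X H^T, using only integer arithmetic. It also checks that H H^T is diagonal with entries 4, 10, 4, 10, which is why the inverse needs the scale factors 1/4 and 1/10 (in a real codec these are folded into quantization). The helper names are illustrative.

```python
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain integer matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Unscaled forward core transform Y = H * X * H^T of a 4x4 block."""
    return matmul(matmul(H, block), transpose(H))

print(matmul(H, transpose(H)))                            # diagonal: 4, 10, 4, 10
print(forward_transform([[1] * 4 for _ in range(4)]))     # flat block: only DC is non-zero
```

Because every entry of H is 1 or 2 in magnitude, the transform can be computed with additions and shifts only, and encoder and decoder results match exactly.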
Entropy Coding
CAVLC: context-adaptive variable-length coding
CABAC: context-adaptive binary arithmetic coding
  • 9-14% more efficient than CAVLC
Context Modeling
Encode the next symbol based on context information
Collect a probability distribution for each possible context: p(x | C_i)
Four types of context models:
  • Use neighboring block information
  • Use previous bins (b_0, b_1, ..., b_{i-1}) as context for b_i
  • Use scanning position (for transform coefficient coding)
  • Use the accumulated number of encoded levels with a specific value (for transform coefficient coding)
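The idea of keeping a separate probability estimate p(x | C_i) per context can be illustrated with simple frequency counts. This is only a conceptual sketch: CABAC itself uses a table-driven finite-state probability estimator rather than explicit counts, and the class and method names here are assumptions made for the example.

```python
import math
from collections import defaultdict

class ContextModel:
    """Adaptive estimate of p(bin | context) from running counts."""

    def __init__(self):
        # Start every context with one observation of each bin value,
        # so an unseen context is treated as equiprobable.
        self.counts = defaultdict(lambda: [1, 1])

    def probability(self, context, bin_value):
        zeros, ones = self.counts[context]
        return (ones if bin_value else zeros) / (zeros + ones)

    def cost_bits(self, context, bin_value):
        """Ideal arithmetic-coding cost of this bin, in bits."""
        return -math.log2(self.probability(context, bin_value))

    def update(self, context, bin_value):
        self.counts[context][bin_value] += 1

model = ContextModel()
for _ in range(3):
    model.update("neighbors_coded", 1)   # hypothetical context label
print(model.probability("neighbors_coded", 1))
print(model.cost_bits("neighbors_coded", 1))
```

The more skewed the distribution within a context, the fewer bits the arithmetic coder spends on the likely value, which is where the gain over a fixed code comes from.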
New Directions for H.264
SNR scalability
Multi-view coding (3D audio-visual coding)