Video Compression
MIT 6.344, Spring 2004
April 22, 2004
John G. Apostolopoulos
Streaming Media Systems Group
Hewlett-Packard [email protected]
Overview of Next Three Lectures
• Video Compression (Thurs, 4/22) [Today]
  – Principles and practice of video coding
  – Basics behind MPEG compression algorithms
  – Current image & video compression standards
• Video Communication & Video Streaming I (Tues, 4/27)
  – Video application contexts & examples: DVD and Digital TV
  – Challenges in video streaming over the Internet
  – Techniques for overcoming these challenges
• Video Communication & Video Streaming II (Thurs, 4/29)
  – Video over lossy packet networks and wireless links → Error-resilient video communications
Outline of Today’s Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
  – Exploit temporal dimension of video signal
  – Motion-compensated prediction
  – Generic (MPEG-type) video coder architecture
  – Scalable video coding
• Overview of current video compression standards
  – What do the standards specify?
  – Frame-based video coding: MPEG-1/2/4, H.261/3/4
  – Object-based video coding: MPEG-4
Motivation for Compression: Example of HDTV Video Signal
• Problem:
  – Raw video contains an immense amount of data
  – Communication and storage capabilities are limited and expensive
• Example HDTV video signal:
  – 720x1280 pixels/frame, progressive scanning at 60 frames/s:
    $$\left(\frac{720 \times 1280\ \text{pixels}}{\text{frame}}\right)\left(\frac{60\ \text{frames}}{\text{sec}}\right)\left(\frac{3\ \text{colors}}{\text{pixel}}\right)\left(\frac{8\ \text{bits}}{\text{color}}\right) = 1.3\ \text{Gb/s}$$
  – 20 Mb/s HDTV channel bandwidth
    → Requires compression by a factor of 70 (equivalent to 0.35 bits/pixel)
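As a sanity check on the numbers in this example, here is a minimal Python sketch of the same arithmetic. The variable names are illustrative and not from the lecture. The exact ratio comes out near 66 and the per-pixel budget near 0.36 bits, which the slide rounds to 70 and 0.35.

```python
# Raw bit rate of the example HDTV signal and the compression needed
# to fit it into the 20 Mb/s channel quoted in the example.
width, height = 1280, 720        # pixels per frame
frame_rate = 60                  # frames per second (progressive)
colors_per_pixel = 3             # three color components per pixel
bits_per_color = 8

pixels_per_second = width * height * frame_rate
raw_bits_per_second = pixels_per_second * colors_per_pixel * bits_per_color
channel_bits_per_second = 20e6   # 20 Mb/s HDTV channel

compression_factor = raw_bits_per_second / channel_bits_per_second
bits_per_pixel = channel_bits_per_second / pixels_per_second

print(f"raw rate: {raw_bits_per_second / 1e9:.2f} Gb/s")
print(f"compression factor: {compression_factor:.1f}")
print(f"bits per pixel after compression: {bits_per_pixel:.2f}")
```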
Achieving Compression
• Reduce redundancy and irrelevancy
• Sources of redundancy
  – Temporal: Adjacent frames highly correlated
  – Spatial: Nearby pixels are often correlated with each other
  – Color space: RGB components are correlated among themselves
  → Relatively straightforward to exploit
• Irrelevancy
  – Perceptually unimportant information
  → Difficult to model and exploit
Spatial and Temporal Redundancy
• Why can video be compressed?
  – Video contains much spatial and temporal redundancy.
• Spatial redundancy: Neighboring pixels are similar
• Temporal redundancy: Adjacent frames are similar
Compression is achieved by exploiting the spatial and temporal redundancy inherent to video.
Generic Compression System
A compression system is composed of three key building blocks:
• Representation
  – Concentrates important information into a few parameters
• Quantization
  – Discretizes parameters
• Binary encoding
  – Exploits non-uniform statistics of quantized parameters
  – Creates bitstream for transmission
Original Signal → Representation (Analysis) → Quantization → Binary Encoding → Compressed Bitstream
Generic Compression System (cont.)
• Generally, the only operation that is lossy is the quantization stage
• The fact that all the loss (distortion) is localized to a single operation greatly simplifies system design
• Can design loss to exploit human visual system (HVS) properties
Original Signal → Representation (Analysis) [generally lossless] → Quantization [lossy] → Binary Encoding [lossless] → Compressed Bitstream
Generic Compression System (cont.)
• Source decoder performs the inverse of each of the three operations
Source Encoder: Original Signal → Representation (Analysis) → Quantization → Binary Encoding → Compressed Bitstream
Channel
Source Decoder: Compressed Bitstream → Binary Decoding → Inverse Quantization → Representation (Synthesis) → Reconstructed Signal
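A minimal sketch of why quantization is the lossy stage, assuming a uniform scalar quantizer (the lecture does not fix a particular quantizer at this point; `quantize`, `dequantize`, the sample parameters, and the step size are all illustrative). Mapping parameters to integer indices discards information, and the inverse mapping can only return the nearest reconstruction level.

```python
def quantize(value, step):
    """Map a real-valued parameter to an integer index (the lossy step)."""
    return round(value / step)

def dequantize(index, step):
    """Map an index back to its reconstruction level."""
    return index * step

# Hypothetical parameters produced by the representation stage
params = [12.3, -4.7, 0.4, 31.9]
step = 2.0

indices = [quantize(p, step) for p in params]      # what the binary encoder sees
recon = [dequantize(i, step) for i in indices]     # what the decoder recovers
errors = [abs(p - r) for p, r in zip(params, recon)]

print(indices)
print(recon)
print(max(errors))   # never exceeds step / 2 for this quantizer
```

A larger step gives fewer distinct indices (fewer bits) at the cost of a larger reconstruction error, which is the basic rate/distortion trade-off controlled by the quantizer.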
Review of Image Compression
Original Image → RGB to YUV → Block DCT → Quantization → Runlength & Huffman Coding → Compressed Bitstream
• Coding an image (single frame):
  – RGB to YUV color-space conversion
  – Partition image into 8x8-pixel blocks
  – 2-D DCT of each block
  – Quantize each DCT coefficient
  – Runlength and Huffman code the nonzero quantized DCT coefficients
  → Basis for the JPEG Image Compression Standard
  → JPEG-2000 uses wavelet transform and arithmetic coding
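The DCT and quantization steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the JPEG reference procedure: it uses the direct O(N^4) form of the orthonormal 2-D DCT-II, a single uniform step size for all coefficients (real coders use a per-coefficient quantization table), and a synthetic ramp block chosen only because it is smooth.

```python
import math

N = 8  # block size used by JPEG and MPEG

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

# Illustrative smooth block: a horizontal ramp, identical in every row
block = [[16 * y for y in range(N)] for _ in range(N)]

coeffs = dct_2d(block)
step = 16  # assumed uniform quantizer step size
quantized = [[round(c / step) for c in row] for row in coeffs]
nonzero = sum(1 for row in quantized for q in row if q != 0)

print(quantized[0])   # the energy sits in the first row of coefficients
print(nonzero, "of", N * N, "quantized coefficients are nonzero")
```

For a smooth block, most quantized coefficients are zero, which is what makes the subsequent runlength coding effective.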
Video Compression
• Video: Sequence of frames (images) that are related
• Related along the temporal dimension
  – Therefore temporal redundancy exists
• Main addition over image compression
  – Temporal redundancy
  → Video coder must exploit the temporal redundancy
Temporal Processing
• Usually high frame rate: Significant temporal redundancy
• Possible representations along temporal dimension:
  – Transform/subband methods
    – Good for textbook case of constant-velocity, uniform global motion
    – Inefficient for nonuniform motion, i.e. real-world motion
    – Requires large number of frame stores
      – Leads to delay (memory cost may also be an issue)
  – Predictive methods
    – Good performance using only 2 frame stores
    – However, simple frame differencing is not enough…
Video Compression
• Goal: Exploit the temporal redundancy
• Predict current frame based on previously coded frames
• Three types of coded frames:
  – I-frame: Intra-coded frame, coded independently of all other frames
  – P-frame: Predictively coded frame, coded based on previously coded frame
  – B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames
[Figure: I-frame, P-frame, B-frame]
Temporal Processing: Motion-Compensated Prediction
• Simple frame differencing fails when there is motion
• Must account for motion
  → Motion-compensated (MC) prediction
• MC-prediction generally provides significant improvements
• Questions:
  – How can we estimate motion?
  – How can we form MC-prediction?
Temporal Processing: Motion Estimation
• Ideal situation:
  – Partition video into moving objects
  – Describe object motion
  → Generally very difficult
• Practical approach: Block-Matching Motion Estimation
  – Partition each frame into blocks, e.g. 16x16 pixels
  – Describe motion of each block
  → No object identification required
  → Good, robust performance
Block-Matching Motion Estimation
[Figure: Reference frame and current frame, each divided into 16 numbered blocks; a motion vector (mv1, mv2) gives the offset between a block in the current frame and its best match in the reference frame]
• Assumptions:
  – Translational motion within block:
    $$f(n_1, n_2, k_{cur}) = f(n_1 - mv_1, n_2 - mv_2, k_{ref})$$
  – All pixels within each block have the same motion
• ME Algorithm:
  1) Divide current frame into non-overlapping N1xN2 blocks
  2) For each block, find the best matching block in reference frame
• MC-Prediction Algorithm:
  – Use best matching blocks of reference frame as prediction of blocks in current frame
Block Matching: Determining the Best Matching Block
• For each block in the current frame search for best matching block in the reference frame
  – Metrics for determining “best match”:
    $$MSE = \sum \sum_{n_1, n_2 \in \text{Block}} \left[ f(n_1, n_2, k_{cur}) - f(n_1 - mv_1, n_2 - mv_2, k_{ref}) \right]^2$$
    $$MAE = \sum \sum_{n_1, n_2 \in \text{Block}} \left| f(n_1, n_2, k_{cur}) - f(n_1 - mv_1, n_2 - mv_2, k_{ref}) \right|$$
  – Candidate blocks: All blocks in, e.g., (±32, ±32) pixel area
  – Strategies for searching candidate blocks for best match
    – Full search: Examine all candidate blocks
    – Partial (fast) search: Examine a carefully selected subset
• Estimate of motion for best matching block: “motion vector”
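A minimal sketch of full-search block matching with the absolute-error metric, written in plain Python. It follows the sign convention of the translational model on the previous slide, so the candidate reference block sits at the current block position minus the motion vector. The function names, the synthetic texture, and the small frame, block, and search sizes are assumptions chosen to keep the example short.

```python
def mae(cur, ref, top, left, mv1, mv2, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (top, left) and the block of `ref` at (top - mv1, left - mv2)."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[top + r][left + c]
                         - ref[top - mv1 + r][left - mv2 + c])
    return total

def full_search(cur, ref, top, left, n, search_range):
    """Examine every candidate motion vector within +/- search_range and
    return (mv1, mv2, cost) for the best match. Candidates whose reference
    block would fall outside the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for mv1 in range(-search_range, search_range + 1):
        for mv2 in range(-search_range, search_range + 1):
            r0, c0 = top - mv1, left - mv2
            if r0 < 0 or c0 < 0 or r0 + n > height or c0 + n > width:
                continue
            cost = mae(cur, ref, top, left, mv1, mv2, n)
            if best is None or cost < best[2]:
                best = (mv1, mv2, cost)
    return best

# Synthetic test data (illustrative): a textured reference frame, and a
# current frame whose content has moved down 1 row and right 2 columns.
H = W = 12
ref = [[(3 * r * r + 5 * c * c + r * c) % 64 for c in range(W)] for r in range(H)]
cur = [[ref[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(W)]
       for r in range(H)]

best = full_search(cur, ref, top=4, left=4, n=4, search_range=3)
print(best)   # (mv1, mv2, cost)
```

Swapping `abs(...)` for a squared difference gives the MSE metric. The absolute-error form is often preferred in practice because it avoids multiplications.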
Motion Vectors and Motion Vector Field
• Motion vector
  – Expresses the relative horizontal and vertical offsets (mv1, mv2), or motion, of a given block from one frame to another
  – Each block has its own motion vector
• Motion vector field
  – Collection of motion vectors for all the blocks in a frame
Example of Fast Motion Estimation Search: 3-Step (Log) Search
• Goal: Reduce number of search points
• Example: (±7, ±7) search area
  • Dots represent search points
  • Search performed in 3 steps (coarse-to-fine):
    Step 1: (±4 pixels)
    Step 2: (±2 pixels)
    Step 3: (±1 pixels)
• Best match is found at each step
• Next step: Search is centered around the best match of prior step
• Speedup increases for larger search areas
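The three steps can be sketched as below, assuming the absolute-error metric and the common variant that tests the eight neighbours of the current centre at each step size. Function names and the synthetic frames are illustrative. Note that a coarse-to-fine search like this is a heuristic: it is not guaranteed to find the same vector as a full search.

```python
def block_cost(cur, ref, top, left, mv1, mv2, n):
    """Absolute-error cost for motion vector (mv1, mv2), or None if the
    reference block would fall outside the frame."""
    r0, c0 = top - mv1, left - mv2
    if r0 < 0 or c0 < 0 or r0 + n > len(ref) or c0 + n > len(ref[0]):
        return None
    return sum(abs(cur[top + r][left + c] - ref[r0 + r][c0 + c])
               for r in range(n) for c in range(n))

def three_step_search(cur, ref, top, left, n):
    """Coarse-to-fine search covering (+/-7, +/-7) with steps 4, 2, 1.
    Returns (best_mv, best_cost, number_of_search_points)."""
    best_mv = (0, 0)
    best_cost = block_cost(cur, ref, top, left, 0, 0, n)
    points = 1
    for step in (4, 2, 1):
        center = best_mv                 # re-centre on the best match so far
        for d1 in (-step, 0, step):
            for d2 in (-step, 0, step):
                if d1 == 0 and d2 == 0:
                    continue             # centre was already evaluated
                mv = (center[0] + d1, center[1] + d2)
                cost = block_cost(cur, ref, top, left, mv[0], mv[1], n)
                points += 1
                if cost is not None and cost < best_cost:
                    best_mv, best_cost = mv, cost
    return best_mv, best_cost, points

# Illustrative data: identical frames, so the true motion is zero
H = W = 32
ref = [[(3 * r * r + 5 * c * c + r * c) % 64 for c in range(W)] for r in range(H)]
cur = [row[:] for row in ref]

mv, cost, points = three_step_search(cur, ref, top=12, left=12, n=4)
print(mv, cost, points)
```

The search visits 1 + 3 × 8 = 25 points, compared with 15 × 15 = 225 for a full search of the same (±7, ±7) area.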
Motion Vector Precision?
• Motivation:
  – Motion is not limited to integer-pixel offsets
  – However, video only known at discrete pixel locations
  – To estimate sub-pixel motion, frames must be spatially interpolated
• Fractional MVs are used to represent the sub-pixel motion
• Improved performance (extra complexity is worthwhile)
• Half-pixel ME used in most standards: MPEG-1/2/4
• Why are half-pixel motion vectors better?
  – Can capture half-pixel motion
  – Averaging effect (from spatial interpolation) reduces prediction error → Improved prediction
  – For noisy sequences, averaging effect reduces noise → Improved compression
Practical Half-Pixel Motion Estimation Algorithm
• Half-pixel ME (coarse-fine) algorithm:
  1) Coarse step: Perform integer motion estimation on blocks; find best integer-pixel MV
  2) Fine step: Refine estimate to find best half-pixel MV
    a) Spatially interpolate the selected region in reference frame
    b) Compare current block to interpolated reference frame block
    c) Choose the integer or half-pixel offset that provides best match
• Typically, bilinear interpolation is used for spatial interpolation
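A sketch of the fine step, assuming bilinear interpolation and an absolute-error match criterion. Positions and motion vectors are held in half-pixel units internally so that all indexing stays in integers. The helper names and the synthetic frames are illustrative, and the integer MV from the coarse step is simply passed in.

```python
def sample_half_pel(ref, r2, c2):
    """Value of `ref` at position (r2 / 2, c2 / 2), with r2 and c2 given in
    half-pixel units. Bilinear interpolation reduces to averaging the
    1, 2, or 4 surrounding integer-pixel samples."""
    r0, c0 = r2 // 2, c2 // 2
    r1, c1 = r0 + (r2 % 2), c0 + (c2 % 2)
    return (ref[r0][c0] + ref[r0][c1] + ref[r1][c0] + ref[r1][c1]) / 4.0

def refine_half_pel(cur, ref, top, left, n, int_mv):
    """Test the integer MV and its 8 half-pixel neighbours; return
    ((mv1, mv2), cost) with the MV expressed in pixels."""
    best = None
    for h1 in (-1, 0, 1):
        for h2 in (-1, 0, 1):
            mv1_2 = 2 * int_mv[0] + h1          # MV in half-pixel units
            mv2_2 = 2 * int_mv[1] + h2
            r_start = 2 * top - mv1_2           # block origin in the reference
            c_start = 2 * left - mv2_2
            last = 2 * (n - 1) + 1
            if (r_start < 0 or c_start < 0
                    or (r_start + last) // 2 >= len(ref)
                    or (c_start + last) // 2 >= len(ref[0])):
                continue                        # would read outside the frame
            cost = 0.0
            for r in range(n):
                for c in range(n):
                    pred = sample_half_pel(ref, r_start + 2 * r, c_start + 2 * c)
                    cost += abs(cur[top + r][left + c] - pred)
            if best is None or cost < best[1]:
                best = ((mv1_2 / 2.0, mv2_2 / 2.0), cost)
    return best

# Illustrative data: the current frame is the reference sampled half a
# pixel to the right, i.e. a true motion vector of (0, -0.5).
H = W = 12
ref = [[r * r + 3 * c * c + r * c for c in range(W)] for r in range(H)]
cur = [[(ref[r][c] + ref[r][c + 1]) / 2.0 if c + 1 < W else float(ref[r][c])
        for c in range(W)] for r in range(H)]

best = refine_half_pel(cur, ref, top=4, left=4, n=4, int_mv=(0, 0))
print(best)
```

Only nine candidates are tested in the fine step, so the added cost over integer-pixel estimation is dominated by the interpolation itself.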
Example: MC-Prediction for Two Consecutive Frames
[Figure: Previous frame (reference frame) and current frame (to be predicted); the 16 numbered blocks of the reference frame are displaced to form the predicted frame]
Example: MC-Prediction for Two Consecutive Frames (cont.)
[Figure: Prediction of current frame; prediction error (residual)]
Block Matching Algorithm: Summary
• Issues:
  – Block size?
  – Search range?
  – Motion vector accuracy?
• Motion typically estimated only from luminance
• Advantages:
  – Good, robust performance for compression
  – Resulting motion vector field is easy to represent (one MV per block) and useful for compression
  – Simple, periodic structure, easy VLSI implementations
• Disadvantages:
  – Assumes translational motion model → Breaks down for more complex motion
  – Often produces blocking artifacts (OK for coding with Block DCT)
Bi-Directional MC-Prediction
• Bi-Directional MC-Prediction is used to estimate a block in the current frame from a block in:
  1) Previous frame
  2) Future frame
  3) Average of a block from the previous frame and a block from the future frame
  4) Neither, i.e. code current block without prediction
[Figure: Previous frame, current frame, and future frame, each divided into 16 numbered blocks]
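One way to choose among the four options for a single block is sketched below. It assumes the motion-compensated candidate blocks from the previous and future frames are already available (as flat lists of pixel values), compares them by absolute error, and falls back to coding without prediction when even the best candidate exceeds a threshold. The threshold rule and all names are assumptions for illustration; the lecture does not specify the decision rule.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_b_mode(cur, prev_pred, next_pred, intra_threshold):
    """Return (mode, cost) where mode is 'previous', 'future', 'average',
    or 'intra' (no prediction; cost is None in that case)."""
    average = [(p + q) / 2.0 for p, q in zip(prev_pred, next_pred)]
    costs = {
        "previous": sad(cur, prev_pred),
        "future": sad(cur, next_pred),
        "average": sad(cur, average),
    }
    mode = min(costs, key=costs.get)
    if costs[mode] > intra_threshold:     # assumed fallback rule
        return "intra", None
    return mode, costs[mode]

cur_block = [10, 20, 30, 40]

# Brightness changing steadily: the average of both references fits best
print(choose_b_mode(cur_block, [8, 18, 28, 38], [12, 22, 32, 42], 50))

# Content visible only in the future frame (e.g. uncovered background)
print(choose_b_mode(cur_block, [0, 0, 0, 0], [10, 20, 30, 40], 50))

# New imagery absent from both references: code without prediction
print(choose_b_mode(cur_block, [0, 0, 0, 0], [0, 0, 0, 0], 50))
```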
MC-Prediction and Bi-Directional MC-Prediction (P- and B-frames)
• Motion compensated prediction: Predict the current frame based on reference frame(s) while compensating for the motion
• Examples of block-based motion-compensated prediction (P-frame) and bi-directional prediction (B-frame):
[Figure: B-frame example showing previous frame, B-frame, and future frame; P-frame example showing previous frame and P-frame; each frame is divided into 16 numbered blocks]
Video Compression
• Main addition over image compression:
  – Exploit the temporal redundancy
• Predict current frame based on previously coded frames
• Three types of coded frames:
  – I-frame: Intra-coded frame, coded independently of all other frames
  – P-frame: Predictively coded frame, coded based on previously coded frame
  – B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames
[Figure: I-frame, P-frame, B-frame]
Example Use of I-,P-,B-frames: MPEG Group of Pictures (GOP)
• Arrows show prediction dependencies between frames
MPEG GOP
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
Summary of Temporal Processing
• Use MC-prediction (P and B frames) to reduce temporal redundancy
• MC-prediction usually performs well; in compression, we have a second chance to recover when it performs badly
• MC-prediction yields:
  – Motion vectors
  – MC-prediction error or residual → Code error with conventional image coder
• Sometimes MC-prediction may perform badly
  – Examples: Complex motion, new imagery (occlusions)
  – Approach:
    1. Identify frame or individual blocks where prediction fails
    2. Code without prediction
Basic Video Compression Architecture
• Exploiting the redundancies:
  – Temporal: MC-prediction (P and B frames)
  – Spatial: Block DCT
  – Color: Color space conversion
• Scalar quantization of DCT coefficients
• Zigzag scanning, runlength and Huffman coding of the nonzero quantized DCT coefficients
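The zigzag scan and runlength step can be sketched as follows. This is a simplified illustration: it emits (zero-run, value) pairs followed by an end-of-block marker, treats the DC coefficient like any other, and omits the Huffman table lookup, whereas real coders handle the DC term separately. The function names and the sample block are assumptions.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in zigzag order, so that
    low-frequency coefficients come first."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        # Odd diagonals are traversed downward, even diagonals upward
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order

def run_length_encode(block):
    """Scan the quantized block in zigzag order and emit (zero_run, value)
    for each nonzero coefficient, then an end-of-block marker."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                         # trailing zeros are implied
    return pairs

# Illustrative quantized block with three nonzero low-frequency coefficients
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[0][3] = 28, -18, -2

print(zigzag_order()[:7])
print(run_length_encode(block))
```

Because quantization leaves nonzero values clustered at low frequencies, the zigzag order groups the zeros into long runs and pushes most of them past the end-of-block marker, where they cost no bits.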
Example Video Encoder
[Figure: Block diagram of an example video encoder. Blocks: RGB to YUV, Motion Estimation, Motion Compensation, DCT, Quantize, Huffman Coding, Buffer, Inverse Quantize, Inverse DCT, Frame Store. Signals: Input Video Signal, MC-Prediction, Residual, MV data, Previous Reconstructed Frame, Buffer fullness, Output Bitstream]
Example Video Decoder
[Figure: Block diagram of an example video decoder. Blocks: Buffer, Huffman Decoder, Inverse Quantize, Inverse DCT, Motion Compensation, Frame Store, YUV to RGB. Signals: Input Bitstream, MV data, Residual, MC-Prediction, Reconstructed Frame, Previous Reconstructed Frame, Output Video Signal]
Motivation for Scalable Coding
Basic situation:
1. Diverse receivers may request the same video
  – Different bandwidths, spatial resolutions, frame rates, computational capabilities
2. Heterogeneous networks and a priori unknown network conditions
  – Wired and wireless links, time-varying bandwidths
→ When you originally code the video you don’t know which client or network situation will exist in the future
→ Probably have multiple different situations, each requiring a different compressed bitstream
→ Need a different compressed video matched to each situation
• Possible solutions:
  1. Compress & store MANY different versions of the same video
  2. Real-time transcoding (e.g. decode/re-encode)
  3. Scalable coding
Scalable Video Coding
• Scalable coding:
  – Decompose video into multiple layers of prioritized importance
  – Code layers into base and enhancement bitstreams
  – Progressively combine one or more bitstreams to produce different levels of video quality
• Example of scalable coding with base and two enhancement layers: Can produce three different qualities (listed in order of increasing quality)
  1. Base layer
  2. Base + Enh1 layers
  3. Base + Enh1 + Enh2 layers
• Scalability with respect to: Spatial or temporal resolution, bitrate, computation, memory
Example of Scalable Coding
• Encode image/video into three layers:
  Encoder → Base, Enh1, Enh2
• Low-bandwidth receiver: Send only Base layer
  Base → Decoder → Low Res
• Medium-bandwidth receiver: Send Base & Enh1 layers
  Base + Enh1 → Decoder → Med Res
• High-bandwidth receiver: Send all three layers
  Base + Enh1 + Enh2 → Decoder → High Res
• Can adapt to different clients and network situations
Scalable Video Coding (cont.)
• Three basic types of scalability (refine video quality along three different dimensions):
  – Temporal scalability → Temporal resolution
  – Spatial scalability → Spatial resolution
  – SNR (quality) scalability → Amplitude resolution
• Each type of scalable coding provides scalability of one dimension of the video signal
  – Can combine multiple types of scalability to provide scalability along multiple dimensions
Scalable Coding: Temporal Scalability
• Temporal scalability: Based on the use of B-frames to refine the temporal resolution
  – B-frames are dependent on other frames
  – However, no other frame depends on a B-frame
  – Each B-frame may be discarded without affecting other frames
MPEG GOP
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
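In code, the temporal layering of this GOP amounts to separating the B-frames from the I- and P-frames. A minimal sketch, with frame labels as plain strings (an illustrative representation only):

```python
# Frames of the example GOP in display order
gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]

# Base layer: frames that other frames may depend on
base_layer = [f for f in gop if not f.startswith("B")]

# Enhancement layer: B-frames, which no other frame depends on
enhancement_layer = [f for f in gop if f.startswith("B")]

print(base_layer)          # decodable on its own at a reduced frame rate
print(enhancement_layer)   # any subset can be dropped safely
```

Decoding only the base layer here keeps every third frame, so the frame rate drops to one third of the original.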
Scalable Coding: Spatial Scalability
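• Spatial scalability: Based on refining the spatial resolution
  – Base layer is low resolution version of video
  – Enh1 contains coded difference between upsampled base layer and original video
  – Also called: Pyramid coding
[Figure: Spatial scalability codec. The original video is downsampled (↓2) and encoded as the base layer; the decoded base layer is upsampled (↑2) and its difference from the original is encoded as the enhancement layer. The decoder outputs low-res video from the base layer alone, and high-res video from the upsampled base layer plus the decoded enhancement layer]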
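The pyramid structure can be sketched on a 1-D signal. The assumptions here are loud ones: the encode/decode stages are left out (treated as lossless), downsampling is pair averaging, and upsampling is sample repetition. Real coders use better filters and quantize both layers, so the high-resolution output is then only approximate.

```python
def downsample2(x):
    """Halve the resolution by averaging neighbouring pairs (assumed filter)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

def upsample2(x):
    """Double the resolution by repeating each sample (assumed interpolator)."""
    return [v for v in x for _ in range(2)]

original = [10, 12, 20, 24, 30, 30, 8, 4]   # one row of pixels, illustrative

# Encoder side
base = downsample2(original)                             # base layer
enh = [o - u for o, u in zip(original, upsample2(base))]  # enhancement layer

# Decoder side
low_res = base                                            # base layer only
high_res = [u + e for u, e in zip(upsample2(base), enh)]  # base + enhancement

print(low_res)
print(enh)
print(high_res)
```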
Scalable Coding: SNR (Quality) Scalability
• SNR (Quality) Scalability: Based on refining the amplitude resolution
  – Base layer uses a coarse quantizer
  – Enh1 applies a finer quantizer to the difference between the original DCT coefficients and the coarsely quantized base layer coefficients
[Figure: Base layer I-frame and P-frame, with the corresponding enhancement-layer EI-frame and EP-frame]
Note: Base & enhancement layers are at the same spatial resolution
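A small numeric sketch of the two-quantizer idea, assuming uniform quantizers and made-up DCT coefficient values and step sizes (none of these numbers come from the lecture):

```python
def quantize(values, step):
    """Uniform quantizer: real values to integer indices."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Indices back to reconstruction levels."""
    return [i * step for i in indices]

coeffs = [448.0, -291.6, -30.5, 9.1, 3.2]   # hypothetical DCT coefficients
base_step, enh_step = 32, 4                 # coarse and fine step sizes

# Base layer: coarse quantization of the coefficients
base_idx = quantize(coeffs, base_step)
base_rec = dequantize(base_idx, base_step)

# Enhancement layer: finer quantization of what the base layer missed
residual = [c - b for c, b in zip(coeffs, base_rec)]
enh_idx = quantize(residual, enh_step)
enh_rec = [b + e for b, e in zip(base_rec, dequantize(enh_idx, enh_step))]

base_err = max(abs(c - r) for c, r in zip(coeffs, base_rec))
enh_err = max(abs(c - r) for c, r in zip(coeffs, enh_rec))
print(base_idx, enh_idx)
print(base_err, enh_err)   # error shrinks once the enhancement layer is added
```

A receiver that gets only `base_idx` reconstructs with error up to half the coarse step; adding `enh_idx` tightens that to half the fine step.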
Summary of Scalable Video Coding
• Three basic types of scalable video coding:
  – Temporal scalability
  – Spatial scalability
  – SNR (quality) scalability
• Scalable coding produces different layers with prioritized importance
• Prioritized importance is key for a variety of applications:
  – Adapting to different bandwidths, or client resources such as spatial or temporal resolution or computational power
  – Facilitates error-resilience by explicitly identifying most important and less important bits
Motivation for Standards
• Goal of standards:
  – Ensuring interoperability: Enabling communication between devices made by different manufacturers
  – Promoting a technology or industry
  – Reducing costs
What do the Standards Specify?
• Not the encoder
• Not the decoder
• Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT)
  → Enables improved encoding & decoding strategies to be employed in a standard-compatible manner
Encoder → Bitstream → Decoder
Scope of Standardization: Bitstream (Decoding Process)
Current Image and Video Compression Standards

| Standard | Application | Bit Rate |
| --- | --- | --- |
| JPEG | Continuous-tone still-image compression | Variable |
| H.261 | Video telephony and teleconferencing over ISDN | p x 64 kb/s |
| MPEG-1 | Video on digital storage media (CD-ROM) | 1.5 Mb/s |
| MPEG-2 | Digital Television | 2-20 Mb/s |
| H.263 | Video telephony over PSTN | 33.6-? kb/s |
| MPEG-4 | Object-based coding, synthetic content, interactivity | Variable |
| JPEG-2000 | Improved still image compression | Variable |
| H.264 / MPEG-4 AVC | Improved video compression | 10’s to 100’s kb/s |
Comparing Current Video Compression Standards
• Based on the same fundamental building blocks
  – Motion-compensated prediction (I, P, and B frames)
  – 2-D Discrete Cosine Transform (DCT)
  – Color space conversion
  – Scalar quantization, runlengths, Huffman coding
• Additional tools added for different applications:
  – Progressive or interlaced video
  – Improved compression, error resilience, scalability, etc.
• MPEG-1/2/4, H.261/3/4: Frame-based coding
• MPEG-4: Object-based coding and Synthetic video
MPEG Group of Pictures (GOP) Structure
• Composed of I, P, and B frames
• Arrows show prediction dependencies
• Periodic I-frames enable random access into the coded bitstream
• Parameters: (1) Spacing between I frames, (2) number of B frames between I and P frames
MPEG GOP
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
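The two GOP parameters are enough to generate the frame-type pattern shown here. The sketch below also derives a coding (transmission) order: since a B-frame is predicted from a future frame, that future I- or P-frame has to be coded before the B-frames that precede it in display order. Function and parameter names are illustrative.

```python
def gop_pattern(i_spacing, b_between, num_frames):
    """Frame types in display order.
    i_spacing: distance between I-frames.
    b_between: number of B-frames between consecutive I/P frames."""
    types = []
    for k in range(num_frames):
        offset = k % i_spacing
        if offset == 0:
            types.append("I")
        elif offset % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types

def coding_order(types):
    """Display indices in the order frames must be coded: each I/P frame
    is emitted before the B-frames that precede it in display order."""
    order, pending_b = [], []
    for k, t in enumerate(types):
        if t == "B":
            pending_b.append(k)
        else:
            order.append(k)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)   # trailing B-frames with no later reference
    return order

types = gop_pattern(i_spacing=9, b_between=2, num_frames=10)
print(" ".join(f"{t}{k}" for k, t in enumerate(types)))
print(coding_order(types))
```

Larger I-frame spacing improves compression but slows random access; more B-frames improve compression but increase the reordering delay visible in the coding order.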
MPEG Structure
• MPEG codes video in a hierarchy of layers. The sequence layer is not shown.
[Figure: MPEG layer hierarchy: GOP layer (sequence of I, P, and B frames), picture layer, slice layer, macroblock layer (4 8x8 DCT blocks, 1 MV), block layer (8x8 DCT)]
MPEG-2 Profiles and Levels
• Goal: To enable more efficient implementations for different applications (interoperability points)
  – Profile: Subset of the tools applicable for a family of applications
  – Level: Bounds on the complexity for any profile
[Figure: Grid of profiles (Simple, Main, High) versus levels (Low, Main, High)]
• DVD & SD Digital TV: Main Profile at Main Level (MP@ML)
• HDTV: Main Profile at High Level (MP@HL)
MPEG-4 Natural Video Coding
• Extension of MPEG-1/2-type algorithms to code arbitrarily shaped objects
[Figure: Frame-based coding versus object-based coding. Source: MPEG Committee]
Basic Idea: Extend Block-DCT and Block-ME/MC-prediction to code arbitrarily shaped objects
Example of MPEG-4 Scene (Object-based Coding)
[Figure: Example MPEG-4 scene. Source: MPEG Committee]
Example MPEG-4 Object Decoding Process
[Figure: Example MPEG-4 object decoding process. Source: MPEG Committee]
Sprite Coding (Background Prediction)
• Sprite: Large background image
  – Hypothesis: Same background exists for many frames, changes resulting from camera motion and occlusions
• One possible coding strategy:
  1. Code & transmit entire sprite once
  2. Only transmit camera motion parameters for each subsequent frame
• Significant coding gain for some scenes
Sprite Coding Example
[Figure: Sprite (background), foreground object, and reconstructed frame. Source: MPEG Committee]
Review of Today’s Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
  – Exploit temporal dimension of video signal
  – Motion-compensated prediction
  – Generic (MPEG-type) video coder architecture
  – Scalable video coding
• Overview of current video compression standards
  – What do the standards specify?
  – Frame-based video coding: MPEG-1/2/4, H.261/3/4
  – Object-based video coding: MPEG-4
References and Further Reading
General Video Compression References:
• J.G. Apostolopoulos and S.J. Wee, "Video Compression Standards", Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999.
• V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston, Massachusetts: Kluwer Academic Publishers, 1997.
• J.L. Mitchell, W.B. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard, New York: Chapman & Hall, 1997.
• B.G. Haskell, A. Puri, A.N. Netravali, Digital Video: An Introduction to MPEG-2, Kluwer Academic Publishers, Boston, 1997.
MPEG web site: http://drogo.cselt.stet.it/mpeg