Introduction on MPEG Video Coding Standards

Yung-Ching Chang (張永清)

Visual Communication Laboratory, CS, NTHU


Lossy Coding of Still Image - JPEG

[Figure: JPEG codec block diagram. Encoder: source image → 8×8 blocks → DCT → Q (quantization table) → zigzag scan → entropy encoder (table specification) → compressed image data. Decoder: compressed image data → entropy decoder (table specification) → zigzag scan → IQ (quantization table) → IDCT → 8×8 blocks → reconstructed image.]


Uncompressed Bitrate for Video

Video type   Pixels per frame   Image aspect ratio   Frames per second   Bits/pixel   Uncompressed bitrate
NTSC         480 × 483          4:3                  29.97               16           111.2 Mb/s
PAL          576 × 576          4:3                  25                  16           132.7 Mb/s
CIF          352 × 288          4:3                  14.98               12           18.2 Mb/s
QCIF         176 × 144          4:3                  9.99                12           3.0 Mb/s
HDTV         1280 × 720         16:9                 59.94               12           662.9 Mb/s
HDTV         1920 × 1080        16:9                 29.97               12           745.7 Mb/s
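Each bitrate in the table is the product of the four preceding columns. A minimal sketch of that arithmetic follows; the function name `uncompressed_bitrate_mbps` is only illustrative and is not taken from the slides.

```python
def uncompressed_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Raw video bitrate in Mb/s: pixels per frame x frames per second x bits per pixel."""
    return width * height * fps * bits_per_pixel / 1e6


# (video type, width, height, frames per second, bits per pixel), as in the table
formats = [
    ("NTSC", 480, 483, 29.97, 16),
    ("PAL", 576, 576, 25, 16),
    ("CIF", 352, 288, 14.98, 12),
    ("QCIF", 176, 144, 9.99, 12),
    ("HDTV", 1280, 720, 59.94, 12),
    ("HDTV", 1920, 1080, 29.97, 12),
]

for name, w, h, fps, bpp in formats:
    print(f"{name:5s} {w}x{h}: {uncompressed_bitrate_mbps(w, h, fps, bpp):.1f} Mb/s")
```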


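Motion Compensated Predictive Coding

[Figure: motion-compensated prediction. Frame n is divided into 16×16 macroblocks; each macroblock is matched against frame n-1 within a search range to obtain a motion vector. The residual (Y, Cb, Cr) then goes through DCT, quantization, zigzag scan and entropy coding.]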
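The search for a motion vector can be sketched as exhaustive block matching with a sum-of-absolute-differences (SAD) cost. This is a generic illustration and not the encoder described in the slides; the helper names, the tiny 12×12 test frames and the 4×4 block size are all assumptions chosen to keep the example short.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=16, search=7):
    """Try every displacement within +/-search pixels and keep the cheapest one."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic frames: every reference pixel is unique, and the current frame
# shows the reference content moved by 2 pixels in x and 1 pixel in y.
SIZE = 12
ref = [[x + 16 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost = full_search(cur, ref, bx=4, by=4, n=4, search=3)
print(mv, cost)  # (2, 1) 0
```

The residual that is actually coded is the difference between the current block and the reference block selected by the motion vector; here it is all zeros because the match is exact.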


Video Compression Standards

• CCITT H.261
– ITU-T Study Group 15
– Videophone and video conferencing
– 1988-1990: p × 64 kbps (p = 1 … 30)
• ITU-T H.263
– PSTN and mobile network: 10 to 24 kbps
– 1994: H.263, H.263+ …


Video Compression Standards (cont’d)

• MPEG-1 Video (ISO/IEC 11172-2)
– 1.2 ~ 1.5 Mbps
– Video for digital storage media, CD-ROM
– Sep 1990
• MPEG-2 Video (ISO/IEC 13818-2)
– 2 ~ 30 Mbps
– Digital broadcast TV, HDTV, video services on networks
– Nov 1993
• MPEG-4 (ISO/IEC 14496)
– An emerging coding standard
– Universal access


MPEG-1 vs. H.261 (Conceptually)

• H.261
– Short algorithm delay
– Lower compression complexity
– Lower memory requirement
– Limited flexibility in bit rate control
• MPEG-1
– Longer algorithm delay
– Higher compression complexity
– Higher memory requirement
– More coding modes, which support greater bit rate flexibility


Algorithm Delay

• H.261
• MPEG-1
– A B-picture cannot be coded until the next P- or I-picture is available

[Figure: picture sequences. H.261: I P P P P … P I P …; MPEG-1: I-, P- and B-pictures organized into groups of pictures (GOP).]


Compression Complexity

• H.261
• MPEG-1

[Figure: picture sequences for H.261 and MPEG-1 (I P P P P … P I P …).]




Memory Requirement

• H.261
• MPEG-1

[Figure: picture sequences for H.261 and MPEG-1 (I P P P P … P I P …).]




Bit Rate Flexibility

• H.261
• MPEG-1
– The GOP structure and B-pictures offer more flexibility in the coding bit rate

[Figure: picture sequences for H.261 and MPEG-1 (I P P P P … P I P …).]




MPEG-1 vs. H.261 (Technically)

• MPEG-1
– Bi-directional motion compensation (B-picture)
– Group of pictures (GOP)
– Half-pel motion compensation
– Visually weighted quantization
– No picture size or bit rate constraints
– Flexible slice structure instead of GOB


MPEG-1 Coding Hierarchy

[Figure: a video sequence (I B B P B B P B B P B B I B B P …) is divided into GOPs; motion estimation is performed between the pictures.]


MPEG-1 Coding Hierarchy (cont’d)

[Figure: a picture is divided into slices (Slice 1 to Slice 13); a slice is divided into 16×16 macroblocks; each macroblock consists of Y, Cb and Cr components coded as 8×8 blocks.]


Some Coding Schemes

• GOP
– Random access
– Prevent error propagation
• B-picture
– Pros: best prediction and compression; handles object occlusion and entrance into the scene; noise averaging
– Cons: encoder delay, high complexity, large encoder buffer required
• Slice
– Synchronization unit
– Suits localized image properties


Group of Pictures

• Group of pictures (GOP)
– A GOP contains at least one I-picture
– Must start with an I-picture in bitstream order
– Can have any number of P-pictures and B-pictures

Display order:   1I 2B 3B 4P 5B 6B 7P 8B 9B 10I 11B 12B 13P 14B 15B 16P 17B 18B 19I …
Bitstream order: 1I 4P 2B 3B 7P 5B 6B 10I 8B 9B 13P 11B 12B 16P 14B 15B 19I 17B 18B …
Display order:   1I 2B 3B 4P 5B 6B 7P 8B 9B 10I 11B 12B 13P 14B 15B 16P 17B 18B 19I …
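The reordering from display order to bitstream order can be sketched as follows: each B-picture is held back until the next I- or P-picture (its future reference) has been emitted. The function name and the string labels such as "4P" are only illustrative.

```python
def display_to_bitstream(display):
    """Reorder pictures so that every B-picture follows the reference picture after it."""
    out, pending_b = [], []
    for pic in display:
        if pic.endswith("B"):
            pending_b.append(pic)      # wait for the next I- or P-picture
        else:
            out.append(pic)            # emit the reference picture first
            out.extend(pending_b)      # then the B-pictures that precede it in display order
            pending_b.clear()
    out.extend(pending_b)              # trailing B-pictures with no later reference in this list
    return out


display = "1I 2B 3B 4P 5B 6B 7P 8B 9B 10I 11B 12B 13P 14B 15B 16P 17B 18B 19I".split()
print(" ".join(display_to_bitstream(display)))
# 1I 4P 2B 3B 7P 5B 6B 10I 8B 9B 13P 11B 12B 16P 14B 15B 19I 17B 18B
```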


Group of Pictures (cont’d)

• Closed GOP
– Does not reference pictures in the previous GOP
– Can be easily removed while editing
• Open GOP: references the previous GOP

Display order:
Closed GOP: I B B P B B P B B P
Open GOP:   B B I B B P B B P B B P   (leading B-pictures reference the previous P or I)
Closed GOP: B B I B B P B B P B B P   (leading B-pictures reference only the next I)


System Stream Layer

• An MPEG stream is segmented into packs
– Contain information about the system clock, the bit rate, and the number of video and audio streams
– Multiplex video streams and audio streams
– Can contain multiple packets, e.g. three packets for video stream 1, video stream 2 and audio stream 1
• Packet
– Each packet contains a segment of data from a video stream or an audio stream
– Has a presentation time and/or a decoding time
– The payloads of contiguous packets are combined to form an elementary stream


Coding MPEG Video

• Rate control within a sequence
– Allocate a bit budget to each picture
– A reasonable ratio is I:P:B = 8:5:1
– Give I- and P-pictures the same visual quality and reduce the bit rate of B-pictures to save bits; because B-pictures are not referenced, their lower quality does not propagate
– If there is little motion or change, the I-picture should get more bits; if there is a lot of motion or change, take bits from the I-picture and give them to the P-pictures
– Video stream of VCD: 1394.4 kbps, 30 pictures per second; a typical GOP is IBBPBBPBBPBBPBB or IBBBPBBBPBBBPBBBP
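As a rough illustration of the 8:5:1 ratio, the sketch below splits the bit budget of one GOP among its pictures. It assumes a constant bit rate and a display rate of 30 pictures per second; `allocate_gop_bits` is a hypothetical helper, and a real rate controller would adapt these targets to picture content.

```python
def allocate_gop_bits(gop, bitrate_bps, pictures_per_second, weights=None):
    """Return the target bits per picture type for one GOP, split by fixed weights."""
    weights = weights or {"I": 8, "P": 5, "B": 1}
    gop_bits = bitrate_bps * len(gop) / pictures_per_second   # budget for the whole GOP
    unit = gop_bits / sum(weights[t] for t in gop)            # bits per weight unit
    return {t: weights[t] * unit for t in sorted(set(gop))}


gop = "IBBPBBPBBPBBPBB"
targets = allocate_gop_bits(gop, bitrate_bps=1_394_400, pictures_per_second=30)
for picture_type, bits in targets.items():
    print(f"{picture_type}: {bits:,.0f} bits per picture")
```

With this GOP (1 I, 4 P, 10 B) the weights sum to 38 units, so an I-picture receives 8/38 of the GOP budget, each P-picture 5/38, and each B-picture 1/38.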


Rate Control within a Picture

• Allocate the target bits for each macroblock
• If the generated bits exceed the target bits
– Increase the quantizer scale
– Discard the high-frequency DCT coefficients
• If the generated bits are fewer than the target bits
– Decrease the quantizer scale
– Insert macroblock stuffing bits
• How to allocate bits?
– Smaller quantizer scale for smooth areas to avoid blocking effects
– Higher quantizer scale for rough areas to save bits


Slice selection

• Each slice header requires 40 bits
– For a video (30 pictures/s) with a vertical resolution of 240, there are 15 slices if each row of macroblocks is a slice
– If a picture contains only one slice: 1200 bps for the slice headers
– If a picture contains 15 slices: 18000 bps for the slice headers
• A slice is the minimum independently decodable unit
• For an error-free environment, one slice per picture may be appropriate
• If the environment is noisy, one slice per row of macroblocks may be more desirable
• A slice has a quantizer scale, ranging from 1 to 31
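The slice header overhead quoted above follows from simple arithmetic, sketched below. The constant and function names are illustrative; the 40-bit header size, the 240-line picture and the 30 pictures/s rate come from the slide.

```python
SLICE_HEADER_BITS = 40
MACROBLOCK_SIZE = 16


def slice_overhead_bps(slices_per_picture, pictures_per_second=30):
    """Bits per second spent on slice headers alone."""
    return SLICE_HEADER_BITS * slices_per_picture * pictures_per_second


rows_of_macroblocks = 240 // MACROBLOCK_SIZE      # 15 rows for a 240-line picture
print(slice_overhead_bps(1))                      # 1200
print(slice_overhead_bps(rows_of_macroblocks))    # 18000
```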


Motion Estimation

• The search distance is much longer than in H.261: up to 1024 for full-pixel or 512 for half-pixel motion vectors
• Full search is not suitable; a faster search algorithm is required

[Figure: motion estimation across the pictures I B B P.]


Coding I-Pictures

• Macroblock types in I-pictures
– intra-d: encode in intra mode with the default quantization
– intra-q: encode in intra mode with an updated quantization
– Each intra-q requires 5 extra bits for the quantizer scale, ranging from 1 to 31
• A macroblock is divided into four luminance blocks and two chrominance blocks; all six blocks have to be DCT coded

TYPE      QUANT   VLC
intra-d   0       1
intra-q   1       01


Coding blocks in I-Pictures

• Apply the DCT to each block as defined in H.261
– Quantize coefficients with the uniform quantizer for I-pictures
– The final quantizer scale for DC is always 8
– The final quantizer scale for each AC coefficient is the corresponding value in the quantization matrix multiplied by the quantizer scale of this macroblock

Quantization matrix (by coefficient index):

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83


Coding blocks in I-Pictures (cont’d)

• The quantized DC is DPCM and entropy coded
• The quantized ACs are zigzag scanned and then entropy coded
• Example:

  1   0   0   0   0   0   0   0
  2  -3   0   0   0   0   0   0
  4  -5   0   0   0   0   0   0
  1   0   0 130   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

RUN   VALUE   CODE                                 COMMENT
1     2       0001 100
0     4       0000 1100
0     -3      0010 11
3     -5      0000 0100 0011 1111 1011             RUN+VALUE
0     1       110
14    130     0000 0100 1110 0000 0000 1000 0010   RUN+VALUE
EOB           10
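The RUN and VALUE columns of the example can be reproduced by zigzag scanning the quantized block and counting the zeros before each nonzero AC coefficient. The sketch below covers only that step; the VLC lookup is omitted, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in zigzag scan order."""
    def key(pos):
        i, j = pos
        s = i + j
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        return (s, i if s % 2 else -i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def run_level_pairs(block):
    """(run of zeros, nonzero value) pairs for the AC coefficients in zigzag order."""
    scan = [block[i][j] for i, j in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scan[1:]:          # scan[0] is the DC coefficient, coded separately
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs                    # trailing zeros are signalled by EOB


block = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [2, -3, 0, 0, 0, 0, 0, 0],
    [4, -5, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 130, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

print(run_level_pairs(block))
# [(1, 2), (0, 4), (0, -3), (3, -5), (0, 1), (14, 130)]
```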


Coding P-Pictures

• Seven macroblock types in P-pictures
– -m: motion compensation, requires a motion vector
– -c: coded pattern, indicates which blocks are DCT coded
– -q: change of quantizer scale
– skipped: use the motion vector of the previous macroblock

TYPE       VLC       INTRA   MOTION FORWARD   CODED PATTERN   QUANT
pred-mc    1         0       1                1               0
pred-c     01        0       0                1               0
pred-m     001       0       1                0               0
intra-d    0001 1    1       0                0               0
pred-mcq   0001 0    0       1                1               1
pred-cq    0000 1    0       0                1               1
intra-q    0000 01   1       0                0               1
skipped    (no VLC)


Coding P-Pictures (cont’d)

• Coded block pattern (CBP)
– Indicates which blocks are DCT coded
– If all quantized coefficients in a block are zero, the block is not coded; if no block is coded, the macroblock is skipped
– CBP = 32 * BY0 + 16 * BY1 + 8 * BY2 + 4 * BY3 + 2 * BCb + BCr
• Selection of macroblock type

[Figure: decision tree for macroblock type selection, starting at Begin. Intra: Quant → Intra-q, Not quant → Intra-d. Non-intra with MC: Coded and Quant → Pred-mcq, Coded and Not quant → Pred-mc, Not coded → Pred-m. Non-intra with No MC: Coded and Quant → Pred-cq, Coded and Not quant → Pred-c, Not coded → Skipped.]
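The CBP formula and the type decision can be sketched as below. The decision function is one reading of the selection tree on this slide, not the normative encoder logic, and both function names are illustrative.

```python
def coded_block_pattern(coded):
    """coded: six booleans in the order Y0, Y1, Y2, Y3, Cb, Cr (True = block is DCT coded)."""
    weights = (32, 16, 8, 4, 2, 1)
    return sum(w for w, is_coded in zip(weights, coded) if is_coded)


def p_macroblock_type(intra, mc, cbp, quant_changed):
    """Pick a P-picture macroblock type from the encoder's decisions."""
    if intra:
        return "intra-q" if quant_changed else "intra-d"
    if cbp == 0:                      # no block has nonzero quantized coefficients
        return "pred-m" if mc else "skipped"
    name = "pred-mc" if mc else "pred-c"
    return name + "q" if quant_changed else name


cbp = coded_block_pattern([True, False, False, False, False, True])
print(cbp)                                            # 33
print(p_macroblock_type(False, True, cbp, True))      # pred-mcq
print(p_macroblock_type(False, False, 0, False))      # skipped
```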


Coding blocks in P-Pictures

• Intra blocks are coded as in I-pictures
• Inter blocks
– The DCT is applied to the residual
– Quantize coefficients with the dead-zone quantizer
– The final quantizer scale for each AC coefficient is the corresponding value in the quantization matrix multiplied by the quantizer scale of this macroblock

Quantization matrix (by coefficient index):

16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
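The difference between the uniform quantizer used for intra blocks and the dead-zone quantizer used for inter blocks can be illustrated with generic textbook forms. This is only a sketch: it is not the normative MPEG-1 arithmetic, and `step` simply stands for the final quantizer scale of one coefficient.

```python
def uniform_quantize(coef, step):
    """Round to the nearest level, so values near zero can still map to +/-1."""
    sign = -1 if coef < 0 else 1
    return sign * int(abs(coef) / step + 0.5)


def dead_zone_quantize(coef, step):
    """Truncate toward zero, so every |coef| < step falls into the dead zone (level 0)."""
    return int(coef / step)


def dequantize(level, step):
    """Reconstruct a coefficient from its level (generic form)."""
    return level * step


for coef in (7, -13, 30):
    print(coef, uniform_quantize(coef, 8), dead_zone_quantize(coef, 8))
```

The wider zero bin suits motion-compensated residuals, where small coefficients are mostly noise and are cheaper to drop than to code.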


Coding B-Pictures

• Eleven macroblock types in B-pictures
– -i: interpolation, -c: coded pattern, -f: forward, -b: backward, -q: quantization

TYPE       VLC       INTRA   MOTION FORWARD   MOTION BACKWARD   CODED PATTERN   QUANT
pred-i     10        0       1                1                 0               0
pred-ic    11        0       1                1                 1               0
pred-b     010       0       0                1                 0               0
pred-bc    011       0       0                1                 1               0
pred-f     0010      0       1                0                 0               0
pred-fc    0011      0       1                0                 1               0
intra-d    0001 1    1       0                0                 0               0
pred-icq   0001 0    0       1                1                 1               1
pred-fcq   0000 11   0       1                0                 1               1
pred-bcq   0000 10   0       0                1                 1               1
intra-q    0000 01   1       0                0                 0               1
skipped    (no VLC)


Coding B-Pictures (cont’d)

• Selection of macroblock type
– Because B-pictures have the lowest bit rate, try the skipped type first
– Do forward and backward motion estimation, then interpolation, and pick the best of the three

[Figure: decision tree for macroblock type selection, starting at Begin. Intra: Quant → Intra-q, Not quant → Intra-d. Non-intra: Coded and Quant → Pred-*cq, Coded and Not quant → Pred-*c, Not coded → Pred-* or skipped.]


Decoding a Sequence for VCR Command

• Decoding for fast forward
– Discard the B-pictures and decode only the I- and P-pictures
– Discard the P- and B-pictures and decode only the I-pictures
• Decoding for reverse play
– Requires a large buffer to store the whole bitstream of a GOP, which is then decoded and displayed in reverse order

Pictures in display order:    B  B  I  B  B  P  B  B  P  B  B  P
                              0  1  2  3  4  5  6  7  8  9 10 11
Pictures in decoding order:   I  B  B  P  B  B  P  B  B  P  B  B
                              2  0  1  5  3  4  8  6  7 11  9 10
Pictures in new order:        I  P  P  P  B  B  B  B  B  B  B  B
                              2  5  8 11 10  9  7  6  4  3  1  0
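The "new order" row can be derived mechanically: decode all I- and P-pictures of the GOP first (in their normal order, since each P depends on the previous reference), then decode the B-pictures from last to first. A minimal sketch, with an illustrative function name:

```python
def reverse_play_order(types):
    """Decoding order for reverse play, given picture types in display order."""
    anchors = [i for i, t in enumerate(types) if t in ("I", "P")]   # references first
    b_pictures = [i for i, t in enumerate(types) if t == "B"]
    return anchors + b_pictures[::-1]                               # then B-pictures backwards


types = "B B I B B P B B P B B P".split()
print(reverse_play_order(types))
# [2, 5, 8, 11, 10, 9, 7, 6, 4, 3, 1, 0]
```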


Pre- and Post-Processing

• Pre-processing
– Apply a median filter to remove noise
– Apply a low-pass filter to smooth image edges, removing high frequencies to prevent the ringing effect
• Post-processing
– Blocking artifacts are more visible in low-frequency blocks
– Low-pass filter at block boundaries
– Wide low-pass filter at adjacent smooth blocks

Low-frequency block test: $C_{DCT}(i, j) \cdot K_{low}(i, j) = 0$, with

K_low =
0 0 1 1 1 1 1 1
0 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

[Figure: filter taps C1 to C5 applied across block boundaries.]

C1 = 0.75, C2 = 0.25

C3 = 0.5, C4 = 0.25, C5 = 0


Pre- and Post-Processing (cont’d)

– Ringing artifacts appear along sharp edges, in other words, in the high-frequency blocks
– Detect the edges in a ringing block with the Sobel masks; mark a pixel as an edge if the response is over a threshold
– Apply a simple low-pass filter to the non-edge area

High-frequency block test: $C_{DCT}(i, j) \cdot K_{high}(i, j) = 0$, with

K_high =
0 0 0 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

Sobel masks:

H1 =
-1  0  1
-2  0  2
-1  0  1

H2 =
-1 -2 -1
 0  0  0
 1  2  1

Low-pass filter:

1/12 1/12 1/12
1/12 4/12 1/12
1/12 1/12 1/12
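The de-ringing steps above can be sketched as follows: compute the Sobel response at each pixel, treat pixels above a threshold as edges, and apply the 3×3 low-pass kernel everywhere else. The signs in the Sobel masks, the use of |H1| + |H2| as the edge measure, the threshold value and the handling of border pixels (left unfiltered) are assumptions of this sketch, not details given in the slides.

```python
H1 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]       # horizontal gradient
H2 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]       # vertical gradient
LOWPASS = [[1, 1, 1], [1, 4, 1], [1, 1, 1]]     # weights, normalized by 12


def apply3x3(img, kernel, y, x):
    """Weighted sum of the 3 x 3 neighbourhood centred on (y, x)."""
    return sum(kernel[j][i] * img[y - 1 + j][x - 1 + i]
               for j in range(3) for i in range(3))


def dering(img, threshold):
    """Smooth non-edge interior pixels; edge pixels and the border are left untouched."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gradient = abs(apply3x3(img, H1, y, x)) + abs(apply3x3(img, H2, y, x))
            if gradient <= threshold:
                out[y][x] = apply3x3(img, LOWPASS, y, x) / 12
    return out


# A flat area with one noisy pixel is smoothed; a sharp vertical edge is preserved.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 22
edge = [[0, 0, 100, 100, 100] for _ in range(5)]

print(dering(noisy, threshold=100)[2][2])   # 14.0
print(dering(edge, threshold=100) == edge)  # True
```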


MPEG-2 Compared to MPEG-1

• Frame/field adaptive motion compensation and DCT
• Dual prime motion compensation (for P-pictures when there are no B-pictures)
• Nonlinear quantization table with increased accuracy for small values
• Alternate scan for DCT coefficients
• New VLC tables for DCT coefficient coding
• In addition to 4:2:0, also supports 4:2:2 and 4:4:4
• Supports a maximum motion vector range of -2048 to +2047.5 (always half-pixel motion vectors)


Frame/field DCT

• Frame DCT
• Field DCT


Nonlinear Quantization Table

quantizer_scale_code   q_scale_type = 0   q_scale_type = 1
1                      2                  1
2                      4                  2
3                      6                  3
4                      8                  4
5                      10                 5
…
15                     30                 22
16                     32                 24
17                     34                 28
18                     36                 32
…
28                     56                 88
29                     58                 96
30                     60                 104
31                     62                 112


Additional Chrominance Format

• 4:2:0
• 4:2:2
• 4:4:4

[Figure: Y, Cb and Cr sample layouts for each format.]


Alternate Scan for DCT Coefficients

Zigzag scan:

 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

Alternate scan:

 0  4  6 20 22 36 38 52
 1  5  7 21 23 37 39 53
 2  8 19 24 34 40 50 54
 3  9 18 25 35 41 51 55
10 17 26 30 42 46 56 60
11 16 27 31 43 47 57 61
12 15 28 32 44 48 58 62
13 14 29 33 45 49 59 63


Major Components of an MPEG-4 Terminal


MPEG-4 Components

Face
• 66 facial animation parameters
• Primary facial expressions
• 14 visemes

2D Mesh
• Triangular patches
• Motion vector

Texture
• From VOP
• Still texture (Discrete Wavelet Transform)

VO (Video Object)
• Shape
• Motion vectors
• Texture

AO (Audio Object)
• MPEG Layer 1-3
• AAC (Advanced Audio Coder)
• TTS (Text-To-Speech)


Content-based Audio-Visual Representation

• Audio-Visual Object (AVO)
• Video object component (video object plane, VOP)
– natural or synthetic
– 2D or 3D
• Audio object component
– mono, stereo or multichannel


Video Object Planes (VOP)

• Characteristics of VOP
– may have different spatial and temporal resolutions
– may be associated with different degrees of accessibility (sub-VOPs)
– may be separated or overlapping
• VOP type
– Traditional I, P, B type
– S-VOP (Sprite) for background


Video Object Plane Type

[Figure: VOP types along the time axis: I-VOP, B-VOP, B-VOP, P-VOP, B-VOP, B-VOP, P-VOP, B-VOP, B-VOP, together with S-VOPs.]


Content-based Object Manipulation

• Object manipulation
– change of the spatial position of a VOP
– application of a spatial scaling factor to a VOP
– change of the speed with which a VOP moves
– insertion of new VOPs
– deletion of an object in the scene
– change of the scene area


Example of Bit stream Manipulation


Segmentation Process

• Depending on the application, segmentation can be performed
– Online (real-time) or offline (non-real-time)
– Automatic or semi-automatic
• Examples
– Video conferencing
  • real-time, automatic
  • separate foreground (communication partner) from background
– Object tracking in video
  • may allow off-line and semi-automatic
  • separate the moving object from others


Compression

• Improved coding efficiency
– 5-64 kbps for mobile applications
– up to 20 Mbps for TV/film applications
– subjectively better quality compared to existing standards
• Coding of multiple concurrent data streams
– can code multiple views of a scene efficiently, e.g. stereo video


Coding VO in MPEG-4

• Reduce temporal redundancy
• Motion estimation for arbitrarily shaped VOPs
– padding and modified block (polygon) matching motion estimation

[Figure: I-VOP, B-VOP and P-VOP along the time axis.]


Coding Procedure of VOP

BAB (Binary Alpha Block)
• Motion vector
• CAE (Context-Based Arithmetic Encoding)
• Rate control by sub-sampling

Texture
• Motion vector
• DCT
• Rate control by quantization step


New Coding Features

• For each macroblock, the motion vectors can be computed on a 16×16 or 8×8 block basis
• Unrestricted motion estimation: prediction can extend over the image boundary
• Overlapped block motion compensation
• Each component of texture can range from 1 to 12 bits
• More robust coding


Robust Video Coding

• Resynchronization
– Allows insertion of resync markers within each VOP
– Video packet header: includes macroblock number, quantizer value and timing information
• Data partition
– Allows shape, motion and texture data to be separated within a packet
• Reversible VLC
– Offers partial recovery from errors


Sprite VOP

• Represents the background image
• Can be used for very efficient coding of scenes involving camera pan and zoom
• Much larger than the image and thus requires more memory


Example of Sprite VOP


Object Mesh

• Useful for animation, content manipulation, content overlay, merging natural and synthetic video, and others
• Tessellate with triangular patches
• Define a motion vector for each node
– The 2D motion of video objects is represented by the motion vectors of the node points
– Motion compensation is achieved by warping the texture map corresponding to each patch with an affine transform


Example of Object Mesh


Face Animation

• Face model
– Default face model
– Download from the encoder
• Low-level facial animation
– A set of 66 facial animation parameters
• High-level facial animation
– A set of primary facial expressions like joy, sadness, surprise and disgust
• Speech animation
– 14 visemes for mouth shape
– Text-to-speech synthesizer


Still Texture Coding

• Discrete Wavelet Transform (DWT)
– Spatial and quality scalability
• Use 2D Daubechies (9, 3)-tap biorthogonal filter
• Lowest band is losslessly coded by arithmetic coding
• Higher bands are coded by multilevel quantization, zero-tree scanning and arithmetic coding


Toolbox Approach

[Figure: toolbox approach relating TOOLS, ALGORITHMS and PROFILES, with tools for natural scenes and tools for synthetic scenes.]


Audio Coding

• Different bit rates, different types of source material and different algorithms
• Combination of parameter-based coding, LPC-based coding, and time/frequency-based coding
• High-quality speech at 2 kbps: Harmonic Vector eXcitation Coding (HVXC)
• Text-to-Speech (TTS)


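VOP-Based Retrieval

[Figure: VOP-based retrieval. Example query: find video sequences containing a car moving from left to right. The shape, texture and motion of each VOP object are stored in an MPEG-4 video object database (video index); matching against the video database returns the retrieved video.]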


Entropy Coding

• Self-information: –log2 p
• Huffman coding for pdf(a1, a2, a3) = (0.5, 0.25, 0.25)
– –log2 0.5 = 1, –log2 0.25 = 2
– a1 = 0, a2 = 10, a3 = 11
• What if the self-information is not an integer?
– pdf(a1, a2, a3, a4) = (0.6, 0.2, 0.125, 0.075)
– –log2 0.6 = 0.737, –log2 0.2 = 2.32, –log2 0.125 = 3, –log2 0.075 = 3.74
– a1 = 0, a2 = 10, a3 = 110, a4 = 111

[Figure: Huffman trees for the two distributions, with branches labeled 0 and 1.]
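The code lengths in both examples can be reproduced with the standard Huffman construction, which repeatedly merges the two least probable nodes. The sketch below computes only code lengths, not the bit patterns; the function name is illustrative.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a dict of symbol probabilities."""
    tie = count()                                   # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)          # two least probable nodes
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1                       # every merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths


print(huffman_code_lengths({"a1": 0.5, "a2": 0.25, "a3": 0.25}))
# {'a1': 1, 'a2': 2, 'a3': 2}
print(huffman_code_lengths({"a1": 0.6, "a2": 0.2, "a3": 0.125, "a4": 0.075}))
# {'a1': 1, 'a2': 2, 'a3': 3, 'a4': 3}
```

In the second case the lengths (1, 2, 3, 3) no longer equal the self-information (0.737, 2.32, 3, 3.74), which is the gap that arithmetic coding addresses.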


Arithmetic Coding

• The bits allocated to each symbol can be non-integer
– E.g. if pdf(a) = 0.6, then ‘a’ takes 0.737 bits to encode
• For the optimal pdf, the coding efficiency is, on average, always better than or equal to that of Huffman coding
• Huffman coding for a2 a1 a4 a1 a1 a3, total 11 bits: 2 + 1 + 3 + 1 + 1 + 3
• Arithmetic coding for a2 a1 a4 a1 a1 a3, total 11.271 bits: 2.32 + 0.737 + 3.74 + 0.737 + 0.737 + 3
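The two totals can be checked with a few lines of arithmetic. The sketch below uses the ideal cost of –log2 p bits per symbol as a stand-in for an arithmetic coder, ignoring the small termination overhead of a real implementation; the variable and function names are illustrative.

```python
import math

probs = {"a1": 0.6, "a2": 0.2, "a3": 0.125, "a4": 0.075}
huffman_code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
message = ["a2", "a1", "a4", "a1", "a1", "a3"]


def huffman_bits(symbols):
    """Total Huffman code length of a symbol sequence."""
    return sum(len(huffman_code[s]) for s in symbols)


def ideal_bits(symbols):
    """Sum of self-information, the cost an ideal arithmetic coder approaches."""
    return sum(-math.log2(probs[s]) for s in symbols)


print(huffman_bits(message))              # 11
print(round(ideal_bits(message), 2))      # 11.27

# Averages over the distribution, in bits per symbol
entropy = sum(-p * math.log2(p) for p in probs.values())
huffman_average = sum(p * len(huffman_code[s]) for s, p in probs.items())
print(round(entropy, 3), round(huffman_average, 3))   # 1.562 1.6
```

This particular message happens to be slightly cheaper with Huffman coding because it contains the rare symbols a3 and a4 more often than the pdf predicts; averaged over the distribution, the ideal cost (about 1.562 bits per symbol) is below the Huffman average (1.6 bits per symbol).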


Conditional Probability

• Why is the self-information –log2 p?
• Consider the conditional probability for the case of p ≤ 0.5
– One bit is required for each condition that doubles the probability
– Probability of ‘a’ is 0.125
– Probability of ‘a’ is 0.25 of the right-hand part
– Probability of ‘a’ is 0.5 of the right-hand part of the right-hand part
– Probability of ‘a’ is 1 of the right-hand part of the right-hand part of the right-hand part
• Consider the conditional probability for the case of p > 0.5
– Probability of ‘a’ is 0.7
– Probability of ‘aa’ is 0.49
– Probability of ‘aa’ is 0.98 of the right-hand part
– Probability of ‘aaa’ is 0.686 of the right-hand part


Rescaling & Incremental Coding


Incremental Encoding


Incremental Decoding


Incremental Coding


Incremental Coding (cont’d)


Incremental Encoding


Incremental Decoding


Integer Arithmetic
