
15. H.264/AVC

Prof. Chia-Wen Lin (林嘉文)
Department of Electrical Engineering
National Tsing Hua University
03-5731152

[email protected]@ee.nthu.edu.tw

MPEG-4 Parts

• Part I: Systems
• Part II: Visual
• Part III: Audio
• Part IV: Conformance
• Part V: Reference software
• Part VI: DMIF (Delivery Multimedia Integration Framework)
• Part VII: Optimized software for MPEG-4 tools
• Part VIII: MPEG-4 on IP framework
• Part IX: Reference hardware description
• Part X: Advanced Video Coding (AVC)

MPEG-4 Parts

• Visual – Part 2 (ISO/IEC 14496-2)
  – Video
    • Coding of natural video
  – SNHC (Synthetic-Natural Hybrid Coding)
    • Facial & body animation
    • Graphic coding
  – Texture coding
  – Sprite coding
• Visual – Part 10 (ISO/IEC 14496-10)
  – AVC (Advanced Video Coding)
  – JVT (Joint Video Team), ISO + ITU-T
  – Focused solely on coding of natural video
  – Very high coding efficiency

MPEG-4 AVC

• Working Draft 2 – January 2002
• Committee Draft (CD) – May 2002
• Final CD – July 2002
• FDIS (Final Draft International Standard) – December 2002

Video Coding Standards

• MPEG-2
  – State of the art, 1994
• MPEG-4 Video, Part 2
  – ASP (Advanced Simple Profile)
  – State of the art, 1999
  – ~1.5x coding gain over MPEG-2 (on average)
• MPEG-4 AVC, Part 10
  – State of the art, 2002
  – ~2x coding gain over MPEG-2 (on average)
  – Final Draft Standard in Dec 2002

The Design Goals of MPEG-4 AVC

• High compression efficiency
• Flexible application to delay constraints appropriate to a variety of services
• Error resilience capability
• Complexity scalability
• Full specification of decoding (no mismatch)
• High quality application
• Network friendliness

Applications

• Conversational services for video telephony and video conferencing
• Live or pre-coded video streaming services
• Video in multimedia messaging services (MMS)

Video Coding Hierarchy

• Sequence, consisting of
• Pictures, consisting of
• Slices, consisting of
• Macroblocks, consisting of
• Blocks, consisting of
• Pixels / Pels

[Figure labels beside the hierarchy: NAL, VCL]

Note: for interlaced video, a picture consists of either one frame or two fields

VCL and NAL

• H.264 consists of
  – Video Coding Layer (VCL)
    • Performs the tasks associated with video coding
  – Network Abstraction Layer (NAL)
    • Implements video-specific support features for a variety of networks
    • Seamless and easy integration into all current transmission protocols
    • Easier packetization and better information priority control

The Features of VCL (1/3)

• Transformation
  – Integer 4x4 block transform for residual coding
  – Hadamard
    • A 4x4 transform on the DC coefficients of the 4x4 blocks in a 16x16 macroblock
    • A 2x2 transform for the DC coefficients of the 4x4 chroma blocks in an 8x8 chroma block


The Features of VCL (2/3)

• Quantization
• Motion Estimation
  – Variable block-size motion prediction (7 block sizes: 16x16, 8x8, 4x4, 16x8, 8x16, 8x4, 4x8)
  – Integer, 1/2-, and 1/4-pixel motion vector accuracy
  – Multiple reference frames (max. 15) may be used for prediction


The Features of VCL (3/3)

• Entropy coding:
  – Context-based Adaptive Variable Length Coding (CAVLC)
  – Context-based Adaptive Binary Arithmetic Coding (CABAC)
• Others:
  – Space-domain intra prediction (10 prediction modes)
  – De-blocking loop filter
  – Motion vector prediction
  – Slice structure
  – Interlace coding tools

Frame Types

• I-frame
• P-frame
• B-frame
• SP- and SI-frame
  – SP and SI frames provide functionalities for bit-stream switching, splicing, random access, VCR functionalities, and error resilience/recovery

Picture Formats

• Color sequences using 4:2:0 chroma sub-sampling

[Figure: locations of luminance and chrominance samples over time, for progressive frames and for interlaced frames (top field / bottom field)]

Macroblock Subdivision

• Each picture is divided into 16x16 macroblocks.
• The order of the macroblocks in the bitstream depends on the Macroblock Allocation Map and is not necessarily raster scan order.

[Figure: two examples of macroblock numbering within a picture]

MPEG-4 AVC/H.264: Encoder Architecture

[Figure: encoder block diagram. Blocks: Input Video Signal, Split into Macroblocks (16x16 pixels), Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, De-blocking Filter, Intra-frame Prediction, Motion Compensation, Motion Estimation, Intra/Inter switch, Entropy Coding, embedded Decoder. Signals: Control Data, Quant. Transf. coeffs, Motion Data, Output Video Signal]

MPEG-4 AVC/H.264: Motion Compensation

[Figure: encoder block diagram with the motion-compensation path highlighted]

• Variable block-size coding
  – MB types: 16x16, 16x8, 8x16, 8x8
  – 8x8 types: 8x8, 8x4, 4x8, 4x4
• Motion vector accuracy 1/4 (6-tap filter)

Motion Compensation

• Various block sizes and shapes for motion compensation
• 1/4 sample accuracy (sort of per MPEG-4, Pt. 2 V.2)
  – 6-tap filtering to 1/2 sample accuracy
  – Simplified filtering to 1/4 sample accuracy
  – Special position with heavier filtering
• Multiple reference pictures (per H.263++ Annex U)
• Temporally-reversed motion and generalized B-frames
• B-frame prediction weighting

Block Modes of P Pictures

• Macroblock: 16x16
• 7 motion prediction modes
  – 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  – Motion vector accuracy: integer, ½-, and ¼-pixel

[Figure: partition numbering for Modes 1 to 7, from a single 16x16 block (Mode 1, block 0) to sixteen 4x4 blocks (Mode 7, blocks 0 to 15)]

Motion Vector Search

• Motion Estimation
  – Integer pixel search
  – Fractional pixel search (1/2- and 1/4-pixel)
  – Reference frame selection from multiple reference frames (max. 15 frames)
  – Search range:
    • horizontal [-2048, 2047.75] (max)
    • vertical [-512, 511.75] (max)

Motion Estimation

• Motion vector prediction
  – In same slice
  – Median prediction (except 16x8 and 8x16 blocks)

A, B, C, D and E may come from different reference pictures

V1 = median{VA, VB, VC, VD}

1. If C is not available, VC = VD
2. If B, C, and D are not available, VB = VC = VD = VA
3. Any other unavailable predictor, not covered by the two rules above, has its MV set to 0


• Motion vector prediction for 16x8 and 8x16 blocks
  – Directional segmentation prediction

[Figure: prediction directions for 8x16 and 16x8 partitions]


• Integer pixel search
  – Search positions are organised in a "spiral" structure around the predicted vector

      .   .   .   .   .   .
      .  15   9  11  13  16
      .  17   3   1   4  18
      .  19   5   0   6  20
      .  21   7   2   8  22
      .  23  10  12  14  24


• Full fractional-pixel search (½- and ¼-pixel)

[Figure: search positions around the best integer-pel position C, with integer neighbours H1, H2, V1, V2, D1 to D4, half-pel positions I to VIII, and quarter-pel positions a to h]

Capital letters (C, H1, H2, …): integer-pixel positions
Roman numerals (I, II, III, …): 1/2-pel positions
Lower-case letters (a, b, c, …): 1/4-pel positions


• Fractional pixel search
  – Check the eight 1/2-pel candidates, I ~ VIII, around the best integer-pel C; decide the best 1/2-pel V subject to the minimal cost among the 1/2-pel candidates
  – Check the eight 1/4-pel candidates, a ~ h, around the best 1/2-pel V; decide the best 1/4-pel h subject to the minimal cost among the 1/4-pel candidates
  – Select the motion vector and block-size pattern which produce the lowest cost
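The two-stage refinement described above can be sketched as follows. This is a minimal illustration, not the reference encoder: motion vectors are assumed to be stored in quarter-pel units, the names `refine`, `fractional_search`, and `toy_cost` are hypothetical, and the sketch assumes that the current centre stays the winner when no candidate is strictly cheaper.

```python
def refine(start, step, cost):
    """Check the eight neighbours of `start` at distance `step`; keep the cheapest."""
    best, best_cost = start, cost(start)
    for dy in (-step, 0, step):
        for dx in (-step, 0, step):
            cand = (start[0] + dx, start[1] + dy)
            c = cost(cand)
            if c < best_cost:  # assumption: ties keep the earlier winner
                best, best_cost = cand, c
    return best


def fractional_search(best_integer_mv, cost):
    """Integer-pel result -> best 1/2-pel -> best 1/4-pel (quarter-pel units)."""
    half = refine(best_integer_mv, 2, cost)  # candidates I..VIII around C
    return refine(half, 1, cost)             # candidates a..h around V


def toy_cost(mv, target=(6, -9)):
    """Hypothetical cost: squared distance to a known optimum."""
    return (mv[0] - target[0]) ** 2 + (mv[1] - target[1]) ** 2


best_mv = fractional_search((4, -8), toy_cost)
print(best_mv)
```

In a real encoder the cost would be the SA(T)D plus a rate term rather than this toy distance.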

Fractional Pel Value Interpolation: Luma

• Calculate half-pel values
  – Use the 6-tap filter {1, -5, 20, 20, -5, 1} to get b
  – b_h = clip((b + 16) >> 5)
  – c is obtained from b values using the 6-tap filter
  – c_m = clip((c + 512) >> 10)
• Average of integer and half-pel values to find d, e, f, g
  – e.g. d = (A + b_h) >> 1
• h = (b_h + b_v) >> 1 (diagonal direction averaging)
• i = (A1 + A2 + A3 + A4 + 2) >> 2
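The half-pel and quarter-pel rules above can be sketched in a few lines. This is only an illustration under stated assumptions: 8-bit samples (so `clip` limits to 0..255), and the function names are hypothetical.

```python
def clip(x, lo=0, hi=255):
    """Limit x to the 8-bit sample range (assumed bit depth)."""
    return max(lo, min(hi, x))


def six_tap(s):
    """Unrounded 6-tap filter {1, -5, 20, 20, -5, 1} over six samples."""
    return s[0] - 5 * s[1] + 20 * s[2] + 20 * s[3] - 5 * s[4] + s[5]


def half_pel(samples):
    """b_h = clip((b + 16) >> 5), b from six integer samples."""
    return clip((six_tap(samples) + 16) >> 5)


def centre_half_pel(b_values):
    """c_m = clip((c + 512) >> 10), c from six unrounded b values."""
    return clip((six_tap(b_values) + 512) >> 10)


def quarter_pel(a, b_h):
    """d = (A + b_h) >> 1: average of an integer and a half-pel sample."""
    return (a + b_h) >> 1


print(half_pel([0, 0, 0, 255, 255, 255]))
```

The filter taps sum to 32, which is why a flat area passes through unchanged after the `>> 5` (and `>> 10` for the twice-filtered centre position).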

Fractional Pel Value Interpolation: Chroma

• d_x and d_y are the fractional positions in units of one-eighth samples
• A, B, C, and D are integer pixels

[Figure: sample v inside the square with corners A, B (top) and C, D (bottom), at horizontal distances d_x and 8-d_x and vertical distances d_y and 8-d_y]

$$v = \frac{(8-d_x)(8-d_y)\,A + d_x(8-d_y)\,B + (8-d_x)\,d_y\,C + d_x d_y\,D + 8^2/2}{8^2}$$
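The bilinear chroma interpolation above can be written directly as integer arithmetic. A minimal sketch, assuming integer division for the final step; the function name is hypothetical.

```python
def chroma_sample(a, b, c, d, dx, dy):
    """Bilinear interpolation between integer pixels A, B (top) and C, D (bottom).

    dx, dy are fractional offsets in 1/8-sample units (0..7).
    """
    total = ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
             + (8 - dx) * dy * c + dx * dy * d + 32)  # 32 = 8^2 / 2 (rounding)
    return total // 64                                # 64 = 8^2


print(chroma_sample(10, 20, 30, 40, 4, 4))
```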

B-Pictures

• Advantages:
  – Improve coding efficiency
  – Provide temporal scalability
• 5 modes:
  – Direct Mode: derived forward and backward MVs, none transmitted
  – Forward Mode: prediction from a previous reference frame
  – Backward Mode: prediction from a subsequent reference frame
  – Bi-directional Mode: separate forward and backward MVs
  – Intra Prediction Mode
• MVs in Direct Mode:
  – MVF = (TRB * MV)/TRD
  – MVB = (TRB - TRD) * MV/TRD

[Figure: P/I, B, and P pictures along the time axis, showing MV, MVF, MVB and the temporal distances TRB and TRD]

B-Pictures

• Direct Mode
  – No MV data is transmitted
  – Same block structure as co-located MB in temporally subsequent picture
  – MVs are computed as scaled version of corresponding MV of the co-located MB

[Figure: picture sequence I0 B1 B2 B3 P4 B5 B6 B7 P8]

B-Pictures

[Figure: List 0 reference, current B picture, and List 1 reference (fields f0, f1) along the time axis; the current block and the co-located block with MV, MVF, and MVB; temporal distances TDB and TDD]

Z = (TDB × 256) / TDD        MVF = (Z × MV + 128) >> 8
W = Z – 256                  MVB = (W × MV + 128) >> 8
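The fixed-point scaling of the co-located motion vector can be sketched as below. This assumes that the division in Z is an integer division and that `>>` is an arithmetic shift (as in Python); the function name is hypothetical.

```python
def direct_mode_mvs(mv, tdb, tdd):
    """Derive (MVF, MVB) from the co-located MV and temporal distances TDB, TDD."""
    z = (tdb * 256) // tdd  # assumption: integer division
    w = z - 256
    mvf = tuple((z * c + 128) >> 8 for c in mv)
    mvb = tuple((w * c + 128) >> 8 for c in mv)
    return mvf, mvb


# B picture halfway between its two references
mvf, mvb = direct_mode_mvs((8, -4), tdb=1, tdd=2)
print(mvf, mvb)
```

Scaling by 256 and shifting by 8 replaces the division MVF = TRB × MV / TRD with a multiplication per vector component.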

Mode Decision

• Block difference
  – Diff(i,j) = Original(i,j) - Prediction(i,j)
• SAD and SATD
  – DiffT means applying the Hadamard transform to Diff

$$\mathrm{SAD} = \sum_{i,j} \left| \mathrm{Diff}(i,j) \right|$$

$$\mathrm{SATD} = \Big( \sum_{i,j} \left| \mathrm{DiffT}(i,j) \right| \Big) / 2$$

[Figure: loop for prediction mode decision: Prediction, Block difference, Hadamard transform, SA(T)D, minimum SA(T)D, with integer-pel search]
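A minimal sketch of the two distortion measures for a 4x4 block follows. It assumes the 4x4 Hadamard matrix with entries ±1 and integer division for the final halving; the helper names are hypothetical.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def block_diff(orig, pred):
    return [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]


def sad(orig, pred):
    """Sum of absolute differences."""
    return sum(abs(v) for row in block_diff(orig, pred) for v in row)


def satd(orig, pred):
    """Sum of absolute Hadamard-transformed differences, halved."""
    diff_t = matmul(matmul(H4, block_diff(orig, pred)), H4)
    return sum(abs(v) for row in diff_t for v in row) // 2


orig = [[10] * 4 for _ in range(4)]
pred = [[9] * 4 for _ in range(4)]
print(sad(orig, pred), satd(orig, pred))
```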

• Given the last decoded frames, the Lagrange multipliers

$$L_{MODE} = 0.85 \times 2^{QP/3}, \qquad L_{MOTION} = \sqrt{L_{MODE}},$$

and the MB quantization parameter QP.
(Note: L_MODE for a B or SP frame is 4 times that for an I or P frame.)


• Choose intra prediction modes for the Intra 4x4 macroblock mode by minimizing the cost, with

$$IMODE \in \{DC, HOR, VERT, DIAG\_DL, DIAG\_DR, VERT\_R, VERT\_L, HOR\_U, HOR\_D\}$$

• Determine the best Intra16x16 prediction mode by choosing the mode that results in the minimum SATD.

Mode Decision

• For each 8x8 sub-partition
  – Perform motion estimation and reference frame selection by minimizing SSD + L x Rate(MV, REF)
  – B frames: choose prediction direction by minimizing SSD + L x Rate(MV(PDIR), REF(PDIR))
  – Determine the coding mode of the 8x8 sub-partition using the rate-constrained mode decision, i.e. minimize SSD + L x Rate(MV, REF, Luma-Coeff, block 8x8 mode)
• Here the SSD calculation is based on the reconstructed signal after DCT, quantization, and IDCT

$$\begin{aligned}
SSD(s, c, MODE \mid QP) ={} & \sum_{x=1,\,y=1}^{16,\,16} \big( s_Y[x,y] - c_Y[x,y,MODE \mid QP] \big)^2 \\
& + \sum_{x=1,\,y=1}^{8,\,8} \big( s_U[x,y] - c_U[x,y,MODE \mid QP] \big)^2 \\
& + \sum_{x=1,\,y=1}^{8,\,8} \big( s_V[x,y] - c_V[x,y,MODE \mid QP] \big)^2
\end{aligned}$$


• Perform motion estimation and reference frame selection for 16x16, 16x8, and 8x16 modes by minimizing

$$J(REF, \mathbf{m}(REF) \mid L_{MOTION}) = SATD\big(s, c(REF, \mathbf{m}(REF))\big) + L_{MOTION} \cdot \big( R(\mathbf{m}(REF) - \mathbf{p}(REF)) + R(REF) \big)$$

• B frames: determine prediction direction by minimizing

$$J(PDIR \mid L_{MOTION}) = SATD\big(s, c(PDIR, \mathbf{m}(PDIR))\big) + L_{MOTION} \cdot \big( R(\mathbf{m}(PDIR) - \mathbf{p}(PDIR)) + R(REF(PDIR)) \big)$$

Mode Decision

• Choose the MB prediction mode by minimizing

$$J(s, c, MODE \mid QP, L_{MODE}) = SSD(s, c, MODE \mid QP) + L_{MODE} \cdot R(s, c, MODE \mid QP)$$

  – I: MODE ∈ {INTRA4×4, INTRA16×16}
  – P: MODE ∈ {INTRA4×4, INTRA16×16, SKIP, 16×16, 16×8, 8×16, 8×8}
  – B: MODE ∈ {INTRA4×4, INTRA16×16, DIRECT, 16×16, 16×8, 8×16, 8×8}
• "Skip mode" refers to the 16x16 mode where no motion and residual information is encoded
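The macroblock mode choice is a plain minimization of the Lagrangian cost J = SSD + L_MODE · R over the candidate set. A minimal sketch, in which the per-mode SSD and rate values are made-up numbers and `choose_mode` is a hypothetical name:

```python
def choose_mode(candidates, l_mode):
    """Return the mode with the lowest J = SSD + L_MODE * R.

    candidates maps a mode name to (ssd, rate_in_bits).
    """
    def cost(mode):
        ssd, rate = candidates[mode]
        return ssd + l_mode * rate
    return min(candidates, key=cost)


# Illustrative numbers only: (SSD, bits) for three P-frame modes
p_modes = {"SKIP": (900, 1), "16x16": (400, 40), "8x8": (250, 120)}
print(choose_mode(p_modes, l_mode=5.0))
```

A larger multiplier (coarser quantization) shifts the decision toward cheap modes such as SKIP, while a small one favours low distortion.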

MPEG-4 AVC/H.264: Intra Prediction

[Figure: encoder block diagram with the intra-prediction path highlighted]

• Directional spatial prediction (9 types for luma, 1 chroma)
• Neighbouring samples (capital letters) and the 4x4 block to be predicted (a to p):

      Q A B C D E F G H
      I a b c d
      J e f g h
      K i j k l
      L m n o p
      M
      N
      O
      P

• e.g., Mode 4: diagonal down/right prediction; a, f, k, p are predicted by (A + 2Q + I + 2) >> 2

[Figure: prediction direction numbering 0, 1, 3, 4, 5, 6, 7, 8]

Intra Prediction: 4x4 Luma Blocks

• Mode 0: vertical prediction
• Mode 1: horizontal prediction
• Mode 2: DC prediction
• Mode 3: diagonal down/left prediction
• Mode 4: diagonal down/right prediction
• Mode 5: vertical-left
• Mode 6: horizontal-down
• Mode 7: vertical-right
• Mode 8: horizontal-up

[Figure: prediction directions for modes 0, 1, 3, 4, 5, 6, 7, 8]

DC prediction: pred(x, y) = average of pixels A, B, C, D, E, F, G, and H

Neighbour layout used for Mode 0 (vertical) and Mode 1 (horizontal):

      I A B C D
      E a b c d
      F e f g h
      G i j k l
      H m n o p
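Modes 0 to 2 can be sketched with the neighbour labels of this slide (A to D above the block, E to H to its left). The rounding term in the DC average is an assumption, since the slide only says "average", and the function name is hypothetical.

```python
def intra4x4_predict(mode, top, left):
    """Predict a 4x4 luma block.

    top  = [A, B, C, D]: reconstructed samples above the block
    left = [E, F, G, H]: reconstructed samples to the left of the block
    """
    if mode == 0:  # vertical: each column repeats the sample above it
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: each row repeats the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: average of the eight neighbours (rounding assumed)
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0 to 2 are sketched here")


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
print(intra4x4_predict(2, top, left)[0])
```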

Intra Prediction: 4x4 Luma Prediction

[Figure: illustration of the 4x4 luma prediction modes]

Intra Prediction: 16x16 Luma Blocks

• Mode 0: Vertical
• Mode 1: Horizontal
• Mode 2: DC
• Mode 3: Plane
  – Used only if all neighboring samples are available

Pred(x,y) = Clip((a + b·(x-7) + c·(y-7) + 16) >> 5), where
  a = 16·(P(-1,15) + P(15,-1))
  b = (5·H + 32) >> 6
  c = (5·V + 32) >> 6

$$H = \sum_{x=1}^{8} x \cdot \big( P(7+x,-1) - P(7-x,-1) \big), \qquad V = \sum_{y=1}^{8} y \cdot \big( P(-1,7+y) - P(-1,7-y) \big)$$

[Figure: 16x16 block with position (x,y) and the neighbouring samples P(15,-1) and P(-1,15)]
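The plane mode can be sketched directly from the formulas above. The neighbour storage is an assumption of this sketch: `top[i]` holds P(i-1, -1) and `left[i]` holds P(-1, i-1) for i = 0..16, so index 0 is the corner sample P(-1,-1). Clip is assumed to limit to 0..255.

```python
def clip(x):
    return max(0, min(255, x))


def plane_predict_16x16(top, left):
    """Return pred[y][x] for the 16x16 Plane mode."""
    def p_top(x):   # P(x, -1)
        return top[x + 1]

    def p_left(y):  # P(-1, y)
        return left[y + 1]

    h = sum(x * (p_top(7 + x) - p_top(7 - x)) for x in range(1, 9))
    v = sum(y * (p_left(7 + y) - p_left(7 - y)) for y in range(1, 9))
    a = 16 * (p_left(15) + p_top(15))
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5)
             for x in range(16)] for y in range(16)]


flat = plane_predict_16x16([128] * 17, [128] * 17)
print(flat[0][0], flat[15][15])
```

With flat neighbours both gradients H and V vanish, so the prediction is a constant block.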

Intra Prediction: 16x16 Luma Blocks

[Figure: 16x16 block with its neighbouring row H and column V; DC prediction uses Mean(H+V)]

MPEG-4 AVC/H.264: Transform Coding

[Figure: encoder block diagram with the transform path highlighted]

• 4x4 block integer transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

• Main Profile: Adaptive Block Size Transform (8x4, 4x8, 8x8)
• Repeated transform of DC coeffs for 8x8 chroma and 16x16 Intra luma blocks

Transform Coding: Luma DC

• Luma DC in Intra_16x16 MB
  – Using Hadamard transformation

Forward transform:

$$Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) // \, 2$$

Inverse transform:

$$X_{QD} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} y_{QD00} & y_{QD01} & y_{QD02} & y_{QD03} \\ y_{QD10} & y_{QD11} & y_{QD12} & y_{QD13} \\ y_{QD20} & y_{QD21} & y_{QD22} & y_{QD23} \\ y_{QD30} & y_{QD31} & y_{QD32} & y_{QD33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$
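The luma DC transform pair can be sketched with plain integer matrix products. The "// 2" of the forward transform is implemented here as Python floor division, which is an assumption about its rounding; the function names are hypothetical.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def forward_luma_dc(x_d):
    """Y_D = (H4 * X_D * H4) // 2 (floor division assumed)."""
    return [[v // 2 for v in row] for row in matmul(matmul(H4, x_d), H4)]


def inverse_luma_dc(y_qd):
    """X_QD = H4 * Y_QD * H4."""
    return matmul(matmul(H4, y_qd), H4)


x_d = [[4] * 4 for _ in range(4)]  # sixteen equal DC values
y_d = forward_luma_dc(x_d)
print(y_d[0], inverse_luma_dc(y_d)[0])
```

Since H4 · H4 = 4·I, the unscaled round trip multiplies the input by 16; with the halving in the forward step the gain here is 8, which the dequantization stage is expected to absorb.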

Transform Coding: Luma DC

[Figure: block ordering within a macroblock]

• CBPY 8x8 block order (raster scan order in MB):

      0 1
      2 3

• Luma 4x4 DC for Intra 16x16 macroblock type: block -1
• Luma 4x4 block order for 4x4 intra prediction and 4x4 residual coding (raster scan order within 8x8 region nested in raster scan order of 8x8 regions):

      0  1  4  5
      2  3  6  7
      8  9 12 13
     10 11 14 15

• Chroma 2x2 DC blocks: 16 (Cb), 17 (Cr)
• Chroma 4x4 block order for 4x4 residual coding, shown as 16-25, and intra 4x4 prediction, shown as 18-21 and 22-25 (raster scan order in each 8x8 chroma region):

     18 19 22 23
     20 21 24 25

Transform Coding: Chroma DC

• Chroma DC in 8x8 block
  – Hadamard transformation

Forward transform:

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} \\ x_{D10} & x_{D11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

Inverse transform:

$$X_{QD} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} y_{QD00} & y_{QD01} \\ y_{QD10} & y_{QD11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

Transform: Luma and Chroma residual

• Luminance and chrominance 4x4 residual blocks
• Forward transform

$$Y = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}$$

• Inverse transform

$$X = \begin{bmatrix} 1 & 1 & 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} & -1 & -1 \\ 1 & -\tfrac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} y_{00} & y_{01} & y_{02} & y_{03} \\ y_{10} & y_{11} & y_{12} & y_{13} \\ y_{20} & y_{21} & y_{22} & y_{23} \\ y_{30} & y_{31} & y_{32} & y_{33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$
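The forward 4x4 integer transform is the product of the core matrix, the residual block, and the transposed core matrix. A minimal sketch with hypothetical names; scaling is left to the quantization stage and is not part of this example.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_4x4(x):
    """Y = CF * X * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))


flat = forward_4x4([[1] * 4 for _ in range(4)])
ramp = forward_4x4([[0, 1, 2, 3] for _ in range(4)])
print(flat[0], ramp[0])
```

A flat block only produces a DC coefficient, and a horizontal ramp only produces coefficients in the first row, which is the energy compaction the transform is meant to provide.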

Quantization/Dequantization (1/6)

• Scan order
  – 4x4 residual and 4x4 luma DC block

       0  1  5  6
       2  4  7 12
       3  8 11 13
       9 10 14 15

  – 2x2 chroma DC block
    • Raster order
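The scan table above gives, for each position of the 4x4 block, its index in the scanned sequence. A small sketch of how such a table could be applied (the names are hypothetical):

```python
# SCAN_INDEX[row][col] = position of that coefficient in the scanned sequence
SCAN_INDEX = [[0, 1, 5, 6],
              [2, 4, 7, 12],
              [3, 8, 11, 13],
              [9, 10, 14, 15]]


def zigzag_scan(block):
    """Reorder a 4x4 block into a 16-element list following SCAN_INDEX."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[SCAN_INDEX[r][c]] = block[r][c]
    return out


raster = [[4 * r + c for c in range(4)] for r in range(4)]
print(zigzag_scan(raster))
```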

• QP: 0 ~ 51
• QP_Y: QP for luma coefficients
• QP_C: QP for chroma coefficients
  – QP_C for chroma is determined from the current value of QP_Y

[Table: mapping from QP_Y to QP_C]


• 4x4 luma DC block
• Quantization

$$Y_{QD}(i,j) = \left[ Y_D(i,j) \cdot Q(QP\%6, 0, 0) + 2 \cdot f \right] / 2^{18 + QP/6}, \qquad i,j = 0,\ldots,3$$

  – f = 2^(17+QP/6) / 3 for intra frames
  – f = 2^(17+QP/6) / 6 for inter frames
  – f has the same sign as the coefficient that is being quantized
• Dequantization

$$X_D(i,j) = \left[ X_{QD}(i,j) \cdot R(QP\%6, 0, 0) \right] // \, 4, \qquad i,j = 0,\ldots,3$$


• 2x2 chroma DC
• Quantization

$$Y_{QD}(i,j) = \left[ Y_D(i,j) \cdot Q(QP\%6, 0, 0) + 2 \cdot f \right] / 2^{18 + QP/6}, \qquad i,j = 0,1$$

  – f = 2^(17+QP/6) / 3 for intra frames
  – f = 2^(17+QP/6) / 6 for inter frames
  – f has the same sign as the coefficient that is being quantized
• Dequantization

$$X_D(i,j) = \left[ X_{QD}(i,j) \cdot R(QP\%6, 0, 0) \right] // \, 2, \qquad i,j = 0,1$$

Quantization/Dequantization (6/6)

• Q[QP%6][i][j] = quantMat[QP%6][0] for (i,j) = {(0,0),(0,2),(2,0),(2,2)},
• Q[QP%6][i][j] = quantMat[QP%6][1] for (i,j) = {(1,1),(1,3),(3,1),(3,3)},
• Q[QP%6][i][j] = quantMat[QP%6][2] otherwise.
• R[QP%6][i][j] = dequantMat[QP%6][0] for (i,j) = {(0,0),(0,2),(2,0),(2,2)},
• R[QP%6][i][j] = dequantMat[QP%6][1] for (i,j) = {(1,1),(1,3),(3,1),(3,3)},
• R[QP%6][i][j] = dequantMat[QP%6][2] otherwise.
• quantMat[6][3] = {{13107, 5243, 8224},
                    {11651, 4660, 7358},
                    {10486, 4143, 6554},
                    { 9198, 3687, 5825},
                    { 8322, 3290, 5243},
                    { 7384, 2943, 4660}};
• dequantMat[6][3] = {{40,  64, 51},
                      {45,  72, 57},
                      {50,  81, 64},
                      {57,  91, 72},
                      {63, 102, 80},
                      {71, 114, 90}};
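The position rules above reduce to a three-way classification of (i, j): both indices even, both odd, or mixed. A minimal sketch using the table values exactly as listed on the slide; the helper names are hypothetical.

```python
QUANT_MAT = [[13107, 5243, 8224],
             [11651, 4660, 7358],
             [10486, 4143, 6554],
             [9198, 3687, 5825],
             [8322, 3290, 5243],
             [7384, 2943, 4660]]

DEQUANT_MAT = [[40, 64, 51],
               [45, 72, 57],
               [50, 81, 64],
               [57, 91, 72],
               [63, 102, 80],
               [71, 114, 90]]


def position_class(i, j):
    """0 for (even, even), 1 for (odd, odd), 2 otherwise."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def q_coeff(qp, i, j):
    """Q[QP%6][i][j] for a 4x4 coefficient position."""
    return QUANT_MAT[qp % 6][position_class(i, j)]


def r_coeff(qp, i, j):
    """R[QP%6][i][j] for a 4x4 coefficient position."""
    return DEQUANT_MAT[qp % 6][position_class(i, j)]


print(q_coeff(0, 0, 0), r_coeff(51, 3, 1))
```

Only QP % 6 selects the table row; the remaining factor of QP / 6 appears as a power of two in the quantization formulas.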

MPEG-4 AVC/H.264: Multiple Reference Frames

[Figure: encoder block diagram (Coder Control, Transform/Quantizer, Deq./Inv. Transform, Entropy Coding, Motion-Compensated Predictor, Motion Estimator, Intra/Inter switch, embedded Decoder; signals: Control Data, Motion Data)]

• Multiple reference frames for motion compensation

MPEG-4 AVC/H.264: Residual Coding

[Figure: encoder block diagram with the Transform/Quantizer path highlighted]

• Residual coding is based on 4x4 blocks
• Integer transform

Residual and Intra Coding

• EXACT MATCH simplified transform
  – Based primarily on 4x4 transform (all prior standards: 8x8)
  – Requires only 16-bit arithmetic (including intermediate values)
  – Expanded to 8x8 for chroma by 2x2 transform of the DC values
  – Easily extensible to 10-12 bits per component
• Adaptive block transform sizes for Main Profile
• Intra coding structure
  – Directional spatial prediction (9 types luma, 1 chroma)
  – Expanded to 16x16 for luma intra by 4x4 transform of the DC values

Quantization and Deblocking

• Quantization of transform coefficients
  – Logarithmic step size control
  – Extended range of step sizes
  – Smaller step size for chroma (per H.263 Annex T)
  – Table-driven
• Reconstruction is 16-bit multiply, add, shift
• Deblocking filter (in the prediction loop)

Deblocking Filter

[Figure: boundaries in a 16x16 macroblock to be filtered, shown separately for vertical edges and horizontal edges (luma boundaries with solid lines, chroma boundaries with dotted lines)]


• Content-dependent boundary filtering strength
  – For each boundary between neighbouring 4x4 luma blocks, a "Boundary Strength" Bs is assigned
  – If Bs = 0, filtering is skipped for that particular edge
  – In all other cases, filtering is dependent on the local sample properties and the value of Bs

Deblocking FilterDeblocking Filter•• Flowchart to determine the boundary strength Flowchart to determine the boundary strength BsBs

[Flowchart, applied to the block boundary between block p and block q]

1. Block p or q intra coded, or slice type is SI or SP?
   – YES: Is the block boundary also a macroblock boundary? YES: Bs = 4. NO: Bs = 3.
   – NO: go to 2.
2. Coefficients coded in block p or q?
   – YES: Bs = 2.
   – NO: go to 3.
3. Blocks p and q have different reference frames or a different number of reference frames?
   – YES: Bs = 1.
   – NO: go to 4.
4. |V1(p,x) - V1(q,x)| >= 1 or |V1(p,y) - V1(q,y)| >= 1 or, if bi-predictive, |V2(p,x) - V2(q,x)| >= 1 or |V2(p,y) - V2(q,y)| >= 1?
   – YES: Bs = 1.
   – NO: Bs = 0 (skip).
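The boundary strength decision can be sketched as a small Python function. This is an illustration under stated assumptions, not the normative process: the `Block` record and its field names are hypothetical, motion vectors are assumed to be expressed in luma samples so that the threshold of 1 from the flowchart applies directly, and both blocks are assumed to list their reference frames in the same order.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Block:
    """Hypothetical summary of one 4x4 luma block (illustration only)."""
    intra: bool                            # block is intra coded
    has_coeffs: bool                       # block has coded coefficients
    refs: Tuple[int, ...]                  # reference frames used (1 or 2)
    mvs: Tuple[Tuple[float, float], ...]   # one (x, y) vector per reference


def boundary_strength(p: Block, q: Block, on_mb_edge: bool,
                      si_or_sp_slice: bool = False) -> int:
    """Return Bs (0..4) for the edge between blocks p and q."""
    if p.intra or q.intra or si_or_sp_slice:
        return 4 if on_mb_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    # Different reference frames, or a different number of them.
    if p.refs != q.refs:
        return 1
    # Compare V1 (and V2 when bi-predictive) component by component.
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 1 or abs(py - qy) >= 1:
            return 1
    return 0  # filtering is skipped for this edge
```

The checks are ordered from the strongest to the weakest filter, so the first test that matches decides Bs.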



• Thresholds for each block boundary
– The set of samples across this edge is only filtered if the following condition holds:
  Bs ≠ 0 && |p0 - q0| < α && |p1 - p0| < β && |q1 - q0| < β
– α and β are determined by IndexA and IndexB respectively
– IndexA = Clip3(0, 51, QPav + Filter_Offset_A)
– IndexB = Clip3(0, 51, QPav + Filter_Offset_B)
– Filter_Offset_A and Filter_Offset_B are used to modify the filter characteristics

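\[
\mathrm{Clip3}(a, b, c) =
\begin{cases}
a, & c < a \\
b, & c > b \\
c, & \text{otherwise}
\end{cases}
\]

[Figure: samples across the block edge, in order p3 p2 p1 p0 | q0 q1 q2 q3]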
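The threshold test can be sketched as follows. The slides do not list the tables that map IndexA and IndexB to α and β, so this sketch takes `alpha` and `beta` as already looked-up inputs; the function names `table_index` and `should_filter` are hypothetical.

```python
def clip3(a: int, b: int, c: int) -> int:
    """Clamp c to the range [a, b]."""
    return a if c < a else b if c > b else c


def table_index(qp_av: int, filter_offset: int) -> int:
    """IndexA or IndexB: average QP plus a filter offset, clipped to 0..51."""
    return clip3(0, 51, qp_av + filter_offset)


def should_filter(bs: int, p1: int, p0: int, q0: int, q1: int,
                  alpha: int, beta: int) -> bool:
    """True when the samples across the edge are to be filtered."""
    return (bs != 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)
```

A small step across the edge with flat neighbours passes the test and is smoothed, while a large step (likely a real image edge) fails the α comparison and is left untouched.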

Deblocking Filter: Bs < 4

• Δ = Clip3(-C, C, ((((q0 - p0) << 2) + (p1 - q1) + 4) >> 3))
• P0 = Clip1(p0 + Δ)
• Q0 = Clip1(q0 - Δ)
– ap = |p2 - p0|
– aq = |q2 - q0|
– If ap < β, P1 = p1 + Clip3(-C0, C0, (p2 + ((p0 + q0) >> 1) - (p1 << 1)) >> 1)
– If aq < β, Q1 = q1 + Clip3(-C0, C0, (q2 + ((p0 + q0) >> 1) - (q1 << 1)) >> 1)
– C0 is determined by IndexA and Bs
– Clip1(x) = Clip3(0, 255, x)
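The Bs < 4 update can be sketched in Python. The sketch uses the Δ expression from the H.264 specification and the P1/Q1 corrections as they read on this slide, and it takes both C and C0 as inputs because the slides only state that C0 depends on IndexA and Bs. The function name `filter_normal` and the list layout of the samples are hypothetical.

```python
def clip3(a, b, c):
    """Clamp c to the range [a, b]."""
    return a if c < a else b if c > b else c


def clip1(x):
    """Clamp to the 8-bit sample range."""
    return clip3(0, 255, x)


def filter_normal(p, q, c, c0, beta):
    """Filter one line of samples across an edge with Bs < 4.

    p = [p0, p1, p2] and q = [q0, q1, q2], counted outwards from the edge.
    Returns new lists; the inputs are not modified.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)
    delta = clip3(-c, c, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p = [clip1(p0 + delta), p1, p2]
    new_q = [clip1(q0 - delta), q1, q2]
    avg = (p0 + q0) >> 1
    if ap < beta:  # p side is smooth: also correct p1
        new_p[1] = p1 + clip3(-c0, c0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:  # q side is smooth: also correct q1
        new_q[1] = q1 + clip3(-c0, c0, (q2 + avg - (q1 << 1)) >> 1)
    return new_p, new_q
```

For example, `filter_normal([100, 98, 97], [110, 112, 113], c=4, c0=2, beta=4)` pulls the two sides of a 10-level step towards each other while the clipping bounds keep each correction small.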


Deblocking Filter: Bs = 4

• Left/upper side
• If the following condition holds:
– ap < β && |p0 - q0| < ((α >> 2) + 2)   (8-71)
– P0 = (p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4) >> 3
– P1 = (p2 + p1 + p0 + q0 + 2) >> 2
– In the case of luma filtering, P2 = (2*p3 + 3*p2 + p1 + p0 + q0 + 4) >> 3
• Otherwise, if the condition of (8-71) does not hold,
– P0 = (2*p1 + p0 + q1 + 2) >> 2

Deblocking Filter: Bs = 4

• Right/lower side
• If the following condition holds:
– aq < β && |p0 - q0| < ((α >> 2) + 2)   (8-76)
– Q0 = (p1 + 2*p0 + 2*q0 + 2*q1 + q2 + 4) >> 3   (8-77)
– Q1 = (p0 + q0 + q1 + q2 + 2) >> 2   (8-78)
– In the case of luma filtering, Q2 = (2*q3 + 3*q2 + q1 + q0 + p0 + 4) >> 3   (8-79)
• Otherwise, if the condition of (8-76) does not hold,
– Q0 = (2*q1 + q0 + p1 + 2) >> 2
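The Bs = 4 equations for both sides of the edge can be collected into one short Python sketch. It follows the slide equations term by term; the function name `filter_strong`, the `luma` flag, and the list layout of the samples are hypothetical, and α and β are taken as inputs.

```python
def filter_strong(p, q, alpha, beta, luma=True):
    """Filter one line of samples across an edge with Bs = 4.

    p = [p0, p1, p2, p3] and q = [q0, q1, q2, q3], counted outwards
    from the edge. Returns new lists; the inputs are not modified.
    """
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    new_p, new_q = list(p), list(q)
    small_step = abs(p0 - q0) < ((alpha >> 2) + 2)

    # Left/upper side, condition (8-71)
    if abs(p2 - p0) < beta and small_step:
        new_p[0] = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
        new_p[1] = (p2 + p1 + p0 + q0 + 2) >> 2
        if luma:
            new_p[2] = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    else:
        new_p[0] = (2 * p1 + p0 + q1 + 2) >> 2

    # Right/lower side, condition (8-76)
    if abs(q2 - q0) < beta and small_step:
        new_q[0] = (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3
        new_q[1] = (p0 + q0 + q1 + q2 + 2) >> 2
        if luma:
            new_q[2] = (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3
    else:
        new_q[0] = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p, new_q
```

On a flat 60-to-64 step with α = 20 and β = 4, the strong branch modifies up to three samples on each side, which spreads the step over a wider ramp than the Bs < 4 filter can.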



[Figure: Deblocking filter applied to a highly compressed decoded inter picture: 1) without filter, 2) with H.264/AVC deblocking]

Entropy Coding

[Figure: codec block diagram. Blocks: Coder Control, Transform/, Inv. Scal. & Transform, De-blocking Filter, Intra/Inter, Motion Compensation, Motion Estimation, Entropy Coding, with the embedded Decoder marked. Signals: Input Video Signal, Control Data, Motion Data, Output Video Signal.]


Variable Length Coding

Exp-Golomb code is used universally for all symbols except for transform coefficients
Context adaptive VLCs for coding of transform coefficients
• No end-of-block, but number of coefficients is decoded
• Coefficients are scanned backwards
• Contexts are built dependent on transform coefficients
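As an illustration of the Exp-Golomb code mentioned above, the following Python sketch builds codewords as bit strings. It uses the zero-order Exp-Golomb construction (a run of leading zeros followed by the binary form of the value plus one) and the usual signed mapping; the function names `ue`, `se` and `decode_ue` are chosen here for the sketch.

```python
from typing import Tuple


def ue(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for code_num >= 0, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    prefix_len = value.bit_length() - 1   # number of leading zeros
    return "0" * prefix_len + format(value, "b")


def se(k: int) -> str:
    """Signed Exp-Golomb: k > 0 maps to 2k - 1, k <= 0 maps to -2k."""
    return ue(2 * k - 1 if k > 0 else -2 * k)


def decode_ue(bits: str) -> Tuple[int, str]:
    """Decode one unsigned codeword; return (code_num, remaining bits)."""
    zeros = len(bits) - len(bits.lstrip("0"))
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value - 1, bits[2 * zeros + 1:]
```

Small values get short codewords (0 is a single bit), and the decoder needs no table: counting the leading zeros tells it how many more bits to read.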

Context-based Adaptive Binary Arithmetic Coding (CABAC)

Usage of adaptive probability models for most symbols
Exploiting symbol correlations by using contexts
Restriction to binary arithmetic coding
• Simple and fast adaptation mechanism
• Fast binary arithmetic codec based on table look-ups and shifts only
Average bit-rate saving over CAVLC 10-15%
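The benefit of adaptive, context-selected probability models can be illustrated with a toy Python sketch. This is not the CABAC engine, which uses a table-driven state machine rather than floating-point probabilities; the class `AdaptiveBit`, the adaptation rate, and the clamping bounds are all assumptions made for illustration. The sketch only measures the ideal code length, -log2(p), that an arithmetic coder would approach.

```python
import math


class AdaptiveBit:
    """Running estimate of P(bit == 1) for one context (toy model)."""

    def __init__(self, rate: float = 1 / 16):
        self.p1 = 0.5
        self.rate = rate

    def cost(self, bit: int) -> float:
        """Ideal cost in bits of coding `bit` with the current estimate."""
        p = self.p1 if bit else 1.0 - self.p1
        return -math.log2(p)

    def update(self, bit: int) -> None:
        """Move the estimate towards the observed bit."""
        self.p1 += self.rate * (bit - self.p1)
        # Keep the estimate away from 0 and 1 so the cost stays finite.
        self.p1 = min(max(self.p1, 0.01), 0.99)


def total_cost(bits, contexts) -> float:
    """Sum the ideal cost; contexts[i] selects the model used for bits[i]."""
    models = {}
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        model = models.setdefault(ctx, AdaptiveBit())
        total += model.cost(bit)
        model.update(bit)
    return total
```

If the bits alternate 1, 0, 1, 0 and a context index separates the two cases, each model quickly learns a skewed probability and the total cost falls well below one bit per symbol; with a single shared model the same sequence costs about one bit per symbol.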


SP/SI Frame

• SP frame:
– motion-compensated predictive coding
– similar to P
– SP allows identical reconstruction even when different reference pictures are being used
• SI frame:
– spatial prediction
– similar to I
– SI allows identical reconstruction to a corresponding SP
• SP and SI frames provide functionalities for bitstream switching, splicing, random access, VCR functionalities such as fast-forward, and error resilience/recovery

SP/SI Frame: Bitstream Switching

[Figure: Bitstream 1 and Bitstream 2 are each a sequence of P frames containing an SP frame (S1 and S2 respectively); the SP frame S12 links the two bitstreams.]


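SP/SI Frame: Bitstream Splicing

[Figure: Bitstream 1 and Bitstream 2 are each a sequence of P frames containing an SP frame (S1 and S2 respectively); the SI frame SI2 links the two bitstreams.]

SP/SI Frame: Error Resiliency/Recovery

[Figure: a sequence of P frames containing the SP frames S1 and S2, shown together with the frames S12 and SI2.]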


Profiles and Levels

Profiles

• Baseline profile
• Extended profile
• Main profile


Baseline Profile

• I and P picture types
• In-loop deblocking filter
• 1/4-sample motion compensation
• VLC-based entropy coding: CAVLC
• 4:2:0 chrominance format
• Field pictures (for Level 2.1 and above)
• Use 15 or fewer reference frames
• Have a compression ratio per picture of 4:1 or greater

Extended Profile

• Bi-predictive slices
• SP and SI slices
• Weighted prediction
• All features included in the Baseline Profile


Main Profile

• CABAC
• Interlaced pictures
• All features included in the Baseline Profile

Level Definitions

Level # | Max Picture Size (MBs) | Max Video Bitrate (1000 bits/sec) | Horizontal MV Range (full pels) | Vertical MV Range (full pels) | Minimum luma Bi-predictive block size
1       | 99                     | 64                                | [-2048, 2047.75]                | [-64, 63.75]                  | 8x8
1.1     | 396                    | 128                               | [-2048, 2047.75]                | [-128, 127.75]                | 8x8
1.2     | 396                    | 768                               | [-2048, 2047.75]                | [-128, 127.75]                | 8x8
2       | 396                    | 2000                              | [-2048, 2047.75]                | [-128, 127.75]                | 8x8
2.1     | 792                    | 4000                              | [-2048, 2047.75]                | [-256, 255.75]                | 8x8
2.2     | 1620                   | 4000                              | [-2048, 2047.75]                | [-256, 255.75]                | 8x8
3       | 1620                   | 8000                              | [-2048, 2047.75]                | [-256, 255.75]                | 8x8
3.1     | 3600                   | 20000                             | [-2048, 2047.75]                | [-512, 511.75]                | 8x8
3.2     | 5120                   | 20000                             | [-2048, 2047.75]                | [-512, 511.75]                | 8x8
4       | 8192                   | 20000                             | [-2048, 2047.75]                | [-512, 511.75]                | 8x8
5       | 19200                  | TBD                               | [-2048, 2047.75]                | TBD                           | 8x8
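To show how the picture-size column of the level table is used, here is a minimal Python sketch that counts the macroblocks in a picture and returns the lowest level whose limit admits it. The limits are copied from the level table on this slide, which reflects the draft at the time; the function names are hypothetical, and the sketch deliberately ignores the bitrate and motion vector range columns.

```python
# (level, max picture size in macroblocks), from the level table on this slide
MAX_PICTURE_MBS = [
    ("1", 99), ("1.1", 396), ("1.2", 396), ("2", 396),
    ("2.1", 792), ("2.2", 1620), ("3", 1620), ("3.1", 3600),
    ("3.2", 5120), ("4", 8192), ("5", 19200),
]


def picture_size_mbs(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover the picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level_for_size(width: int, height: int):
    """Lowest level whose picture-size limit fits, or None if none does."""
    mbs = picture_size_mbs(width, height)
    for level, max_mbs in MAX_PICTURE_MBS:
        if mbs <= max_mbs:
            return level
    return None
```

QCIF (176x144) is exactly 99 macroblocks and fits Level 1, while CIF (352x288) needs 396 macroblocks and therefore Level 1.1 or higher.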


H.264 Codec Design Summary

Video coding layer is based on hybrid video coding and similar in spirit to other standards but with important differences
New key features are:
• Enhanced motion compensation
• Small blocks for transform coding
• Improved de-blocking filter
• Enhanced entropy coding
Substantial bit-rate savings relative to other standards for the same quality

Complexity of H.264 Codec Design

• Codec design includes relaxation of traditional bounds on complexity (memory & computation): rough guess 2-3x decoding power increase relative to MPEG-2, 3-4x encoding
• Problem areas:
– Smaller block sizes for motion compensation (cache access issues)
– Longer filters for motion compensation (more memory access)
– Multi-frame motion compensation (more memory for reference frame storage)
– More segmentations of macroblock to choose from (more searching in the encoder)
– More methods of predicting intra data (more searching)
– Arithmetic coding (adaptivity, computation on output bits)


Performance Comparison

• Test of different standards
• Using same rate-distortion optimization techniques for all codecs
• Streaming test: High-latency (included B frames)
• Real-time conversation test: No B frames
• Several video sequences for each test
• Compare four codecs:
– MPEG-2 (in high-latency/streaming test only)
– H.263 (high-latency profile, conversational high-compression profile, baseline profile)
– MPEG-4 (simple profile and advanced simple profile with & without B pictures)
– JVT/H.26L/AVC (with & without B pictures)
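The comparison plots in this part of the deck report quality as luma PSNR in dB against bit rate. As a reminder of what that measure computes, here is a minimal Python sketch assuming 8-bit samples (peak value 255) supplied as flat sequences; the function name `psnr` is chosen for this sketch.

```python
import math


def psnr(original, decoded, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf   # identical signals
    return 10 * math.log10(peak * peak / mse)
```

Because PSNR is logarithmic in the mean squared error, halving the error raises it by about 3 dB, which is why curves for different codecs are usually compared at equal PSNR or at equal bit rate.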

Coding Efficiency Comparison (1/4)

[Figure: PSNR [dB] versus Bit-Rate [kbps] for Foreman, 10 Hz, QCIF, 100 frames encoded. Curves: Intraframe DCT coding (DCT 1974, JPEG 1992); Frame difference coding (H.120 1988); Integer-pel motion compensation (H.261 1991); Half-pel motion compensation (MPEG-1 1993); TMN-10 variable block size motion compensation (H.263 1998). Annotation: 67 %.]



[Figure: Foreman QCIF 10 Hz. Quality Y-PSNR [dB] versus Bit-rate [kbit/s]. Curves: MPEG-2, H.263, MPEG-4, JVT/H.264/AVC.]

[Figure: Alias 24 fps SDTV. Y PSNR versus Mbit/sec. Curves: MPEG-2 (QP 2-7), AVC (QP 10, 18, 26).]



[Figure: bit rate (Mbit/s) versus year, 1994 to 2005, for MPEG-2, MPEG-4, H.263, H.26L and H.264/MPEG-4 Part 10, with the 1st through 5th generations of MPEG-2 encoders marked. Source: Modulus Video.]

Test Set Results for Perceptual Quality

• Informal perceptual tests
• At the same PSNR, people generally prefer JVT
• Why?
– Small motion compensation block size (breaks up block structure)
– Small transform block size (breaks up block structure, reduces ringing)
– In-loop deblocking filter
• By how much?
– Needs further study
– No rigorous testing reported
– 10-15% might be a good guess


How Were the Improvements Obtained

• It mainly comes from incremental improvements:
- Better prediction
- More computation
- More memory
• No fundamental changes in the basic algorithm (DCT + MCPC)

Conclusions

Video coding layer is based on hybrid video coding and similar in spirit to other standards but with important differences
New key features are:
• Enhanced motion compensation
• Small blocks for transform coding
• Improved deblocking filter
• Enhanced entropy coding
Bit-rate savings generally 50% or better against any other standard for the same perceptual quality (especially for higher-latency applications allowing B pictures)
Increased complexity relative to prior standards
Standard of both ITU-T VCEG and ISO/IEC MPEG
Standardization completing around end of this year to Spring of next year
