Page 1
15. H.264/AVC
Prof. Chia-Wen Lin (林嘉文)
Department of Electrical Engineering
National Tsing Hua University
03-5731152
[email protected]@ee.nthu.edu.tw
MPEG-4 Parts
Part I: Systems
Part II: Visual
Part III: Audio
Part IV: Conformance
Part V: Reference software
Part VI: DMIF (Delivery Multimedia Integration Framework)
Part VII: Optimized software for MPEG-4 tools
Part VIII: MPEG-4 on IP framework
Part IX: Reference hardware description
Part X: Advanced Video Coding (AVC)
Page 2
MPEG-4 Parts
• Visual – Part 2 (ISO/IEC 14496-2)
– Video
• Coding of natural video
– SNHC (Synthetic-Natural Hybrid Coding)
• Facial & body animation
• Graphic coding
– Texture coding
– Sprite coding
• Visual – Part 10 (ISO/IEC 14496-10)
– AVC (Advanced Video Coding)
– JVT (Joint Video Team), ISO + ITU-T
– Focused solely on coding of natural video
– Very high coding efficiency
MPEG-4 AVC
• Working Draft 2 – January 2002
• Committee Draft (CD) – May 2002
• Final CD – July 2002
• FDIS (Final Draft International Standard) – December 2002
Page 3
Video Coding Standards
• MPEG-2
– State of the art, 1994
• MPEG-4 Video, Part 2
– ASP (Advanced Simple Profile)
– State of the art, 1999
– ~1.5x coding gain over MPEG-2 (on average)
• MPEG-4 AVC, Part 10
– State of the art, 2002
– ~2x coding gain over MPEG-2 (on average)
– Final Draft Standard in Dec. 2002
The Design Goals of MPEG-4 AVC
• High compression efficiency
• Flexible application to delay constraints appropriate to a variety of services
• Error resilience capability
• Complexity scalability
• Full specification of decoding (no mismatch)
• High-quality applications
• Network friendliness
Page 4
Applications
• Conversational services for video telephony and video conferencing
• Live or pre-coded video streaming services
• Video in multimedia messaging services (MMS)
Video Coding Hierarchy
• Sequence, consisting of
• Pictures, consisting of
• Slices, consisting of
• Macroblocks, consisting of
• Blocks, consisting of
• Pixels / pels
[Figure labels: NAL, VCL]
Note: for interlaced video, a picture consists of either one frame or two fields.
Page 5
VCL and NAL
• H.264 consists of
– Video Coding Layer (VCL)
• Performs the tasks associated with video coding
– Network Abstraction Layer (NAL)
• Implements video-specific support features for a variety of networks
• Seamless and easy integration into all current transmission protocols
• Easier packetization and better information priority control
The Features of VCL (1/3)
• Transformation
– Integer 4x4 block transform for residual coding
– Hadamard transform
• A 4x4 transform on the DC coefficients of the 4x4 blocks in a 16x16 macroblock
• A 2x2 transform on the DC coefficients of the 4x4 blocks in an 8x8 chroma block
Page 6
The Features of VCL (2/3)
• Quantization
• Motion estimation
– Variable block-size motion prediction (7 block sizes: 16x16, 8x8, 4x4, 16x8, 8x16, 8x4, 4x8)
– Integer, 1/2-, and 1/4-pixel motion vector accuracy
– Multiple reference frames (max. 15) may be used for prediction
The Features of VCL (3/3)
• Entropy coding
– Context-based Adaptive Variable Length Coding (CAVLC)
– Context-based Adaptive Binary Arithmetic Coding (CABAC)
• Others
– Space-domain intra prediction (10 prediction modes)
– De-blocking loop filter
– Motion vector prediction
– Slice structure
– Interlace coding tools
Page 7
Frame Types
• I-frame
• P-frame
• B-frame
• SP- and SI-frame
– SP and SI frames provide functionality for bitstream switching, splicing, random access, VCR functionality, and error resilience/recovery
Picture Formats
• Color sequences using 4:2:0 chroma sub-sampling
[Figure: locations of luminance and chrominance samples in progressive frames and in interlaced frames (top field / bottom field), shown along the time axis.]
Page 8
Macroblock Subdivision
• Each picture is divided into 16x16 macroblocks.
• The order of the macroblocks in the bitstream depends on the Macroblock Allocation Map and is not necessarily raster scan order.
[Figure: example macroblock numberings for a picture divided into slices.]
MPEG-4 AVC/H.264: Encoder Architecture
[Figure: encoder block diagram. Blocks: split into macroblocks (16x16 pixels), transform/scaling/quantization, entropy coding, coder control, motion estimation, and an embedded decoder (scaling & inverse transform, intra-frame prediction, motion compensation, de-blocking filter). Signals: input video signal, control data, quantized transform coefficients, motion data, intra/inter selection, prediction, output video signal.]
Page 9
MPEG-4 AVC/H.264: Motion Compensation
[Figure: the encoder block diagram with the motion-compensation stage highlighted; motion vector accuracy 1/4 (6-tap filter).]
Variable Block-Size Coding
[Figure: macroblock types 16x16, 16x8, 8x16, 8x8; 8x8 types 8x8, 8x4, 4x8, 4x4, with the partitions of each type numbered.]
Page 10
Motion Compensation
• Various block sizes and shapes for motion compensation
• 1/4-sample accuracy (sort of per MPEG-4, Pt. 2 V.2)
– 6-tap filtering to 1/2-sample accuracy
– Simplified filtering to 1/4-sample accuracy
– Special position with heavier filtering
• Multiple reference pictures (per H.263++ Annex U)
• Temporally reversed motion and generalized B-frames
• B-frame prediction weighting
Block Modes of P Pictures
• Macroblock: 16x16
• 7 motion prediction modes
– 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
– Motion vector accuracy: integer, 1/2-, and 1/4-pixel
[Figure: partition numbering for Modes 1–7. Mode 1 has one partition (0), Modes 2 and 3 have two (0–1), Mode 4 has four (0–3), Modes 5 and 6 have eight (0–7), and Mode 7 has sixteen (0–15), all numbered in raster order.]
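The following minimal sketch makes the partition counts concrete. For each of the seven sizes, it lists the top-left corner of every partition in one 16x16 macroblock, using the raster numbering shown in the figure. `PARTITION_SIZES` and `partitions` are hypothetical helper names chosen for illustration. They do not come from any reference software.

```python
MB_SIZE = 16  # a macroblock is 16x16 luma samples

# The seven luma partition sizes (width, height) listed on the slide.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def partitions(width: int, height: int) -> list[tuple[int, int]]:
    """Top-left (x, y) of every width x height partition, in raster order."""
    return [(x, y)
            for y in range(0, MB_SIZE, height)
            for x in range(0, MB_SIZE, width)]
```

For example, `partitions(8, 4)` yields eight positions, which matches the eight numbered partitions of that mode.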
Page 11
Motion Vector Search
• Motion estimation
– Integer-pixel search
– Fractional-pixel search (1/2- and 1/4-pixel)
– Reference frame selection from multiple reference frames (max. 15 frames)
– Search range:
• horizontal [-2048, 2047.75] (max)
• vertical [-512, 511.75] (max)
Motion Estimation
• Motion vector prediction
– Within the same slice
– Median prediction (except for 16x8 and 8x16 blocks)
[Figure: current block E with its neighboring blocks A (left), B (above), C (above-right), and D (above-left).]
A, B, C, D, and E may come from different reference pictures.
V1 = median{VA, VB, VC}, where:
1. If C is not available, VC = VD.
2. If B, C, and D are not available, VB = VC = VD = VA.
3. If an unavailable predictor is not covered by either rule above, its MV is 0.
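The three availability rules above combine as shown in this minimal sketch of the median predictor. Assumptions: the median is taken separately for the x and y components, `None` marks an unavailable neighbor, and reference-picture differences between neighbors are ignored. `predict_mv` and `median3` are hypothetical names.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (x, y) motion vector, e.g. in quarter-sample units


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return a + b + c - min(a, b, c) - max(a, b, c)


def predict_mv(va: Optional[MV], vb: Optional[MV],
               vc: Optional[MV], vd: Optional[MV]) -> MV:
    """Median predictor V1 for block E; None marks an unavailable neighbor."""
    if vc is None:                      # rule 1: D replaces an unavailable C
        vc = vd
    if vb is None and vc is None and va is not None:
        vb = vc = va                    # rule 2: B, C, D unavailable -> copy A
    # Rule 3: any predictor that is still unavailable counts as a zero vector.
    va, vb, vc = [v if v is not None else (0, 0) for v in (va, vb, vc)]
    return (median3(va[0], vb[0], vc[0]), median3(va[1], vb[1], vc[1]))
```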
Page 12
Motion Estimation
• Motion vector prediction for 16x8 and 8x16 blocks
– Directional segmentation prediction
[Figure: directional segmentation prediction for 8x16 and 16x8 partitions.]
Motion Estimation
• Integer-pixel search
– Search positions are organised in a “spiral” structure around the predicted vector (position 0); the visiting order of the 5x5 neighborhood is:
15 9 11 13 16
17 3 1 4 18
19 5 0 6 20
21 7 2 8 22
23 10 12 14 24
(the spiral continues outward in the same way)
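The visiting order in the table can be generated ring by ring. In each ring of radius r, the top and bottom rows are visited first (corners excluded, alternating top/bottom from left to right), and then the left and right columns (corners included, alternating left/right from top to bottom). This loop structure was inferred from the numbers shown and is one way to reproduce them. `spiral_order` is a hypothetical name.

```python
def spiral_order(search_range: int) -> list[tuple[int, int]]:
    """(dx, dy) offsets around the predicted vector, in spiral visiting order."""
    order = [(0, 0)]  # position 0: the predicted vector itself
    for r in range(1, search_range + 1):
        # Top and bottom rows of ring r (corners excluded), alternating.
        for x in range(-r + 1, r):
            order += [(x, -r), (x, r)]
        # Left and right columns of ring r (corners included), alternating.
        for y in range(-r, r + 1):
            order += [(-r, y), (r, y)]
    return order
```

Each ring adds 8r positions, so a search range of R visits (2R + 1)^2 positions in total.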
Page 13
Motion Estimation
• Full fractional-pixel search (1/2- and 1/4-pixel)
[Figure: integer-pixel positions C, H1, H2, V1, V2, D1–D4; the eight 1/2-pel positions I–VIII around C; and the eight 1/4-pel positions a–h around the best 1/2-pel position.]
Capital letters (C, H1, H2, …): integer-pixel positions; Roman numerals (I, II, III, …): 1/2-pel positions; lowercase letters (a, b, c, …): 1/4-pel positions.
Motion Estimation
• Fractional-pixel search
– Check the eight 1/2-pel candidates I–VIII around the best integer-pel position C, and take as the best 1/2-pel position (V in the figure) the candidate with the minimal cost among the 1/2-pel candidates.
– Check the eight 1/4-pel candidates a–h around the best 1/2-pel position V, and take as the best 1/4-pel position (h in the figure) the candidate with the minimal cost among the 1/4-pel candidates.
– Select the motion vector and block-size pattern that produce the lowest cost.
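The two refinement steps can be sketched with motion vectors in quarter-pel units. A 1/2-pel step is then ±2 and a 1/4-pel step is ±1. The cost function is supplied by the caller, for example SA(T)D plus a rate term. The sketch also keeps the current center as a candidate, so the search never moves to a worse position. The slide leaves that choice implicit, so it is an assumption here. `refine` and `fractional_search` are hypothetical names.

```python
from typing import Callable, Tuple

Vec = Tuple[int, int]  # motion vector in quarter-pel units

# Offsets of the eight neighbors of a position.
NEIGHBORS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def refine(center: Vec, step: int, cost: Callable[[Vec], float]) -> Vec:
    """Best of the center and its eight neighbors at distance `step`."""
    candidates = [center] + [(center[0] + step * dx, center[1] + step * dy)
                             for dx, dy in NEIGHBORS]
    return min(candidates, key=cost)  # ties keep the earlier (center) candidate


def fractional_search(best_int: Vec, cost: Callable[[Vec], float]) -> Vec:
    half = refine(best_int, 2, cost)  # 1/2-pel candidates I..VIII
    return refine(half, 1, cost)      # 1/4-pel candidates a..h
```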
Page 14
Fractional Pel Value Interpolation: Luma
• Calculate half-pel values
– Use the 6-tap filter {1, -5, 20, 20, -5, 1} to get b
– b_h = clip((b + 16) >> 5)
– c is computed from the (unrounded) b values with the same 6-tap filter
– c_m = clip((c + 512) >> 10)
• Average integer and half-pel values to find d, e, f, g
– e.g. d = (A + b_h) >> 1
• h = (b_h + b_v) >> 1 (diagonal-direction averaging)
• i = (A1 + A2 + A3 + A4 + 2) >> 2
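Below is a minimal sketch of the two half-pel formulas above. It assumes 8-bit samples, so `clip` is taken to mean clipping to [0, 255]. `half_pel_h` computes b_h from one row. `center_pel` filters six unrounded horizontal b values vertically to obtain c_m. All function names are hypothetical, and border handling is omitted: the caller must supply samples x-2 to x+3.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap half-sample filter from the slide


def clip1(v: int) -> int:
    """Clip to the 8-bit sample range (assumed)."""
    return max(0, min(255, v))


def tap6(samples) -> int:
    """Unrounded 6-tap filter output over six consecutive samples."""
    return sum(t * s for t, s in zip(TAPS, samples))


def half_pel_h(row, x: int) -> int:
    """b_h: half-pel value between row[x] and row[x + 1]."""
    return clip1((tap6(row[x - 2:x + 4]) + 16) >> 5)


def center_pel(block, x: int, y: int) -> int:
    """c_m: center half-pel value, from six unrounded b values filtered vertically."""
    b_col = [tap6(block[yy][x - 2:x + 4]) for yy in range(y - 2, y + 4)]
    return clip1((tap6(b_col) + 512) >> 10)
```

The taps sum to 32, so a flat area passes through unchanged. This is why b needs a 5-bit shift and c, which is filtered twice, needs a 10-bit shift.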
Fractional Pel Value Interpolation: Chroma
• dx and dy are the fractional positions in units of one-eighth sample
• A, B, C, and D are the surrounding integer pixels (A top-left, B top-right, C bottom-left, D bottom-right)
[Figure: the interpolated position lies dx to the right of and dy below A; the remaining distances are 8 - dx and 8 - dy.]
$$v = \frac{(8-d_x)(8-d_y)A + d_x(8-d_y)B + (8-d_x)d_yC + d_xd_yD + 8^2/2}{8^2}$$
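The chroma formula is a bilinear weighting in integer arithmetic: adding 8^2/2 = 32 and dividing by 8^2 = 64 rounds to the nearest integer. Here it is as a small function. `chroma_interp` is a hypothetical name.

```python
def chroma_interp(a: int, b: int, c: int, d: int, dx: int, dy: int) -> int:
    """Chroma sample at eighth-sample offset (dx, dy) from A (0 <= dx, dy < 8).

    A is top-left, B top-right, C bottom-left, D bottom-right.
    """
    total = ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
             + (8 - dx) * dy * c + dx * dy * d)
    return (total + 32) >> 6  # (total + 8^2/2) / 8^2
```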
Page 15
B-Pictures
• Advantages:
– Improve coding efficiency
– Provide temporal scalability
• 5 modes:
– Direct mode: forward and backward MVs are derived; none are transmitted
– Forward mode: prediction from a previous reference frame
– Backward mode: prediction from a subsequent reference frame
– Bi-directional mode: separate forward and backward MVs
– Intra prediction mode
• MVs in direct mode:
– MV_F = (TR_B × MV) / TR_D
– MV_B = (TR_B - TR_D) × MV / TR_D
[Figure: a B picture between a previous P/I picture and a subsequent P picture on the time axis, showing MV, MV_F, MV_B and the temporal distances TR_D and TR_B.]
B-Pictures
• Direct mode
– No MV data is transmitted
– Same block structure as the co-located MB in the temporally subsequent picture
– MVs are computed as scaled versions of the corresponding MV of the co-located MB
Example picture order: I0 B1 B2 B3 P4 B5 B6 B7 P8
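Page 16
B-Pictures
[Figure: the current block in the current B picture and the co-located block in the List 1 reference, whose motion vector MV points into the List 0 reference; MV_F and MV_B are the derived vectors, and TD_D and TD_B are temporal distances along the time axis.]
Z = (TD_B × 256) / TD_D,  MV_F = (Z × MV + 128) >> 8
W = Z - 256,  MV_B = (W × MV + 128) >> 8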
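The fixed-point form computes the ratio TD_B / TD_D once, in 1/256 units, and then scales each MV component with a multiply, a rounding add, and a shift. Below is a minimal sketch that assumes TD_B and TD_D are positive (B picture between its references). `direct_mvs` is a hypothetical name.

```python
def direct_mvs(mv: tuple[int, int], td_b: int, td_d: int):
    """Temporal-direct (MV_F, MV_B) from the co-located MV, fixed-point form."""
    z = (td_b * 256) // td_d   # TD_B / TD_D in 1/256 units
    w = z - 256                # (TD_B - TD_D) / TD_D in 1/256 units
    mv_f = tuple((z * c + 128) >> 8 for c in mv)
    mv_b = tuple((w * c + 128) >> 8 for c in mv)
    return mv_f, mv_b
```

Take B1 in the sequence I0 B1 B2 B3 P4, with TD_B = 1 and TD_D = 4. A co-located MV of (16, 0) gives MV_F = (4, 0) and MV_B = (-12, 0). These match (TR_B × MV)/TR_D and (TR_B - TR_D) × MV/TR_D from the previous slide.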
Mode Decision
• Block difference
– Diff(i, j) = Original(i, j) - Prediction(i, j)
• SAD and SATD
– DiffT denotes Diff after a Hadamard transform
$$SAD = \sum_{i,j} \left|Diff(i,j)\right|, \qquad SATD = \Big(\sum_{i,j} \left|DiffT(i,j)\right|\Big) / 2$$
[Figure: the integer-pel search produces a prediction block; the block difference, optionally Hadamard transformed, gives SA(T)D, and the loop over prediction modes keeps SA(T)D_min.]
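Here are SAD and SATD for a 4x4 block. The sketch assumes the unnormalized 4x4 Hadamard matrix and the divide-by-2 shown above. `sad` and `satd4x4` are hypothetical names.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sad(orig, pred) -> int:
    """Sum of absolute differences."""
    return sum(abs(o - p) for ro, rp in zip(orig, pred) for o, p in zip(ro, rp))


def satd4x4(orig, pred) -> int:
    """Sum of absolute Hadamard-transformed differences, halved."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    diff_t = matmul(matmul(H4, diff), H4)  # H4 is symmetric, so H4^T == H4
    return sum(abs(v) for row in diff_t for v in row) // 2
```

A flat residual of 1 everywhere gives SAD = 16 but SATD = 8. Its energy lands in a single DC coefficient, which tends to be cheap to code after the transform.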
Page 17
Mode Decision
• Given the last decoded frames, the Lagrange multipliers λ_MODE and λ_MOTION, and the MB quantization parameter QP:
$$\lambda_{MODE} = 0.85 \times 2^{QP/3}, \qquad \lambda_{MOTION} = \sqrt{\lambda_{MODE}}$$
(Note: λ_MODE for B or SP frames is 4 times that for I or P frames.)
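The multipliers can be computed directly from QP. The sketch assumes QP/3 is a real-valued division and applies the factor-4 rule to B and SP frames. `lagrange_multipliers` is a hypothetical name.

```python
import math


def lagrange_multipliers(qp: int, frame_type: str = "P") -> tuple[float, float]:
    """(lambda_MODE, lambda_MOTION) for a macroblock QP."""
    lam_mode = 0.85 * 2 ** (qp / 3)
    if frame_type in ("B", "SP"):
        lam_mode *= 4  # B/SP frames use 4x the I/P value
    return lam_mode, math.sqrt(lam_mode)
```

λ_MODE doubles for every 3 QP steps, so λ_MOTION doubles every 6.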
Mode Decision
• Choose the intra prediction modes for the Intra 4x4 macroblock mode by minimizing $J(s, c, IMODE \mid QP, \lambda_{MODE}) = SSD(s, c, IMODE \mid QP) + \lambda_{MODE} \cdot R(s, c, IMODE \mid QP)$ with
$$IMODE \in \{DC, HOR, VERT, DIAG\_DL, DIAG\_DR, VERT\_R, VERT\_L, HOR\_U, HOR\_D\}$$
• Determine the best Intra 16x16 prediction mode by choosing the mode that results in the minimum SATD.
Page 18
Mode Decision
• For each 8x8 sub-partition
– Perform motion estimation and reference frame selection by minimizing SSD + λ × Rate(MV, REF)
– B frames: choose the prediction direction by minimizing SSD + λ × Rate(MV(PDIR), REF(PDIR))
– Determine the coding mode of the 8x8 sub-partition using rate-constrained mode decision, i.e., minimize SSD + λ × Rate(MV, REF, Luma-Coeff, block 8x8 mode)
• Here the SSD calculation is based on the reconstructed signal after DCT, quantization, and IDCT:
$$SSD(s, c, MODE \mid QP) = \sum_{x=1,y=1}^{16,16} \big(s_Y[x,y] - c_Y[x,y,MODE \mid QP]\big)^2 + \sum_{x=1,y=1}^{8,8} \big(s_U[x,y] - c_U[x,y,MODE \mid QP]\big)^2 + \sum_{x=1,y=1}^{8,8} \big(s_V[x,y] - c_V[x,y,MODE \mid QP]\big)^2$$
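The SSD term translates directly into code: a squared-error sum over the 16x16 luma block plus the two 8x8 chroma blocks of a 4:2:0 macroblock. In this minimal sketch, `ssd` and `mb_ssd` are hypothetical names, and each argument is a (Y, U, V) tuple of 2-D lists.

```python
def ssd(orig, recon) -> int:
    """Sum of squared differences between two equally sized 2-D sample arrays."""
    return sum((s - c) ** 2 for rs, rc in zip(orig, recon) for s, c in zip(rs, rc))


def mb_ssd(orig_yuv, recon_yuv) -> int:
    """SSD(s, c, MODE | QP): 16x16 Y plus 8x8 U and 8x8 V (4:2:0)."""
    return sum(ssd(o, r) for o, r in zip(orig_yuv, recon_yuv))
```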
Mode Decision
• Perform motion estimation and reference frame selection for the 16x16, 16x8, and 8x16 modes by minimizing
$$J(REF, \mathbf{m}(REF) \mid \lambda_{MOTION}) = SA(T)D\big(s, c(REF, \mathbf{m}(REF))\big) + \lambda_{MOTION} \cdot \big(R(\mathbf{m}(REF) - \mathbf{p}(REF)) + R(REF)\big)$$
• B frames: determine the prediction direction by minimizing
$$J(PDIR \mid \lambda_{MOTION}) = SATD\big(s, c(PDIR, \mathbf{m}(PDIR))\big) + \lambda_{MOTION} \cdot \big(R(\mathbf{m}(PDIR) - \mathbf{p}(PDIR)) + R(REF(PDIR))\big)$$
where m is the motion vector and p is its prediction.
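The sketch below evaluates the motion cost J for one candidate. The slide does not define the rate function R(·). This sketch assumes it is approximated by Exp-Golomb codeword lengths: signed for the MV difference components and unsigned for the reference index. `ue_bits`, `se_bits`, and `motion_cost` are hypothetical names.

```python
def ue_bits(k: int) -> int:
    """Length in bits of the unsigned Exp-Golomb codeword for k >= 0."""
    return 2 * (k + 1).bit_length() - 1


def se_bits(v: int) -> int:
    """Length of the signed Exp-Golomb codeword (v > 0 -> 2v-1, v <= 0 -> -2v)."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)


def motion_cost(sa_t_d: float, mv, mv_pred, ref_idx: int, lam_motion: float) -> float:
    """J = SA(T)D + lambda_MOTION * (R(m - p) + R(REF)), with the assumed rate model."""
    rate = (se_bits(mv[0] - mv_pred[0]) + se_bits(mv[1] - mv_pred[1])
            + ue_bits(ref_idx))
    return sa_t_d + lam_motion * rate
```

Because the rate is charged on m - p rather than on m, a vector close to its prediction costs only a few bits. This biases the search toward smooth motion fields.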
Page 19
Mode Decision
• Choose the MB prediction mode by minimizing
$$J(s, c, MODE \mid QP, \lambda_{MODE}) = SSD(s, c, MODE \mid QP) + \lambda_{MODE} \cdot R(s, c, MODE \mid QP)$$
I: $MODE \in \{INTRA4{\times}4,\ INTRA16{\times}16\}$
P: $MODE \in \{INTRA4{\times}4,\ INTRA16{\times}16,\ SKIP,\ 16{\times}16,\ 16{\times}8,\ 8{\times}16,\ 8{\times}8\}$
B: $MODE \in \{INTRA4{\times}4,\ INTRA16{\times}16,\ DIRECT,\ 16{\times}16,\ 16{\times}8,\ 8{\times}16,\ 8{\times}8\}$
• “Skip mode” refers to the 16x16 mode in which neither motion nor residual information is encoded.
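Once each candidate mode has been trial-encoded, the decision reduces to an argmin over the Lagrangian cost. This minimal sketch assumes the (SSD, rate) pairs have already been measured. `choose_mode` and the candidate numbers below are made up for illustration.

```python
def choose_mode(candidates: dict, lam_mode: float) -> str:
    """Return the mode minimizing J = SSD + lambda_MODE * R.

    `candidates` maps a mode name to its (ssd, rate_bits) after trial encoding.
    """
    return min(candidates,
               key=lambda m: candidates[m][0] + lam_mode * candidates[m][1])


modes = {"SKIP": (900, 1), "16x16": (400, 60), "INTRA4x4": (300, 150)}
```

With these illustrative numbers, λ_MODE = 5 picks 16x16 and λ_MODE = 20 picks SKIP. As QP (and therefore λ_MODE) grows, cheap modes win even with higher distortion.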
MPEG-4 AVC/H.264: Intra Prediction
[Figure: the encoder block diagram with the intra-frame prediction stage highlighted; directional spatial prediction (9 types for luma, 1 chroma). A 4x4 block of samples a–p is predicted from the neighboring samples A–H above, I–P to the left, and the corner sample Q; the numbered prediction directions are also shown.]
• e.g., Mode 4: diagonal down/right prediction; a, f, k, p are predicted by (A + 2Q + I + 2) >> 2
Page 20
Intra Prediction: 4x4 Luma Blocks
• Mode 0: vertical prediction
• Mode 1: horizontal prediction
• Mode 2: DC prediction
• Mode 3: diagonal down/left prediction
• Mode 4: diagonal down/right prediction
• Mode 5: vertical-right
• Mode 6: horizontal-down
• Mode 7: vertical-left
• Mode 8: horizontal-up
[Figure: prediction directions of modes 0, 1, and 3–8.]
DC prediction: pred(x, y) = average of pixels A, B, C, D, E, F, G, and H
Intra Prediction: 4x4 Luma Prediction
[Figure: Mode 0 (vertical) and Mode 1 (horizontal) applied to a 4x4 block a–p, with upper neighbors A–D, left neighbors E–H, and corner I.]
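Here is a minimal sketch of four of the nine 4x4 modes: vertical, horizontal, DC, and diagonal down/right. It uses the neighbor names of the intra-prediction diagram (A–D above, I–L left, Q corner). For DC, it assumes all eight neighbors are available and uses the rounded mean (sum + 4) >> 3, which is one concrete reading of "average". `intra4x4_predict` is a hypothetical name.

```python
def intra4x4_predict(mode: int, top, left, corner: int):
    """pred[y][x] for modes 0, 1, 2, 4.

    top = [A, B, C, D], left = [I, J, K, L], corner = Q (above-left).
    """
    def p(x: int, y: int) -> int:
        # Neighbor at (x, -1) or (-1, y); (-1, -1) is the corner Q.
        if x == -1 and y == -1:
            return corner
        return top[x] if y == -1 else left[y]

    if mode == 0:  # vertical: copy the row above
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: copy the column to the left
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == 4:  # diagonal down/right: 3-tap smoothing along the diagonal
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                if x > y:
                    pred[y][x] = (p(x - y - 2, -1) + 2 * p(x - y - 1, -1)
                                  + p(x - y, -1) + 2) >> 2
                elif x < y:
                    pred[y][x] = (p(-1, y - x - 2) + 2 * p(-1, y - x - 1)
                                  + p(-1, y - x) + 2) >> 2
                else:  # a, f, k, p: (A + 2Q + I + 2) >> 2
                    pred[y][x] = (p(0, -1) + 2 * corner + p(-1, 0) + 2) >> 2
        return pred
    raise ValueError("mode not covered by this sketch")
```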
Page 21
Intra Prediction: 16x16 Luma Blocks
• Mode 0: Vertical
• Mode 1: Horizontal
• Mode 2: DC
• Mode 3: Plane
– Used only if all neighboring samples are available
$$Pred(x,y) = \mathrm{Clip}\big((a + b \cdot (x-7) + c \cdot (y-7) + 16) \gg 5\big),$$
where
$$a = 16 \cdot \big(P(-1,15) + P(15,-1)\big), \qquad b = (5H + 32) \gg 6, \qquad c = (5V + 32) \gg 6,$$
$$H = \sum_{x=1}^{8} x \cdot \big(P(7+x,-1) - P(7-x,-1)\big), \qquad V = \sum_{y=1}^{8} y \cdot \big(P(-1,7+y) - P(-1,7-y)\big)$$
Intra Prediction: 16x16 Luma Blocks
[Figure: the four 16x16 modes: vertical (from the top neighbors), horizontal (from the left neighbors), DC (mean of the top and left neighbors), and plane.]
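The plane mode fits a linear surface to the 33 neighboring samples (16 above, 16 to the left, plus the corner P(-1,-1), which appears in H and V at x = 8 and y = 8). This sketch assumes 8-bit samples, so Clip clamps to [0, 255]. `plane_predict` is a hypothetical name.

```python
def clip1(v: int) -> int:
    """Clip to the 8-bit sample range (assumed)."""
    return max(0, min(255, v))


def plane_predict(top, left, corner: int):
    """16x16 plane prediction: top[x] = P(x, -1), left[y] = P(-1, y),
    corner = P(-1, -1). Returns pred[y][x]."""
    def pt(x):  # P(x, -1)
        return corner if x == -1 else top[x]

    def pl(y):  # P(-1, y)
        return corner if y == -1 else left[y]

    h = sum(x * (pt(7 + x) - pt(7 - x)) for x in range(1, 9))
    v = sum(y * (pl(7 + y) - pl(7 - y)) for y in range(1, 9))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]
```

On a purely horizontal ramp (top row 40, 44, ..., 100 and a constant left column), V = 0, so every predicted row reproduces the ramp.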
Page 22
MPEG-4 AVC/H.264: Transform Coding
[Figure: the encoder block diagram with the transform stage highlighted.]
• 4x4 block integer transform:
$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
• Main Profile: adaptive block-size transform (8x4, 4x8, 8x8)
• Repeated transform of DC coefficients for 8x8 chroma and 16x16 Intra luma blocks
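Applied to a 4x4 residual block X, the matrix H gives the core transform Y = H X H^T. This minimal sketch shows only the integer core: the rows of H are orthogonal but have different norms (4 and 10), and the sketch assumes the resulting per-position scaling is handled in quantization. `forward4x4` is a hypothetical name.

```python
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def forward4x4(x):
    """Integer core transform Y = H X H^T (no scaling)."""
    return matmul(matmul(H, x), transpose(H))
```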
Transform Coding: Luma DC
• Luma DC in an Intra_16x16 MB
– Using the Hadamard transform
Forward transform:
$$Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) /\!/ 2$$
Inverse transform:
$$X_{QD} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} y_{QD00} & y_{QD01} & y_{QD02} & y_{QD03} \\ y_{QD10} & y_{QD11} & y_{QD12} & y_{QD13} \\ y_{QD20} & y_{QD21} & y_{QD22} & y_{QD23} \\ y_{QD30} & y_{QD31} & y_{QD32} & y_{QD33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$
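The two DC transforms are easy to check in code. The slide writes the forward halving as "//2" without defining how it rounds. This sketch assumes an arithmetic right shift (floor). Dequantization is left out. `luma_dc_forward` and `luma_dc_inverse` are hypothetical names.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def luma_dc_forward(xd):
    """Y_D = (H4 X_D H4) // 2, halving by arithmetic shift (assumed rounding)."""
    t = matmul(matmul(H4, xd), H4)
    return [[v >> 1 for v in row] for row in t]


def luma_dc_inverse(yqd):
    """X_QD = H4 Y_QD H4 (dequantization not included)."""
    return matmul(matmul(H4, yqd), H4)
```

Since H4 · H4 = 4I, running the forward transform and then the inverse multiplies the input by 16/2 = 8 whenever the halving is exact.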
Page 23
Transform Coding: Luma DC
• CBPY 8x8 block order (raster scan order in the MB):
0 1
2 3
• Luma 4x4 block order for 4x4 intra prediction and 4x4 residual coding (raster scan order within each 8x8 region, nested in raster scan order of the 8x8 regions):
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15
• Block -1: luma 4x4 DC for the Intra 16x16 macroblock type
• Blocks 16 (Cb) and 17 (Cr): 2x2 chroma DC
• Chroma 4x4 block order for 4x4 residual coding, shown as 16–25, and intra 4x4 prediction, shown as 18–21 and 22–25 (raster scan order in each 8x8 chroma region); AC blocks:
Cb: 18 19 / 20 21
Cr: 22 23 / 24 25
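The nested luma order maps a block index to its position with two divmod steps: one selects the 8x8 region and one selects the 4x4 block inside it. In the sketch, `luma4x4_position` is a hypothetical name.

```python
def luma4x4_position(blk: int) -> tuple[int, int]:
    """(x, y) in luma samples of 4x4 block `blk` (0..15) inside a macroblock.

    Raster order of 4x4 blocks inside each 8x8, nested in raster order of 8x8s.
    """
    b8, b4 = divmod(blk, 4)
    x = 8 * (b8 % 2) + 4 * (b4 % 2)
    y = 8 * (b8 // 2) + 4 * (b4 // 2)
    return x, y
```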
Transform Coding: Chroma DC
• Chroma DC in an 8x8 block
– Hadamard transform
Forward transform:
$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} \\ x_{D10} & x_{D11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
Inverse transform:
$$X_{QD} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} y_{QD00} & y_{QD01} \\ y_{QD10} & y_{QD11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
Page 24
Transform: Luma and Chroma Residual
• Luminance and chrominance 4x4 residual blocks
• Forward transform
$$Y = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}$$
• Inverse transform
$$X = \begin{bmatrix} 1 & 1 & 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} & -1 & -1 \\ 1 & -\tfrac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} y_{00} & y_{01} & y_{02} & y_{03} \\ y_{10} & y_{11} & y_{12} & y_{13} \\ y_{20} & y_{21} & y_{22} & y_{23} \\ y_{30} & y_{31} & y_{32} & y_{33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$
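The 1/2 entries of the inverse matrices can be implemented with a right shift, so the inverse transform needs only adds, subtracts, and shifts. The sketch below applies a 1-D butterfly to every row and then to every column, which computes X = C_i^T Y C_i for the matrices above. Assumption: the final normalization and rounding that a decoder applies after dequantization are omitted. `inv_rows` and `inverse4x4` are hypothetical names.

```python
def inv_rows(block):
    """1-D inverse core transform of each row; the 1/2 factors become >> 1."""
    out = []
    for d0, d1, d2, d3 in block:
        e, f = d0 + d2, d0 - d2
        g, h = (d1 >> 1) - d3, d1 + (d3 >> 1)
        out.append([e + h, f + g, f - g, e - h])
    return out


def transpose(m):
    return [list(r) for r in zip(*m)]


def inverse4x4(y):
    """X = Ci^T Y Ci via row butterflies, then column butterflies."""
    return transpose(inv_rows(transpose(inv_rows(y))))
```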
Quantization/Dequantization (1/6)
• Scan order
– 4x4 residual and 4x4 luma DC blocks (zig-zag scan; each entry is the scan index of that coefficient position):
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15
– 2x2 chroma DC block
• Raster order
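The scan table can be stored as a list of (row, column) positions in visiting order, read off the table above. That single list serves both for serializing coefficients and for placing decoded levels back into the block. `ZIGZAG_4x4`, `scan`, and `inverse_scan` are hypothetical names.

```python
# (row, col) of the coefficient visited at each scan position.
ZIGZAG_4x4 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
              (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]


def scan(block):
    """Serialize a 4x4 coefficient block in zig-zag order."""
    return [block[r][c] for r, c in ZIGZAG_4x4]


def inverse_scan(levels):
    """Place up to 16 scanned levels back into a 4x4 block."""
    block = [[0] * 4 for _ in range(4)]
    for (r, c), v in zip(ZIGZAG_4x4, levels):
        block[r][c] = v
    return block
```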
Page 25
Quantization/Dequantization (2/6)
• QP: 0–51
• QP_Y: QP for luma coefficients
• QP_C: QP for chroma coefficients
– QP_C for chroma is determined from the current value of QP_Y
[Table: QP_C as a function of QP_Y]
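The slide's QP_Y-to-QP_C table did not survive extraction. The sketch below assumes the mapping of the final H.264 standard: values below 30 map to themselves, and higher values are compressed so that QP_C tops out at 39. It also assumes that standard's chroma QP offset parameter. The draft described on these slides may use a different table. `chroma_qp` is a hypothetical name.

```python
# QP_C for qPI = 30..51; qPI < 30 maps to itself (final-standard values, assumed).
QPC_HIGH = [29, 30, 31, 32, 32, 33, 34, 34, 35, 35, 36,
            36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 39]


def chroma_qp(qp_y: int, chroma_qp_index_offset: int = 0) -> int:
    """Chroma QP derived from the luma QP."""
    qpi = max(0, min(51, qp_y + chroma_qp_index_offset))
    return qpi if qpi < 30 else QPC_HIGH[qpi - 30]
```

Capping QP_C below QP_Y at high QP is what the previous slides describe as a smaller step size for chroma.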
Page 26
Quantization/Dequantization (4/6)
• 4x4 luma DC block
• Quantization
$$Y_{QD}(i,j) = \big[\, Y_D(i,j) \cdot Q(QP\%6, 0, 0) + 2f \,\big] / 2^{18 + QP/6}, \qquad i, j = 0, \dots, 3$$
– f = 2^{17+QP/6}/3 for intra frames; f = 2^{17+QP/6}/6 for inter frames
– f has the same sign as the coefficient being quantized
• Dequantization
$$X_D(i,j) = \big[\, X_{QD}(i,j) \cdot R(QP\%6, 0, 0) \,\big] /\!/ 4, \qquad i, j = 0, \dots, 3$$
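Below is a minimal sketch of the luma DC quantization formula, using the first column of the slide's quantMat table as Q(QP%6, 0, 0). Because f takes the sign of the coefficient, the sketch quantizes the magnitude and restores the sign afterwards. The larger rounding offset for intra frames (1/3 versus 1/6 of a step) rounds more values up. `quant_luma_dc` is a hypothetical name.

```python
# Q(QP%6, 0, 0) for QP%6 = 0..5: first column of the slide's quantMat.
Q_DC = [13107, 11651, 10486, 9198, 8322, 7384]


def quant_luma_dc(y_d: int, qp: int, intra: bool = True) -> int:
    """Y_QD = [Y_D * Q(QP%6,0,0) + 2f] / 2^(18 + QP/6), with f signed like Y_D."""
    shift = 18 + qp // 6
    f = (1 << (17 + qp // 6)) // (3 if intra else 6)
    level = (abs(y_d) * Q_DC[qp % 6] + 2 * f) >> shift
    return level if y_d >= 0 else -level
```

Raising QP by 6 adds one bit to the shift, which roughly halves the output level.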
Quantization/Dequantization (5/6)
• 2x2 chroma DC
• Quantization
$$Y_{QD}(i,j) = \big[\, Y_D(i,j) \cdot Q(QP\%6, 0, 0) + 2f \,\big] / 2^{18 + QP/6}, \qquad i, j = 0, 1$$
– f = 2^{17+QP/6}/3 for intra frames; f = 2^{17+QP/6}/6 for inter frames
– f has the same sign as the coefficient being quantized
• Dequantization
$$X_D(i,j) = \big[\, X_{QD}(i,j) \cdot R(QP\%6, 0, 0) \,\big] /\!/ 2, \qquad i, j = 0, 1$$
Page 27
Quantization/Dequantization (6/6)
• Q[QP%6][i][j] = quantMat[QP%6][0] for (i,j) ∈ {(0,0), (0,2), (2,0), (2,2)}
• Q[QP%6][i][j] = quantMat[QP%6][1] for (i,j) ∈ {(1,1), (1,3), (3,1), (3,3)}
• Q[QP%6][i][j] = quantMat[QP%6][2] otherwise
• R[QP%6][i][j] = dequantMat[QP%6][0] for (i,j) ∈ {(0,0), (0,2), (2,0), (2,2)}
• R[QP%6][i][j] = dequantMat[QP%6][1] for (i,j) ∈ {(1,1), (1,3), (3,1), (3,3)}
• R[QP%6][i][j] = dequantMat[QP%6][2] otherwise
• quantMat[6][3] = {{13107, 5243, 8224}, {11651, 4660, 7358}, {10486, 4143, 6554}, {9198, 3687, 5825}, {8322, 3290, 5243}, {7384, 2943, 4660}};
• dequantMat[6][3] = {{40, 64, 51}, {45, 72, 57}, {50, 81, 64}, {57, 91, 72}, {63, 102, 80}, {71, 114, 90}};
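The three position classes reduce to a parity test: (i, j) both even, both odd, or mixed. The sketch below expands the slide's [6][3] tables into full [6][4][4] Q and R lookups. `position_class` and `build_tables` are hypothetical names.

```python
QUANT_MAT = [[13107, 5243, 8224], [11651, 4660, 7358], [10486, 4143, 6554],
             [9198, 3687, 5825], [8322, 3290, 5243], [7384, 2943, 4660]]
DEQUANT_MAT = [[40, 64, 51], [45, 72, 57], [50, 81, 64],
               [57, 91, 72], [63, 102, 80], [71, 114, 90]]


def position_class(i: int, j: int) -> int:
    """0 if i and j are both even, 1 if both odd, 2 otherwise."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def build_tables(mat):
    """Expand a [6][3] table to [6][4][4] by coefficient position."""
    return [[[mat[r][position_class(i, j)] for j in range(4)] for i in range(4)]
            for r in range(6)]


Q = build_tables(QUANT_MAT)
R = build_tables(DEQUANT_MAT)
```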
MPEG-4 AVC/H.264: Multiple Reference Frames
[Figure: encoder block diagram (transform/quantizer, dequantization/inverse transform, motion-compensated predictor, motion estimator, entropy coding, coder control) illustrating multiple reference frames for motion compensation.]
Page 28
MPEG-4 AVC/H.264: Residual Coding
[Figure: encoder block diagram highlighting the integer transform; residual coding is based on 4x4 blocks.]
Residual and Intra Coding
• EXACT MATCH simplified transform
– Based primarily on a 4x4 transform (all prior standards: 8x8)
– Requires only 16-bit arithmetic (including intermediate values)
– Expanded to 8x8 for chroma by a 2x2 transform of the DC values
– Easily extensible to 10–12 bits per component
• Adaptive block transform sizes for Main Profile
• Intra coding structure
– Directional spatial prediction (9 types luma, 1 chroma)
– Expanded to 16x16 for luma intra by a 4x4 transform of the DC values
Page 29
Quantization and Deblocking
• Quantization of transform coefficients
– Logarithmic step size control
– Extended range of step sizes
– Smaller step size for chroma (per H.263 Annex T)
– Table-driven
• Reconstruction is a 16-bit multiply, add, and shift
• Deblocking filter (in the prediction loop)
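"Logarithmic step size control" means that each QP step scales the quantizer step by a constant factor, and every 6 steps the step size doubles. This is why the tables above are indexed by QP%6 and the shifts by QP/6. The six base step sizes in the sketch are those of the final H.264 standard. The draft described on these slides may use different constants, so treat them as an assumption. `qstep` is a hypothetical name.

```python
# Nominal step sizes for QP = 0..5 (final-standard values, assumed here).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantizer step size; doubles every 6 QP steps (QP in 0..51)."""
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```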
Deblocking Filter
[Figure: boundaries in a 16x16 macroblock to be filtered, i.e. vertical and horizontal edges; luma boundaries are shown with solid lines and chroma boundaries with dotted lines.]
Page 30
Deblocking Filter
• Content-dependent boundary filtering strength
– For each boundary between neighbouring 4x4 luma blocks, a “Boundary Strength” Bs is assigned
– If Bs = 0, filtering is skipped for that particular edge
– In all other cases, filtering depends on the local sample properties and the value of Bs
Deblocking Filter
• Flowchart to determine the boundary strength Bs for the block boundary between blocks p and q:
  1. Is block p or q intra coded, or is the slice type SI or SP?
     – YES: Is the block boundary also a macroblock boundary? YES: Bs = 4; NO: Bs = 3
     – NO: go to step 2
  2. Are coefficients coded in block p or q? YES: Bs = 2; NO: go to step 3
  3. Do blocks p and q have different reference frames or a different number of reference frames? YES: Bs = 1; NO: go to step 4
  4. |V1(p,x) - V1(q,x)| >= 1 or |V1(p,y) - V1(q,y)| >= 1 or, if bi-predictive, |V2(p,x) - V2(q,x)| >= 1 or |V2(p,y) - V2(q,y)| >= 1?
     – YES: Bs = 1
     – NO: Bs = 0 (skip)
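The flowchart maps directly onto a chain of checks. In this sketch, `BlockInfo` is a hypothetical per-block summary, motion vectors are assumed to be stored in quarter-sample units (so the flowchart's threshold of 1 full sample becomes 4), and reference frames are compared as ordered tuples, a simplification of how bi-predictive references are matched.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical summary of one 4x4 block, used only for this sketch."""
    intra: bool = False
    has_coeffs: bool = False
    ref_frames: tuple = ()  # one reference frame id per prediction direction
    mvs: tuple = ()         # one (x, y) motion vector per direction, quarter-sample units


def boundary_strength(p, q, mb_edge, sp_si_slice=False):
    """Boundary strength Bs for the edge between blocks p and q."""
    if p.intra or q.intra or sp_si_slice:
        return 4 if mb_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    # Different reference frames, or a different number of them.
    if p.ref_frames != q.ref_frames:
        return 1
    # Motion vector difference of at least one full sample (4 quarter samples).
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:
            return 1
    return 0


still = BlockInfo(ref_frames=(0,), mvs=((0, 0),))
moving = BlockInfo(ref_frames=(0,), mvs=((4, 0),))
print(boundary_strength(still, moving, mb_edge=False))  # 1
```

The checks run from the strongest case to the weakest, so the first condition that holds determines Bs, exactly as in the flowchart.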
Deblocking Filter
• Thresholds for each block boundary
  – The set of samples across this edge is filtered only if the condition
    Bs ≠ 0 && |p0 - q0| < α && |p1 - p0| < β && |q1 - q0| < β
    holds
  – α and β are determined by IndexA and IndexB, respectively
  – IndexA = Clip3(0, 51, QPav + Filter_Offset_A)
  – IndexB = Clip3(0, 51, QPav + Filter_Offset_B)
  – QPav is the average QP of blocks p and q
  – Filter_Offset_A and Filter_Offset_B are used to modify the filter characteristics

$$
\mathrm{Clip3}(a, b, c) =
\begin{cases}
a, & c < a \\
b, & c > b \\
c, & \text{otherwise}
\end{cases}
$$

Samples across the edge: p3 p2 p1 p0 | q0 q1 q2 q3
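The filtering decision can be sketched as follows. The α and β lookup tables indexed by IndexA and IndexB are not reproduced on the slide, so this sketch takes `alpha` and `beta` as inputs rather than inventing table values. It also assumes the final standard's rounded average QPav = (QPp + QPq + 1) >> 1. Samples are passed as lists ordered outward from the edge (p[0] = p0, q[0] = q0); all function names are hypothetical.

```python
def clip3(a, b, c):
    """Clip c to the range [a, b]."""
    return a if c < a else b if c > b else c


def table_indices(qp_p, qp_q, filter_offset_a=0, filter_offset_b=0):
    """IndexA and IndexB from the rounded average QP of blocks p and q."""
    qp_av = (qp_p + qp_q + 1) >> 1
    return (clip3(0, 51, qp_av + filter_offset_a),
            clip3(0, 51, qp_av + filter_offset_b))


def filter_samples(bs, p, q, alpha, beta):
    """True when the samples across this edge should be filtered."""
    return (bs != 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


print(table_indices(30, 33))  # (32, 32)
print(filter_samples(1, [100, 101, 100, 100], [104, 103, 104, 104], alpha=10, beta=4))  # True
```

A large step |p0 - q0| >= α is treated as a real image edge and left untouched, which is why the check compares against α before any smoothing is applied.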
Deblocking Filter: Bs < 4
• Δ = Clip3(-C, C, ((((q0 - p0) << 2) + (p1 - q1) + 4) >> 3))
• P0 = Clip1(p0 + Δ)
• Q0 = Clip1(q0 - Δ)
  – ap = |p2 - p0|
  – aq = |q2 - q0|
  – If ap < β, P1 = p1 + Clip3(-C0, C0, (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1)
  – If aq < β, Q1 = q1 + Clip3(-C0, C0, (q2 + ((p0 + q0 + 1) >> 1) - (q1 << 1)) >> 1)
  – C0 is determined by IndexA and Bs
  – For luma, C = C0 + (ap < β ? 1 : 0) + (aq < β ? 1 : 0); for chroma, C = C0 + 1
  – Clip1(x) = Clip3(0, 255, x)
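The Bs < 4 equations can be written out as one function. This sketch assumes 8-bit samples, takes C0 as an input (its table indexed by IndexA and Bs is not given on the slide), and follows the final standard in leaving p1 and q1 unmodified for chroma. Python's `>>` on negative integers is an arithmetic (floor) shift, matching the integer arithmetic of the equations. The function name is hypothetical.

```python
def clip3(a, b, c):
    return a if c < a else b if c > b else c


def clip1(x):
    return clip3(0, 255, x)  # 8-bit samples assumed


def filter_normal(p, q, beta, c0, luma=True):
    """Bs < 4 edge filter. p = [p0, p1, p2, p3], q = [q0, q1, q2, q3]."""
    p, q = list(p), list(q)
    ap = abs(p[2] - p[0])
    aq = abs(q[2] - q[0])
    if luma:
        c = c0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)
    else:
        c = c0 + 1
    delta = clip3(-c, c, (((q[0] - p[0]) << 2) + (p[1] - q[1]) + 4) >> 3)
    new_p0 = clip1(p[0] + delta)
    new_q0 = clip1(q[0] - delta)
    avg = (p[0] + q[0] + 1) >> 1  # uses the unfiltered p0 and q0
    if luma and ap < beta:
        p[1] += clip3(-c0, c0, (p[2] + avg - (p[1] << 1)) >> 1)
    if luma and aq < beta:
        q[1] += clip3(-c0, c0, (q[2] + avg - (q[1] << 1)) >> 1)
    p[0], q[0] = new_p0, new_q0
    return p, q


print(filter_normal([100, 100, 100, 100], [110, 110, 110, 110], beta=4, c0=2))
# ([104, 102, 100, 100], [106, 108, 110, 110])
```

The clipping by C and C0 is what keeps the filter from smoothing away genuine detail: the correction can never exceed a QP-dependent bound.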
Deblocking Filter: Bs = 4
• Left/upper side
• If the following condition holds:
  – ap < β && |p0 - q0| < ((α >> 2) + 2) ... (8-71)
  – P0 = (p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4) >> 3
  – P1 = (p2 + p1 + p0 + q0 + 2) >> 2
  – In the case of luma filtering, P2 = (2*p3 + 3*p2 + p1 + p0 + q0 + 4) >> 3
• Otherwise, if the condition of (8-71) does not hold,
  – P0 = (2*p1 + p0 + q1 + 2) >> 2
Deblocking Filter: Bs = 4
• Right/lower side
• If the following condition holds:
  – aq < β && |p0 - q0| < ((α >> 2) + 2) ... (8-76)
  – Q0 = (p1 + 2*p0 + 2*q0 + 2*q1 + q2 + 4) >> 3 ... (8-77)
  – Q1 = (p0 + q0 + q1 + q2 + 2) >> 2 ... (8-78)
  – In the case of luma filtering, Q2 = (2*q3 + 3*q2 + q1 + q0 + p0 + 4) >> 3 ... (8-79)
• Otherwise, if the condition of (8-76) does not hold,
  – Q0 = (2*q1 + q0 + p1 + 2) >> 2
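Both sides of the Bs = 4 filter can be combined into one luma function. This sketch implements equations (8-71) to (8-79) as given above for luma only, assumes the edge has already passed the α/β decision, and takes `alpha` and `beta` as inputs because their tables are not reproduced here. The function name is hypothetical.

```python
def filter_strong_luma(p, q, alpha, beta):
    """Bs = 4 luma filter. p = [p0, p1, p2, p3], q = [q0, q1, q2, q3]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)
    small_gap = abs(p0 - q0) < ((alpha >> 2) + 2)

    if ap < beta and small_gap:  # condition (8-71): strong filtering of p side
        new_p = [(p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                 (p2 + p1 + p0 + q0 + 2) >> 2,
                 (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3,
                 p3]
    else:  # only p0 is modified, with a 3-tap filter
        new_p = [(2 * p1 + p0 + q1 + 2) >> 2, p1, p2, p3]

    if aq < beta and small_gap:  # condition (8-76): strong filtering of q side
        new_q = [(p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3,   # (8-77)
                 (p0 + q0 + q1 + q2 + 2) >> 2,                     # (8-78)
                 (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3,        # (8-79)
                 q3]
    else:
        new_q = [(2 * q1 + q0 + p1 + 2) >> 2, q1, q2, q3]

    return new_p, new_q


print(filter_strong_luma([100] * 4, [104] * 4, alpha=40, beta=8))
# ([102, 101, 101, 100], [103, 103, 104, 104])
```

Note that all outputs are computed from the unfiltered inputs, so the p and q sides can be evaluated independently and in any order.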
Deblocking Filter
[Figure: highly compressed decoded inter picture: 1) without filter, 2) with H.264/AVC deblocking]
Entropy Coding
[Figure: H.264/AVC encoder block diagram: input video signal split into 16x16 macroblocks; coder control; transform/scaling/quantization; entropy coding of control data, quantized transform coefficients and motion data; embedded decoder with inverse scaling and transform, intra-frame prediction, motion compensation and deblocking filter; motion estimation; intra/inter switch; output video signal]
Variable Length Coding
• Exp-Golomb code is used universally for all symbols except transform coefficients
• Context-adaptive VLCs for coding of transform coefficients
  – No end-of-block; instead, the number of coefficients is decoded
  – Coefficients are scanned backwards
  – Contexts are built dependent on transform coefficients
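The universal Exp-Golomb code is simple enough to show in full. This sketch writes bits as Python strings for readability and follows the unsigned ue(v) and signed se(v) mappings of the final standard; the function names are hypothetical. A code number k is sent as M leading zeros followed by the (M + 1)-bit binary form of k + 1.

```python
def ue_encode(k):
    """Unsigned Exp-Golomb codeword for code number k >= 0."""
    if k < 0:
        raise ValueError("code number must be non-negative")
    value = k + 1
    m = value.bit_length() - 1      # number of leading zeros
    return "0" * m + format(value, "b")


def ue_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (k, next_pos)."""
    m = 0
    while bits[pos + m] == "0":
        m += 1
    value = int(bits[pos + m: pos + 2 * m + 1], 2)
    return value - 1, pos + 2 * m + 1


def se_to_code_num(v):
    """Signed mapping: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def code_num_to_se(k):
    return (k + 1) // 2 if k % 2 else -(k // 2)


def se_encode(v):
    return ue_encode(se_to_code_num(v))


print([ue_encode(k) for k in range(5)])  # ['1', '010', '011', '00100', '00101']
```

Since every codeword is self-delimiting, a parser needs no tables to find symbol boundaries, which is why one code can serve all non-coefficient syntax elements.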
Context-based Adaptive Binary Arithmetic Coding (CABAC)
• Usage of adaptive probability models for most symbols
• Exploiting symbol correlations by using contexts
• Restriction to binary arithmetic coding
  – Simple and fast adaptation mechanism
  – Fast binary arithmetic codec based on table look-ups and shifts only
• Average bit-rate saving over CAVLC: 10-15%
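The benefit of an adaptive binary probability model can be illustrated without the full CABAC engine. This sketch is NOT CABAC's table-driven state machine: it assumes a hypothetical 12-bit fixed-point estimate updated with shifts only, and it measures the ideal arithmetic-coding cost (-log2 of the estimated probability) of each bin. It only shows why skewed binary sources cost far less than one bit per bin once the model adapts.

```python
import math


def adaptive_cost_bits(bins, shift=4):
    """Ideal code length in bits of a binary sequence under a shift-updated
    probability estimate (hypothetical model, not the CABAC state tables)."""
    one = 1 << 12            # 12-bit fixed-point representation of 1.0
    p1 = one // 2            # estimate of P(bin == 1), starting at 0.5
    total = 0.0
    for b in bins:
        prob = p1 / one if b else 1 - p1 / one
        total += -math.log2(prob)
        # Move the estimate towards the observed bin using a shift only.
        if b:
            p1 += (one - p1) >> shift
        else:
            p1 -= p1 >> shift
    return total


skewed = [0] * 1000
print(round(adaptive_cost_bits(skewed), 1))  # far fewer than 1000 bits
```

With these shift updates the estimate settles strictly between 0 and 1, so no bin ever has zero modelled probability; a balanced sequence such as alternating 0s and 1s still costs close to one bit per bin, as expected.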
SP/SI Frame
• SP frame:
  – Motion-compensated predictive coding
  – Similar to P
  – SP allows identical reconstruction even when different reference pictures are being used
• SI frame:
  – Spatial prediction
  – Similar to I
  – SI allows identical reconstruction to a corresponding SP
• Provide functionalities for bitstream switching, splicing, random access, VCR functionalities such as fast-forward, and error resilience/recovery
SP/SI Frame: Bitstream Switching
[Figure: Bitstream 1 (P pictures with switching picture S1) and Bitstream 2 (P pictures with switching picture S2); switching picture S12 switches from bitstream 1 to bitstream 2]
SP/SI Frame: Bitstream Splicing
[Figure: Bitstream 1 (P pictures with S1) and Bitstream 2 (P pictures with S2); SI picture SI2 splices into bitstream 2]
SP/SI Frame: Error Resiliency/Recovery
[Figure: stream of P pictures with switching pictures S1 and S2, with S12 and SI2 as alternative switching pictures used for recovery]
Profiles and Levels
Profiles
• Baseline profile
• Extended profile
• Main profile
Baseline Profile
• I and P picture types
• In-loop deblocking filter
• 1/4-sample motion compensation
• VLC-based entropy coding: CAVLC
• 4:2:0 chrominance format
• Field pictures (for Level 2.1 and above)
• Use 15 or fewer reference frames
• Have a compression ratio per picture of 4:1 or greater
Extended Profile
• Bi-predictive slices
• SP and SI slices
• Weighted prediction
• All features included in the Baseline Profile
Main Profile
• CABAC
• Interlaced pictures
• All features included in the Baseline Profile
Level Definitions
Level | Max picture size (MBs) | Max video bit rate (1000 bits/s) | Horizontal MV range (full pels) | Vertical MV range (full pels) | Minimum luma bi-predictive block size
1 | 99 | 64 | [-2048, 2047.75] | [-64, 63.75] | 8x8
1.1 | 396 | 128 | [-2048, 2047.75] | [-128, 127.75] | 8x8
1.2 | 396 | 768 | [-2048, 2047.75] | [-128, 127.75] | 8x8
2 | 396 | 2000 | [-2048, 2047.75] | [-128, 127.75] | 8x8
2.1 | 792 | 4000 | [-2048, 2047.75] | [-256, 255.75] | 8x8
2.2 | 1620 | 4000 | [-2048, 2047.75] | [-256, 255.75] | 8x8
3 | 1620 | 8000 | [-2048, 2047.75] | [-256, 255.75] | 8x8
3.1 | 3600 | 20000 | [-2048, 2047.75] | [-512, 511.75] | 8x8
3.2 | 5120 | 20000 | [-2048, 2047.75] | [-512, 511.75] | 8x8
4 | 8192 | 20000 | [-2048, 2047.75] | [-512, 511.75] | 8x8
5 | 19200 | TBD | [-2048, 2047.75] | TBD | 8x8
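A worked use of the table: a picture of W x H luma samples occupies ceil(W/16) x ceil(H/16) macroblocks, which must not exceed the level's maximum picture size. The sketch below checks only the two limits in the table above (picture size and bit rate, both taken from the slide) and skips Level 5 when a bit rate is given, because its limit is listed as TBD. Real level definitions contain further limits not shown here, and the function names are hypothetical.

```python
# (level, max picture size in MBs, max video bit rate in 1000 bits/s; None = TBD)
LEVELS = [
    ("1", 99, 64), ("1.1", 396, 128), ("1.2", 396, 768), ("2", 396, 2000),
    ("2.1", 792, 4000), ("2.2", 1620, 4000), ("3", 1620, 8000),
    ("3.1", 3600, 20000), ("3.2", 5120, 20000), ("4", 8192, 20000),
    ("5", 19200, None),
]


def picture_size_mbs(width, height):
    """Number of 16x16 macroblocks covering the picture (partial MBs round up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, kbps=None):
    """Lowest level from the table whose picture-size and bit-rate limits fit."""
    mbs = picture_size_mbs(width, height)
    for name, max_mbs, max_kbps in LEVELS:
        if mbs > max_mbs:
            continue
        if kbps is not None and (max_kbps is None or kbps > max_kbps):
            continue
        return name
    return None  # exceeds every level in the table


print(picture_size_mbs(352, 288), lowest_level(352, 288))  # 396 1.1 (CIF)
print(lowest_level(1920, 1080))  # 4: 120 x 68 = 8160 MBs <= 8192
```

Note how 1080-line video is coded as 68 macroblock rows (1088 lines), which is why it still fits within Level 4's 8192-MB limit.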
H.264 Codec Design Summary
• Video coding layer is based on hybrid video coding and similar in spirit to other standards, but with important differences
• New key features are:
  – Enhanced motion compensation
  – Small blocks for transform coding
  – Improved deblocking filter
  – Enhanced entropy coding
• Substantial bit-rate savings relative to other standards for the same quality
Complexity of H.264 Codec Design
• Codec design includes relaxation of traditional bounds on complexity (memory and computation)
  – Rough guess: 2-3x decoding power increase relative to MPEG-2, 3-4x for encoding
• Problem areas:
  – Smaller block sizes for motion compensation (cache access issues)
  – Longer filters for motion compensation (more memory access)
  – Multi-frame motion compensation (more memory for reference frame storage)
  – More segmentations of the macroblock to choose from (more searching in the encoder)
  – More methods of predicting intra data (more searching)
  – Arithmetic coding (adaptivity, computation on output bits)
Performance Comparison
• Test of different standards
• Using the same rate-distortion optimization techniques for all codecs
• Streaming test: high latency (includes B frames)
• Real-time conversation test: no B frames
• Several video sequences for each test
• Compare four codecs:
  – MPEG-2 (in the high-latency/streaming test only)
  – H.263 (high-latency profile, conversational high-compression profile, baseline profile)
  – MPEG-4 (simple profile, and advanced simple profile with and without B pictures)
  – JVT/H.26L/AVC (with and without B pictures)
Coding Efficiency Comparison (1/4)
[Figure: PSNR [dB] vs. bit rate [kbps] for Foreman, 10 Hz, QCIF, 100 frames encoded, showing successive coding techniques: intraframe DCT coding (DCT 1974, JPEG 1992), frame difference coding (H.120 1988), integer-pel motion compensation (H.261 1991), half-pel motion compensation (MPEG-1 1993), and variable block size motion compensation (H.263 1998, TMN-10)]
Coding Efficiency Comparison (2/4)
[Figure: quality Y-PSNR [dB] vs. bit rate [kbit/s] for Foreman QCIF 10 Hz, comparing MPEG-2, H.263, MPEG-4 and JVT/H.264/AVC]
Coding Efficiency Comparison (3/4)
[Figure: Y-PSNR [dB] vs. bit rate [Mbit/s] for Alias, 24 fps SDTV, comparing MPEG-2 (QP 2-7) and AVC (QP 10, 18, 26)]
Coding Efficiency Comparison (4/4)
[Figure: bit rate [Mbit/s] by year, 1994-2005, for successive encoder generations (1st MPEG-2 encoder through 5th generation encoder), with MPEG-2, MPEG-4, H.26L, H.263 and H.264/MPEG-4 Part 10 marked]
Source: Modulus Video
Test Set Results for Perceptual Quality
• Informal perceptual tests
• At the same PSNR, people generally prefer JVT
• Why?
  – Small motion compensation block size (breaks up block structure)
  – Small transform block size (breaks up block structure, reduces ringing)
  – In-loop deblocking filter
• By how much?
  – Needs further study
  – No rigorous testing reported
  – 10-15% might be a good guess
How Were the Improvements Obtained?
• They come mainly from incremental improvements:
  – Better prediction
  – More computation
  – More memory
• No fundamental changes in the basic algorithm (DCT + MCPC)
Conclusions
• Video coding layer is based on hybrid video coding and similar in spirit to other standards, but with important differences
• New key features are:
  – Enhanced motion compensation
  – Small blocks for transform coding
  – Improved deblocking filter
  – Enhanced entropy coding
• Bit-rate savings generally 50% or better against any other standard for the same perceptual quality (especially for higher-latency applications allowing B pictures)
• Increased complexity relative to prior standards
• Standard of both ITU-T VCEG and ISO/IEC MPEG
• Standardization completing around the end of this year to spring of next year