Outline
Introduction on Multimedia Coding Motion Estimation Discrete Cosine Transform Video Coding Standards
Multimedia Concepts
What is multimedia? Combination of audio, video, image,
graphic, and text. Coverage of all human I/O’s.
Why does multimedia need to be coded?
Multimedia Coding for Different Applications
Mobile devices: low data rate, error resilience, scalability
Streaming service: scalability, low to medium data rate, interactivity
On-disk distribution (DVD): interactivity
Broadcast
On-demand services
System Architecture
Compression Layer: streams from as low as bps to Mbps
System Layer: manages Elementary Streams, their synchronization and hierarchical relations
Delivery Layer: provides transparent access and delivery of content irrespective of delivery technologies
Media aware, delivery unaware
Media aware, delivery aware
Media unaware, delivery aware
Coding of Audiovisual Objects
An audiovisual scene is composed of “objects”; different objects are mixed on the screen. Visual
Video Animated face & body; 2D and 3D animated meshes Text and Graphics
Audio General audio – mono, stereo, and multichannel Speech Synthetic sounds (“Structured audio”) Environmental spatialization
Example of MPEG-4 Video Objects
Arbitrary shape video object
Rectangular shape video object
Animated Face
From Olivier Avaro
The Scene Tree
1. Composition
2. Description & Synchronization
3. Delivery of streaming data
4. Interaction with media objects
5. Management and identification of intellectual property
Major Components
Composition, Rendering
Media Objects
Scene Graph
Adding or Removing Objects (1)
[Figure: scene – object + object = new scene]
Adding or Removing Objects (2)
From Igor S. Pandžić
Adding or Removing Objects (3)
Applications
Video conferencing: real-time, automatic; separate foreground (communication partner) from background
Object tracking in video: may allow off-line and semi-automatic; separate moving object from others
Coding Techniques
Video objects: shape, motion vectors, texture
Audio objects: MPEG AAC (Advanced Audio Coder), TTS (Text-To-Speech)
Face and Body: animation parameters
2D Mesh: triangular patches, motion vector
Encoding of Visual Objects
Binary alpha block: motion vector, context-based arithmetic encoding
Texture: motion vector, DCT
Natural Audio Coder
[Figure: natural audio coders by quality (cellular, telephone, AM, FM, CD) versus bit rate (2, 4, 8, 16, 32, 64 kbit/s). Coders shown: parametric speech (HVXC), high quality speech (CELP), general audio (AAC, TwinVQ), parametric audio (HILN).]
From Olivier Dechazal
Facial Animation
From Eine Übersicht
Object Mesh
Useful for animation, content manipulation, content overlay, merging natural and synthetic video...
Tessellate with triangular patches
Sprite Coding
Represent the background as one image that is larger than a single frame.
Useful for camera motion
Multiview Video
Outline
Introduction on Multimedia Coding Discrete Cosine Transform Motion Estimation Video Coding Standards
Outline
What Is DCT And Why Use DCT How to Compute DCT Program The DCT Conclusion
An Image-Transform Coding System
Forward transform
Quantizer
Binary encoder
Inverse transform
Inverse quantizer
Binary decoder
Network
Input samples
Output samples
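Quantization example (step size 10)
Transform coefficients before the quantizer:
575 205 215 140
355 155 105 20
150 200 65 25
100 70 30 10
Quantizer (÷10, fraction dropped); these levels go through the binary encoder, the network, and the binary decoder unchanged:
57 20 21 14
35 15 10 2
15 20 6 2
10 7 3 1
Inverse quantizer (×10):
570 200 210 140
350 150 100 20
150 200 60 20
100 70 30 10
Binary encoder / decoder: e.g. zip, RAR, Huffman coding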
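The ÷10 and ×10 steps of the example can be sketched in C as below. This is a minimal illustration and not taken from the slides: the function names `quantize` and `dequantize` are hypothetical, and the quantizer is assumed to be plain integer division, which is what turns 575 into 57 and back into 570.

```c
#include <stddef.h>

/* Uniform quantizer sketch (assumed behavior): integer division drops the
 * fraction, so 575 / 10 gives 57. Coefficients are assumed non-negative here;
 * C integer division truncates toward zero for negative values. */
void quantize(const int *coef, int *level, size_t n, int step)
{
    for (size_t i = 0; i < n; i++)
        level[i] = coef[i] / step;
}

/* Inverse quantizer: scale each level back up, so 57 * 10 gives 570.
 * The dropped fraction is lost; this is where the coding loss comes from. */
void dequantize(const int *level, int *coef, size_t n, int step)
{
    for (size_t i = 0; i < n; i++)
        coef[i] = level[i] * step;
}
```

Calling `quantize` and then `dequantize` on the first row of the example with `step = 10` reproduces the levels and the reconstructed coefficients; a larger step gives smaller levels and a larger reconstruction error.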
Introduction(1/5) – Representation of An Image
How to code an image?
1. Spatial domain (pixel-based)
2. Transform domain
Transformation methods: KLT, DFT, DWT, DCT...
DCT:
Uses the cosine function as its basis function
Performance approaches KLT
Fast algorithm exists
Most popular in image compression applications
Adopted in JPEG, M-JPEG, MPEG, H.26x
Introduction(2/5) – Why Use DCT? Properties of DCT
Introduction (3/5) - Does Transform Really Make Sense ?
Energy compaction De-correlation: dependency elimination
Introduction (4/5) - Examples
8 × 8 block:
139 148 150 149 155 164 165 168
98 115 130 135 143 146 142 147
89 110 125 128 129 121 104 106
96 116 128 132 134 132 113 109
111 125 127 131 137 137 120 110
122 126 126 131 133 131 126 112
133 134 136 138 140 144 141 139
138 139 139 139 140 146 148 147
DCT
IDCT
Pixel values in spatial domain
DCT coefficients in transform domain
Spatial domain: a pixel is expressed by its value. Transform domain: e.g. the coefficient of the basis vector (0,0)
Introduction (5/5) - Examples
Definition of Basis Function
Basis function of the 1-D N-point DCT
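$$u_{k;n} = \alpha(k)\cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \ldots, N-1$$

$$\alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1 \end{cases}$$

For N = 8

$$u_{k;n} = \alpha(k)\cos\frac{(2n+1)k\pi}{16}, \qquad n = 0, 1, \ldots, 7$$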
Basic diagram of DCT
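Discrete cosine transform and Inverse DCT

$$t_k = \sum_{n=0}^{N-1} s_n\, u^{*}_{k;n} = \alpha(k)\sum_{n=0}^{N-1} s_n \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

$$s_n = \sum_{k=0}^{N-1} t_k\, u_{k;n} = \sum_{k=0}^{N-1} \alpha(k)\, t_k \cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \ldots, N-1 \tag{2}$$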
The basis of 2D-DCT with 8x8 block
Again – Do You Know What DCT Means?
DCT
IDCT
Pixel values in spatial domain
DCT coefficients in transform domain
Spatial domain: a pixel is expressed by its value. Transform domain: e.g. the coefficient of the basis vector (0,0)
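How to Compute: 1-D VS. 2-D
[1-D] For an M × N 2D-block, we can use the 1-D N-point DCT in the row direction, then the 1-D M-point DCT in the column direction, to get the 2D-DCT
[2-D] If 8 × 8 blocks are applied, the 2D-DCT will be

$$t_{u,v} = \alpha(u)\,\alpha(v)\sum_{i=0}^{7}\sum_{j=0}^{7} s_{i,j}\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2j+1)v\pi}{16}, \qquad u, v = 0, 1, \ldots, 7$$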
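DCT matrix is orthonormal
Basis vectors for N = 8:

$$u_{u;i} = \alpha(u)\cos\frac{(2i+1)u\pi}{16}, \qquad i = 0, 1, \ldots, 7$$

Inner product of the cosine parts of two basis vectors:

$$\sum_{i=0}^{7}\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2i+1)v\pi}{16} = \frac{1}{2}\sum_{i=0}^{7}\left[\cos\frac{(2i+1)(u+v)\pi}{16} + \cos\frac{(2i+1)(u-v)\pi}{16}\right]$$

The above sum is zero if u ≠ v, so the basis vectors are orthogonal.
Each basis vector of the DCT has unit norm.
From these two facts, the DCT matrix is orthonormal. The same applies to the 2D-DCT.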
Properties of Orthonormal Transforms
Energy is conserved
The transform matrix can be factored (it is separable)
Energy conservation of orthonormal transform
Separable Transform (1/2)
Separable Transform (2/2)
Fast DCT algorithm (1/2)
Fast DCT algorithm (2/2)
How to program (1/3) - Normal form
/***************************************************************************/
/* 2D 8*8 DCT (direct form)                                                */
/* Input                                                                   */
/*   int argSource[8][8] : One block in the original image                 */
/* Output                                                                  */
/*   int argDCT[8][8] : The block in frequency domain corresponding        */
/*                      to argSource[8][8]                                 */
/***************************************************************************/
#include <math.h>
#define PI 3.14159265358979

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float C[8], Cos[8][8];
    float temp;
    int i, j, u, v;

    /* Cos[i][j] = cos((2i+1) * j * PI / 16) */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            Cos[i][j] = cos((2 * i + 1) * j * PI / 16);

    /* normalization factors: sqrt(1/8) for index 0, sqrt(2/8) otherwise */
    C[0] = 0.35355339;
    for (i = 1; i < 8; i++)
        C[i] = 0.5;

    for (u = 0; u < 8; u++)
        for (v = 0; v < 8; v++) {
            temp = 0.0;
            for (i = 0; i < 8; i++)
                for (j = 0; j < 8; j++)
                    temp += Cos[i][u] * Cos[j][v] * (argSource[i][j] - 128);
            temp *= C[u] * C[v];
            argDCT[u][v] = temp;   /* fraction is dropped on conversion to int */
        }
}
How to program (2/3) - Fast algorithm -1
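/***************************************************************************/
/* 2D 8*8 DCT as two matrix products: argDCT = C * (argSource - 128) * Ct  */
/* Input                                                                   */
/*   int argSource[8][8] : One block in the original image                 */
/* Output                                                                  */
/*   int argDCT[8][8] : The block in frequency domain corresponding        */
/*                      to argSource[8][8]                                 */
/* C is the 8*8 DCT matrix and Ct its transpose. Both are assumed to be    */
/* filled in elsewhere; the ROUND definition below is also an assumption.  */
/***************************************************************************/
#include <math.h>
#define ROUND(x) ((int)floor((x) + 0.5))

extern float C[8][8], Ct[8][8];

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float temp[8][8], temp1;
    int i, j, k;

    /* temp = (argSource - 128) * Ct : 1-D DCT of each row */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp[i][j] = 0.0;
            for (k = 0; k < 8; k++)
                temp[i][j] += ((int) argSource[i][k] - 128) * Ct[k][j];
        }

    /* argDCT = C * temp : 1-D DCT of each column (loops run over i and j) */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp1 = 0.0;
            for (k = 0; k < 8; k++)
                temp1 += C[i][k] * temp[k][j];
            argDCT[i][j] = ROUND(temp1);
        }
}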
How to program (3/3) - Algorithm suitable for hardware implementation
#include <stdio.h>
#define RS(r,s) ((r) >> (s))
#define SCALE(exp) RS((exp),10)

void DCT(short int *input, short int *output)
{
    short int jc, i, j, k;
    short int b[8];
    short int b1[8];
    short int d[8][8];
    /* fixed-point cosine constants; SCALE() shifts each product right by 10 */
    int c0=724; /* left shift 10 */
    int c1=502;
    int c2=474;
    int c3=426;
    int c4=362;
    int c5=284;
    int c6=196;
    int c7=100;

    for (i = 0, k = 0; i < 8; i++, k += 8) {
        for (j = 0; j < 8; j++) {
            b[j] = input[k+j];
        }
        /* row transform */
        for (j = 0; j < 4; j++) {
            jc = 7 - j;
            b1[j] = b[j] + b[jc];
            b1[jc] = b[j] - b[jc];
        }
        b[0] = b1[0] + b1[3];
        b[1] = b1[1] + b1[2];
        b[2] = b1[1] - b1[2];
        b[3] = b1[0] - b1[3];
        b[4] = b1[4];
        b[5] = SCALE((b1[6] - b1[5]) * c0);
        b[6] = SCALE((b1[6] + b1[5]) * c0);
        b[7] = b1[7];
        d[i][0] = SCALE((b[0] + b[1]) * c4);
        d[i][4] = SCALE((b[0] - b[1]) * c4);
        d[i][2] = SCALE(b[2] * c6 + b[3] * c2);
        d[i][6] = SCALE(b[3] * c6 - b[2] * c2);
        b1[4] = b[4] + b[5];
        b1[7] = b[7] + b[6];
        b1[5] = b[4] - b[5];
        b1[6] = b[7] - b[6];
        d[i][1] = SCALE(b1[4] * c7 + b1[7] * c1);
        d[i][5] = SCALE(b1[5] * c3 + b1[6] * c5);
        d[i][7] = SCALE(b1[7] * c7 - b1[4] * c1);
        d[i][3] = SCALE(b1[6] * c3 - b1[5] * c5);
    }
    /* column transform */
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 4; j++) {
            jc = 7 - j;
            b1[j] = d[j][i] + d[jc][i];
            b1[jc] = d[j][i] - d[jc][i];
        }
        b[0] = b1[0] + b1[3];
        b[1] = b1[1] + b1[2];
        b[2] = b1[1] - b1[2];
        b[3] = b1[0] - b1[3];
        b[4] = b1[4];
        b[5] = SCALE((b1[6] - b1[5]) * c0);
        b[6] = SCALE((b1[6] + b1[5]) * c0);
        b[7] = b1[7];
        d[0][i] = SCALE((b[0] + b[1]) * c4);
        d[4][i] = SCALE((b[0] - b[1]) * c4);
        d[2][i] = SCALE(b[2] * c6 + b[3] * c2);
        d[6][i] = SCALE(b[3] * c6 - b[2] * c2);
        b1[4] = b[4] + b[5];
        b1[7] = b[7] + b[6];
        b1[5] = b[4] - b[5];
        b1[6] = b[7] - b[6];
        d[1][i] = SCALE(b1[4] * c7 + b1[7] * c1);
        d[5][i] = SCALE(b1[5] * c3 + b1[6] * c5);
        d[7][i] = SCALE(b1[7] * c7 - b1[4] * c1);
        d[3][i] = SCALE(b1[6] * c3 - b1[5] * c5);
    }
    /* store 2-D array (8*8) data into a 1-D array (64) */
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            *(output + i*8 + j) = (d[i][j]);
        }
    }
}
Conclusion
The DCT provides another way to express an image, in terms of the properties of the image rather than its individual pixel values.
Fast algorithms make a hardware implementation of the DCT feasible.
Outline
Introduction on Multimedia Coding Motion Estimation Discrete Cosine Transform Video Coding Standards
Outline
What are motions in videos The importance of motions Motion representation How to find the motion of a block Block matching Residual Fast block matching algorithm Intra frame and inter frame
Motions in Video Clips
Local motions
Global motions
Background
The Importance of Motions
Compress one frame independently Each pixel has to be compressed.
DCT Quantization Binary coding
Compress one frame depending on the previous frame. Background can be ignored. Only compress moving objects and new objects
Example
1. Compress the object and the background in frame 1.
2. Compress the motion of the object in the remaining frames.
Direction and magnitude
1 2 3 4
Motion Representation
Use arrows to represent motions of objects.
Global
Block-based
Pixel-based
Region-based
How to Find The Motion of A Block?
Frame i-1 Frame i
Current frame (to be encoded)
Reference frame (already available)
Occlusion
matched
Motion vector
Block matching
Block Matching (1)
current
Block Matching (2)
Compare the difference between two blocks. (one is in the current frame, and the other is in the reference frame)
$$D_p = \sum_{(x,y)\,\in\,\text{block}} \left|\,\text{Current block}(x,y) - \text{Candidate block}(x,y)\,\right|^{p}$$

p = 1: sum of absolute differences
p = 2: mean square error
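The block difference for p = 1 and p = 2 can be sketched as a single C function. The name `block_cost` and its parameters are illustrative assumptions; for p = 2 it returns the sum of squared differences, which ranks candidates the same way as the mean square error because the block size is fixed.

```c
#include <stdlib.h>

/* Sum over a bs x bs block of |cur - cand|^p, for p = 1 (SAD) or p = 2 (SSD).
 * stride_cur and stride_cand are the row lengths, in pixels, of the buffers
 * that contain the current block and the candidate block. */
long block_cost(const unsigned char *cur, const unsigned char *cand,
                int stride_cur, int stride_cand, int bs, int p)
{
    long cost = 0;

    for (int y = 0; y < bs; y++)
        for (int x = 0; x < bs; x++) {
            int d = abs(cur[y * stride_cur + x] - cand[y * stride_cand + x]);
            cost += (p == 1) ? d : (long)d * d;
        }
    return cost;
}
```

The p = 1 form needs no multiplication, which is one reason it is often preferred when the cost has to be evaluated for many candidate positions.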
Block Matching (3)
43 56 76 78 89 31 34 54
44 35 66 75 34 22 35 90
54 33 45 66 48 37 44 57
73 76 50 53 50 18 36 43
49 61 55 65 35 53 32 29
83 124 100 110 52 64 65 46
98 101 99 105 55 34 45 13
75 89 83 72 68 56 44 23
34 54 104 100
64 52 110 102
43 34 65 55
20 51 53 50
Measurement window is compared with a shifted array of pixels in the other frame, to determine the best match
Rectangular array of pixels is selected as a measurement window
Integer pixel shift
Block
Search range Minimum MSE
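Comparing the measurement window with every integer shift inside the search range is an exhaustive search. The sketch below uses the sum of absolute differences instead of the MSE named on the slide, and it searches the whole reference frame rather than a limited range; both are simplifying assumptions, and `Match` and `full_search` are hypothetical names.

```c
#include <stdlib.h>
#include <limits.h>

/* Best position found by the search and its matching cost. */
typedef struct {
    int x, y;      /* top-left corner of the best candidate in the reference */
    long sad;      /* sum of absolute differences at that position */
} Match;

/* Exhaustive block matching: try every integer position of a bs x bs block
 * inside a w x h reference frame and keep the one with the smallest SAD.
 * On a tie, the first position in raster order is kept. */
Match full_search(const unsigned char *ref, int w, int h,
                  const unsigned char *blk, int bs)
{
    Match best = { 0, 0, LONG_MAX };

    for (int y0 = 0; y0 + bs <= h; y0++)
        for (int x0 = 0; x0 + bs <= w; x0++) {
            long sad = 0;
            for (int y = 0; y < bs; y++)
                for (int x = 0; x < bs; x++)
                    sad += abs(ref[(y0 + y) * w + x0 + x] - blk[y * bs + x]);
            if (sad < best.sad) {
                best.x = x0;
                best.y = y0;
                best.sad = sad;
            }
        }
    return best;
}
```

The motion vector is the difference between the returned position and the position of the block in the current frame. For an 8 × 8 frame and a 4 × 4 block the loop visits 25 candidate positions, which shows why fast search methods matter for realistic frame sizes.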
Residual (1)
Motion, residual, occlusion
Residual (2)
[Figure: residual coding loop with steps ① to ④. Blocks: Motion Compensation, DCT + Q, iDCT + iQ, Previous Frame Buffer. Signals: MV = (dx, dy), residual.]
Encoder (DCT Quantization Binary coding)
Residual only
Residual (3)
Decoder
[Figure: decoder block diagram. Blocks: VLD, Q⁻¹, IDCT, Motion Compensation, Previous Frame memory. Signals: coded bitstream, MV, residual, reconstructed frame.]
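Leaving out the DCT and quantization stages, the residual path comes down to a subtraction at the encoder and an addition at the decoder. The following sketch shows only that part; `compute_residual` and `reconstruct` are hypothetical names, and the clipping to 0..255 assumes 8-bit samples.

```c
/* Encoder side: residual = current block - motion-compensated prediction.
 * The residual can be negative, so it is stored in a signed type. */
void compute_residual(const unsigned char *cur, const unsigned char *pred,
                      short *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (short)(cur[i] - pred[i]);
}

/* Decoder side: reconstructed = prediction + residual, clipped to 0..255. */
void reconstruct(const unsigned char *pred, const short *res,
                 unsigned char *out, int n)
{
    for (int i = 0; i < n; i++) {
        int v = pred[i] + res[i];
        if (v < 0)
            v = 0;
        if (v > 255)
            v = 255;
        out[i] = (unsigned char)v;
    }
}
```

Without quantization the reconstruction is exact. In a real codec the residual is quantized, so the decoder output differs slightly from the input, and the encoder has to predict from the same reconstructed frame that the decoder will have.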
Block Matching Algorithm - Full Search Method
[Figure: full search over the search window; axis labels 15 and 15]
Block Matching Algorithm - Three Step Method
Block Matching Algorithm - Four Step Method
IEEE Transactions on Circuits and Systems for Video Technology, June 1996
Block Matching Algorithm - Diamond Method
Fractional pixel accuracy
Fractional pixel accuracy e.g. half-pixel accuracy
(dx, dy) = (1.5, 1)
H.263, Foreman, QCIF, SKIP=2, Q=4,5,7,10,15,25
Integer pixel
half pixel
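A half-pixel motion vector such as (1.5, 1) points between integer samples, so the reference value there has to be interpolated. One common choice, assumed here, is bilinear averaging of the neighbouring integer pixels with rounding; `sample_half_pel` is an illustrative name, and coordinates are given in half-pixel units so that 1.5 becomes 3.

```c
/* Sample a frame at position (x2 / 2, y2 / 2), where x2 and y2 are
 * non-negative coordinates in half-pixel units. On an integer position the
 * pixel itself is returned; on a half position the 2 or 4 neighbouring
 * integer pixels are averaged with rounding. The caller must keep the
 * neighbours inside the frame. */
int sample_half_pel(const unsigned char *f, int stride, int x2, int y2)
{
    int x = x2 >> 1, y = y2 >> 1;   /* integer part of the position */
    int fx = x2 & 1, fy = y2 & 1;   /* 1 when the position is at a half pixel */

    int a = f[y * stride + x];
    int b = f[y * stride + x + fx];
    int c = f[(y + fy) * stride + x];
    int d = f[(y + fy) * stride + x + fx];

    return (a + b + c + d + 2) >> 2;
}
```

Block matching at half-pixel accuracy usually starts from the best integer position and then tests the eight half-pixel positions around it with the same matching cost.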
Encode A Frame with Motions
Intra frame (I-frame) Encoded/decoded without using motion information.
Inter frame Encoded/decoded using motion information.
Prediction frame (P-frame) Bi-directional prediction frame (B-frame)
Group of pictures (GOP) Starting with an I-frame, followed by a series of inter frames.
Random access Prevention of error propagation
Intra Inter Inter Inter Inter Inter Intra...
GOP
I-Frame, P-Frame, and B-Frame
P-frame Find motions from the previous I- or P-frame.
B-frame Find motions from both the previous and the following reference frames (an I-frame and a P-frame, or two P-frames). Some objects may be found only in the following frame.
Encoding order 1423756
I B B P B B P
1 2 3 4 5 6 7
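The encoding order follows from one rule: a B-frame can only be coded after both of its reference frames, so each I- or P-frame is sent ahead of the B-frames that precede it in display order. The sketch below assumes the display order I B B P B B P, which is consistent with the encoding order 1423756; `coding_order` is a hypothetical helper.

```c
/* Convert display order to coding order.
 * types: one character per frame in display order ('I', 'P' or 'B').
 * order: receives 1-based display indices in the order they are coded.
 * Returns the number of frames written. Trailing B-frames that have no
 * later reference frame are not emitted in this sketch. */
int coding_order(const char *types, int n, int *order)
{
    int count = 0;
    int pending_start = 0;   /* first B-frame still waiting for a reference */

    for (int i = 0; i < n; i++) {
        if (types[i] != 'B') {
            order[count++] = i + 1;              /* reference frame first */
            for (int j = pending_start; j < i; j++)
                order[count++] = j + 1;          /* then the waiting B-frames */
            pending_start = i + 1;
        }
    }
    return count;
}
```

For the seven frames above the function yields 1, 4, 2, 3, 7, 5, 6. The decoder has to hold the reordered frames until their display time, which is why B-frames add delay.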
Video Encoder
[Figure: video encoder block diagram. Blocks: Motion Estimation, Motion Compensation, Frame memory, DCT, Q, VLC, Q⁻¹, IDCT, Clipping. Signals: frame input (I-frame / P-frame), previous frame, MV, residual, bitstream.]
Video Decoder
[Figure: video decoder block diagram. Blocks: VLD, Q⁻¹, IDCT, Motion Compensation, Previous Frame memory. Signals: coded bitstream, reconstructed frame (I-frame / P-frame).]
Outline
Introduction on Multimedia Coding Motion Estimation Discrete Cosine Transform Video Coding Standards
The Scope of Video Coding Standardization
Only restrictions on the Bitstream, Syntax, and Decoder are standardized: Permits the optimization of encoding Permits complexity reduction for implementability Provides no guarantees on quality
Standards and Applications
International Telecommunication Union – Telecommunication Standardization (ITU-T)
H.261 Videophone and video conferencing p x 64 kbps (p = 1 ... 30) Still in use
Low complexity, low latency Mostly as a backward-compatibility feature Overtaken by H.263
H.263 PSTN and mobile network: 10 to 24 kbps 1994: H.263, H.263+
H.264 Double the coding efficiency in comparison to any other existing video coding standards
MPEG: Moving Picture Experts Group
MPEG-1: CD-i, (VOD trials), ... MPEG-2: ... + TV, HDTV MPEG-3: HDTV, merged into MPEG-2 MPEG-4: Coding of Audiovisual Objects MPEG-7: MM Description Interface MPEG-21: Digital Multimedia Framework
Chronological Table of Video Coding Standards
ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
Both ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
Time axis: 1990 to 2003
H.261: The Basis of Modern Video Compression
The first widespread practical success Video Format:
CIF (352 x 288, above 128Kbps) QCIF (176 x 144, 64 - 128 Kbps)
Operated at 64-2048 Kbps (p × 64 Kbps) Still in use
Low complexity, low latency Mostly as a backward-compatibility feature Overtaken by H.263
MPEG-1: For Storage
Five parts: System, Visual, Audio, Conformance, Reference Software
Applications: VCD, VOD, Digital Camera Maximum: 1.856 Mbps, 768x576 pels
Superior quality to H.261 when operated at higher bit rates (≥ 1 Mbps for CIF 352x288 resolution)
Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
Technical features: Adds bi-directional motion prediction and half-pixel motion to H.261 design
Use is fairly widespread, but mostly overtaken by MPEG-2
MP3 = MPEG-1 layer 3 audio
MPEG-2 / H.262: High Bit Rate, High Quality
MPEG-2 Visual = H.262
Not especially useful below 2 Mbps (range of use normally 2-20 Mbps)
Applications: SDTV (2-5 Mbps), DVD (6-8 Mbps), HDTV (20 Mbps), VOD
Support for interlaced scan pictures
PSNR, temporal, and spatial scalability
Consists of various “Profiles” and “Levels”
MPEG-2 audio: supports 5.1 channels
MPEG-2 AAC: requires 30% fewer bits than MP3
H.263: The Next Generation
Goal: Improved quality at lower rates
Has overtaken H.261 as dominant video-conferencing codec
Superior to H.261 at all bit rates
Significantly better quality at lower rates
Better video at 18-24 Kbps than H.261 at 64 Kbps
Enables video phone over regular phone lines (28.8 Kbps) or wireless modem
H.263+ (1998): supports all bit rates, more options
H.263++ (2000): more options, with emphasis on error resilience and scalability
MPEG-4: H.263 + Additions + Variable Shape Coding
Goal: Support for interactive multimedia
Visual Object (VO), Audio Object (AO) and Audiovisual Object (AVO)
18 video coding profiles
Roughly follows H.263 design and adds all prior features and (most important) shape coding
Includes zero-tree wavelet coding of still textured pictures, segmented coding of shapes, coding of synthetic content
2D & 3D mesh coding, face animation modeling
10-bit and 12-bit video
Contains 9 parts. Part 10 will be H.264
A Note on Terminology of H.264
The following terms are used interchangeably:
H.26L
The Work of the JVT or “JVT CODEC”
JM2.x, JM3.x, JM4.x
The Thing Beyond H.26L
The “AVC” or Advanced Video CODEC
Proper Terminology going forward:
MPEG-4 Part 10 (Official MPEG Term): ISO/IEC 14496-10 AVC
H.264 (Official ITU Term)
Position of H.264
New Features of H.264
Multi-mode, multi-reference MC
Motion vector can point out of image border
1/4-, 1/8-pixel motion vector precision
B-frame prediction weighting
4×4 integer transform
Multi-mode intra-prediction
In-loop de-blocking filter
UVLC (Universal Variable Length Coding)
NAL (Network Abstraction Layer)
SP-slices
Profiles and Levels
Profiles: Baseline, Main, and X Baseline: Progressive, Videoconferencing & Wireless Main: esp. Broadcast Extended: Mobile network
Baseline profile is the minimum implementation No CABAC, 1/8 MC, B-frame, SP-slices
15 levels Resolution, capability, bit rate, buffer, reference # Built to match popular international production and emission
formats From QCIF to D-Cinema
Variable Block Sizes
Various block sizes and shapes
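MB types: 16x16 (partition 0), 16x8 (partitions 0, 1), 8x16 (partitions 0, 1), 8x8 (partitions 0 to 3)
8x8 types: 8x8 (partition 0), 8x4 (partitions 0, 1), 4x8 (partitions 0, 1), 4x4 (partitions 0 to 3)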
Multiple Reference Frames
Multiple reference frames
Frame to be encoded
Integer Transform (1)
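4×4 and 2×2 integer transform
Integer transform matrices:

$$H_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad H_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}, \qquad H_3 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: block numbering within a macroblock. Y blocks 0 to 15, Cb blocks 18 to 21, Cr blocks 22 to 25, plus blocks −1 (INTRA_16×16), 16 and 17. Transform labels: DCT, H1, H2 over Y, Cb, Cr.]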
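As an illustration of why the 4×4 transform is called an integer transform, the core operation Y = H1 · X · H1ᵀ can be computed with integer additions and multiplications only. The sketch below uses the commonly cited H.264 core matrix as an assumption and leaves out the scaling that H.264 folds into quantization; `forward_transform_4x4` is a hypothetical name.

```c
/* Assumed 4x4 core transform matrix (rows are the integer basis vectors). */
static const int H1[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 }
};

/* Forward transform sketch: y = H1 * x * H1^T, using integers only.
 * The result is not normalized; in H.264 the normalization is folded
 * into the quantization step. */
void forward_transform_4x4(int x[4][4], int y[4][4])
{
    int tmp[4][4];

    /* tmp = H1 * x */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (int k = 0; k < 4; k++)
                tmp[i][j] += H1[i][k] * x[k][j];
        }

    /* y = tmp * H1^T */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            y[i][j] = 0;
            for (int k = 0; k < 4; k++)
                y[i][j] += tmp[i][k] * H1[j][k];
        }
}
```

Because every entry of the matrix is ±1 or ±2, a hardware or optimized software version can replace the multiplications by additions and shifts, and the encoder and decoder produce exactly the same integers with no floating-point mismatch.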
In-loop De-blocking Filter
Without filter / with H.264/AVC de-blocking
Highly compressed decoded inter picture Significantly reduces prediction residuals
Comparison
Summary Video coding is based on hybrid video coding and similar in
spirit to other standards but with important differences New key features are:
Enhanced motion compensation Small blocks for transform coding Improved de-blocking filter Enhanced entropy coding
Substantial bit-rate savings (up to 50%) relative to other standards for the same quality
Enhancement on perceptive quality seems better than that on PSNR
The complexity of the encoder triples that of the prior ones The complexity of the decoder doubles that of the prior ones
Applications Examples
Shopping “try-on” clothes. Decorate/furnish rooms.
Interact with sporting events. Multiple, simultaneous
views. Player/game statistics. User-directed replay,
freeze, etc. Field maintenance
Mobile audio-visual terminal Remote audio-visual
access.
Security monitoring Region-of-interest (ex :
face) isolation, enhancement.
Auto traffic, harbor traffic management.
Networked video games. Actual users as player in
the scene
Sign language
From Olivier Avaro
Reference Software
H.264 http://iphome.hhi.de/suehring/tml/
MPEG-4 http://www.xvid.org