outline introduction on multimedia coding motion estimation discrete cosine transform video coding...

Outline

Introduction on Multimedia Coding
Motion Estimation
Discrete Cosine Transform
Video Coding Standards

Multimedia Concepts

What is multimedia?
Combination of audio, video, image, graphic, and text.
Coverage of all human I/O's.

Why does multimedia need to be coded?

Multimedia Coding for Different Applications

Mobile devices: low data rate, error resilience, scalability
Streaming service: scalability, low to medium data rate, interactivity
On-disk distribution (DVD): interactivity
Broadcast
On-demand services

System Architecture

Compression Layer: streams from as low as bps to Mbps
System Layer: manages Elementary Streams, their synchronization and hierarchical relations
Delivery Layer: provides transparent access and delivery of content irrespective of delivery technologies

Layer labels in the figure: media aware, delivery unaware; media aware, delivery aware; media unaware, delivery aware

Coding of Audiovisual Objects

An audiovisual scene is composed of "objects"; different objects are mixed on the screen.
Visual: video; animated face and body; 2D and 3D animated meshes; text and graphics
Audio: general audio (mono, stereo, and multichannel); speech; synthetic sounds ("structured audio"); environmental spatialization

Example of MPEG-4 Video Objects

Arbitrary-shape video object
Rectangular-shape video object

Animated Face

From Olivier Avaro

The Scene Tree

1. Composition
2. Description & Synchronization
3. Delivery of streaming data
4. Interaction with media objects
5. Management and identification of intellectual property

Major Components

Composition
Rendering

Media Objects

Scene Graph

Adding or Removing Objects (1)

[Figure: scene − object + object = new scene]


From Igor S. Pandžić


Applications
Video conferencing: real-time, automatic; separate the foreground (communication partner) from the background
Object tracking in video: may allow off-line and semi-automatic processing; separate a moving object from others

Coding Techniques

Video objects: shape, motion vectors, texture
Audio objects: MPEG AAC (Advanced Audio Coder), TTS (Text-To-Speech)
Face and body: animation parameters
2D mesh: triangular patches, motion vectors

Encoding of Visual Objects

Binary alpha block (shape): motion vector, context-based arithmetic encoding
Texture: motion vector, DCT

Natural Audio Coder

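[Figure: natural audio coders by bit rate and quality.
Quality levels: cellular, telephone, AM, FM, CD.
Bit-rate axis: 2, 4, 8, 16, 32, 64 kbit/s.
Coders shown: parametric speech (HVXC), high-quality speech (CELP), general audio (AAC, TwinVQ), parametric audio (HILN).]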

From Olivier Dećhažal

Facial Animation

From Eine Übersicht

Object Mesh

Useful for animation, content manipulation, content overlay, merging natural and synthetic video...

Tessellate with triangular patches

Sprite Coding

Represent the background as a single image (sprite) that is larger than the displayed frame.

Useful for camera motion

Multiview Video

Outline

Introduction on Multimedia Coding
Discrete Cosine Transform
Motion Estimation
Video Coding Standards

Outline

What Is DCT And Why Use DCT
How to Compute DCT
Program The DCT
Conclusion

An Image-Transform Coding System

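Encoder: input samples → forward transform → quantizer → binary encoder → network
Decoder: network → binary decoder → inverse quantizer → inverse transform → output samples

Example of the quantizer stage with a step size of 10:

Input samples
575 205 215 140
355 155 105  20
150 200  65  25
100  70  30  10

After the quantizer (÷10, fractions discarded)
57 20 21 14
35 15 10  2
15 20  6  2
10  7  3  1

The binary encoder and decoder are lossless (e.g. Huffman coding; zip, RAR), so the decoder receives the same values.

After the inverse quantizer (×10), output samples
570 200 210 140
350 150 100  20
150 200  60  20
100  70  30  10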
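The quantizer example above can be reproduced with a few lines of C. This is a minimal sketch, assuming truncating integer division as the quantizer; the function names `quantize` and `dequantize` are illustrative and do not come from the slides.

```c
#include <assert.h>

#define N 4

/* Quantize by truncating integer division (step = 10 in the slide example). */
void quantize(int in[N][N], int out[N][N], int step)
{
    assert(step > 0);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] = in[i][j] / step;
}

/* Inverse quantizer: multiply back by the step size. */
void dequantize(int in[N][N], int out[N][N], int step)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] = in[i][j] * step;
}
```

The reconstruction error is at most step − 1 per sample (for example 575 → 57 → 570), which is the information the quantizer discards.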

Introduction(1/5) – Representation of An Image

How to code an image?
1. Spatial domain (pixel-based)
2. Transform domain
Transformation methods: KLT, DFT, DWT, DCT...

Introduction(2/5) – Why Use DCT? Properties of DCT

Uses the cosine function as its basis function
Performance approaches that of the KLT
Fast algorithms exist
Most popular transform in image compression applications
Adopted in JPEG, M-JPEG, MPEG, H.26x

Introduction (3/5) - Does Transform Really Make Sense ?

Energy compaction
De-correlation: dependency elimination

Introduction (4/5) - Examples

8 × 8 block of pixel values:

139 148 150 149 155 164 165 168

98 115 130 135 143 146 142 147

89 110 125 128 129 121 104 106

96 116 128 132 134 132 113 109

111 125 127 131 137 137 120 110

122 126 126 131 133 131 126 112

133 134 136 138 140 144 141 139

138 139 139 139 140 146 148 147

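[Figure: pixel values in the spatial domain ⇄ (DCT / IDCT) ⇄ DCT coefficients in the transform domain]
Spatial domain: a pixel is expressed by its value.
Transform domain: a coefficient is the weight of a basis vector, e.g. the coefficient of basis vector (0,0).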

Introduction (5/5) - Examples

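Definition of Basis Function

Basis function of the 1-D N-point DCT

$$u_{k;n} = \alpha(k)\cos\frac{(2n+1)k\pi}{2N}, \quad n = 0, 1, \ldots, N-1$$

$$\alpha(k) = \begin{cases}\sqrt{1/N}, & k = 0\\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1\end{cases}$$

For N = 8

$$u_{k;n} = \alpha(k)\cos\frac{(2n+1)k\pi}{16}, \quad n = 0, 1, \ldots, 7$$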

Basic diagram of DCT

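Discrete cosine transform and Inverse DCT

$$t_k = \sum_{n=0}^{N-1} s_n u^{*}_{k;n} = \alpha(k)\sum_{n=0}^{N-1} s_n \cos\frac{(2n+1)k\pi}{2N}, \quad k = 0, 1, \ldots, N-1 \qquad (1)$$

$$s_n = \sum_{k=0}^{N-1} t_k u_{k;n} = \sum_{k=0}^{N-1} \alpha(k)\, t_k \cos\frac{(2n+1)k\pi}{2N}, \quad n = 0, 1, \ldots, N-1 \qquad (2)$$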

The basis of 2D-DCT with 8x8 block

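Again – Do You Know What DCT Means?

[Figure: pixel values in the spatial domain ⇄ (DCT / IDCT) ⇄ DCT coefficients in the transform domain]
Spatial domain: a pixel is expressed by its value.
Transform domain: a coefficient is the weight of a basis vector, e.g. the coefficient of basis vector (0,0).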

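How to Compute: 1-D vs. 2-D

[1-D] For an M × N 2-D block, apply the 1-D N-point DCT along each row, then the 1-D M-point DCT along each column, to obtain the 2-D DCT.

[2-D] If 8 × 8 blocks are used, the 2-D DCT is

$$t_{u,v} = \alpha(u)\,\alpha(v)\sum_{i=0}^{7}\sum_{j=0}^{7} s_{i,j}\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2j+1)v\pi}{16}, \quad u, v = 0, 1, \ldots, 7$$

with α(0) = √(1/8) and α(k) = 1/2 for k = 1, ..., 7.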

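DCT matrix is orthonormal

Basis vectors of the 8-point DCT:

$$u_{u;i} = \alpha(u)\cos\frac{(2i+1)u\pi}{16}, \quad i = 0, 1, \ldots, 7$$

Inner product of the cosine terms of two basis vectors u and v:

$$\sum_{i=0}^{7}\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2i+1)v\pi}{16} = \frac{1}{2}\sum_{i=0}^{7}\left[\cos\frac{(2i+1)(u+v)\pi}{16} + \cos\frac{(2i+1)(u-v)\pi}{16}\right]$$

The sum above is zero if u ≠ v → orthogonal
Each basis vector of the DCT has unit norm
From these two properties, the DCT matrix is orthonormal
The same applies to the 2D-DCT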

Properties of Orthonormal Transforms

Energy is conserved

The transform matrix can be factored → separable

Energy conservation of orthonormal transform
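As a short worked derivation, assume the 1-D transform is written in matrix form as t = C s, where C is the orthonormal DCT matrix (so CᵀC = I). The vector notation s, t and the matrix name C are assumptions chosen to match the programs later in this section.

```latex
\|t\|^2 = t^{T} t = (C s)^{T} (C s) = s^{T} C^{T} C\, s = s^{T} s = \|s\|^2
\quad\Longleftrightarrow\quad
\sum_{k=0}^{N-1} t_k^2 = \sum_{n=0}^{N-1} s_n^2
```

The transform therefore only redistributes the signal energy among the coefficients; it neither adds nor removes any.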

Separable Transform (1/2)

Separable Transform (2/2)
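A sketch of why separability helps, assuming an N × N block S and the N × N DCT matrix C with entries c_{u,i} (these symbol names are assumptions, chosen to match the C and Ct arrays of the fast program below):

```latex
T = C\, S\, C^{T},
\qquad
t_{u,v} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} c_{u,i}\, s_{i,j}\, c_{v,j}
        = \sum_{i=0}^{N-1} c_{u,i} \left( \sum_{j=0}^{N-1} s_{i,j}\, c_{v,j} \right)
```

The inner sum is a 1-D DCT of row i and the outer sum is a 1-D DCT along the columns. Evaluating the double sum directly costs about N⁴ multiplications per block (4096 for N = 8), while the row-column form costs about 2N³ (1024 for N = 8).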

Fast DCT algorithm (1/2)

Fast DCT algorithm (2/2)

How to program (1/3) - Normal form

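/***************************************************************************/
/* 2D N*N DCT (direct form, N = 8)                                         */
/* Input                                                                   */
/*   int argSource[N][N]: one block in the original image                  */
/* Output                                                                  */
/*   int argDCT[N][N]: the block in the frequency domain corresponding to  */
/*                     argSource[N][N], rounded to the nearest integer     */
/***************************************************************************/
#include <math.h>

#define PI 3.14159265358979323846

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float C[8], Cos[8][8];
    float temp;
    int i, j, u, v;

    /* Cos[i][j] = cos((2i+1) * j * pi / 16) */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            Cos[i][j] = cos((2 * i + 1) * j * PI / 16);

    /* Normalization: sqrt(1/8) for the DC term, sqrt(2/8) = 0.5 otherwise */
    C[0] = 0.35355339f;
    for (i = 1; i < 8; i++)
        C[i] = 0.5f;

    for (u = 0; u < 8; u++)
        for (v = 0; v < 8; v++) {
            temp = 0.0f;
            for (i = 0; i < 8; i++)
                for (j = 0; j < 8; j++)
                    temp += Cos[i][u] * Cos[j][v] * (argSource[i][j] - 128);  /* level shift by 128 */
            temp *= C[u] * C[v];
            argDCT[u][v] = (int)floor(temp + 0.5);  /* round instead of truncating */
        }
}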

How to program (2/3) - Fast algorithm -1

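/***************************************************************************/
/* 2D N*N DCT (row-column form, N = 8)                                     */
/* Input                                                                   */
/*   int argSource[N][N]: one block in the original image                  */
/* Output                                                                  */
/*   int argDCT[N][N]: the block in the frequency domain corresponding to  */
/*                     argSource[N][N], rounded to the nearest integer     */
/***************************************************************************/
#include <math.h>

#define PI 3.14159265358979323846
/* ROUND is used but not defined on the slide; this definition is assumed. */
#define ROUND(x) ((int)floor((x) + 0.5))

/* C is the 8x8 DCT matrix and Ct its transpose. The slide uses both without */
/* showing their setup; InitDCT() is an assumed helper, call it once first.  */
static float C[8][8], Ct[8][8];

void InitDCT(void)
{
    int u, i;
    for (u = 0; u < 8; u++)
        for (i = 0; i < 8; i++) {
            C[u][i] = (u == 0 ? 0.35355339f : 0.5f) * cos((2 * i + 1) * u * PI / 16);
            Ct[i][u] = C[u][i];
        }
}

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float temp[8][8], temp1;
    int i, j, k;

    /* Row transform: temp = (argSource - 128) * Ct */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp[i][j] = 0.0f;
            for (k = 0; k < 8; k++)
                temp[i][j] += ((int)argSource[i][k] - 128) * Ct[k][j];
        }

    /* Column transform: argDCT = C * temp */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp1 = 0.0f;
            for (k = 0; k < 8; k++)
                temp1 += C[i][k] * temp[k][j];
            argDCT[i][j] = ROUND(temp1);
        }
}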

How to program (3/3) - Algorithm suitable for hardware implementation

#include <stdio.h>

#define RS(r,s)    ((r) >> (s))
#define SCALE(exp) RS((exp), 10)

void DCT(short int *input, short int *output)
{
    short int jc, i, j, k;
    short int b[8];
    short int b1[8];
    short int d[8][8];
    /* Cosine constants, left-shifted by 10 bits (scaled by 1024) */
    int c0 = 724;
    int c1 = 502;
    int c2 = 474;
    int c3 = 426;
    int c4 = 362;
    int c5 = 284;
    int c6 = 196;
    int c7 = 100;

    /* row transform */
    for (i = 0, k = 0; i < 8; i++, k += 8) {
        for (j = 0; j < 8; j++) {
            b[j] = input[k + j];
        }
        for (j = 0; j < 4; j++) {
            jc = 7 - j;
            b1[j]  = b[j] + b[jc];
            b1[jc] = b[j] - b[jc];
        }
        b[0] = b1[0] + b1[3];
        b[1] = b1[1] + b1[2];
        b[2] = b1[1] - b1[2];
        b[3] = b1[0] - b1[3];
        b[4] = b1[4];
        b[5] = SCALE((b1[6] - b1[5]) * c0);
        b[6] = SCALE((b1[6] + b1[5]) * c0);
        b[7] = b1[7];
        d[i][0] = SCALE((b[0] + b[1]) * c4);
        d[i][4] = SCALE((b[0] - b[1]) * c4);
        d[i][2] = SCALE(b[2] * c6 + b[3] * c2);
        d[i][6] = SCALE(b[3] * c6 - b[2] * c2);
        b1[4] = b[4] + b[5];
        b1[7] = b[7] + b[6];
        b1[5] = b[4] - b[5];
        b1[6] = b[7] - b[6];
        d[i][1] = SCALE(b1[4] * c7 + b1[7] * c1);
        d[i][5] = SCALE(b1[5] * c3 + b1[6] * c5);
        d[i][7] = SCALE(b1[7] * c7 - b1[4] * c1);
        d[i][3] = SCALE(b1[6] * c3 - b1[5] * c5);
    }

    /* column transform */
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 4; j++) {
            jc = 7 - j;
            b1[j]  = d[j][i] + d[jc][i];
            b1[jc] = d[j][i] - d[jc][i];
        }
        b[0] = b1[0] + b1[3];
        b[1] = b1[1] + b1[2];
        b[2] = b1[1] - b1[2];
        b[3] = b1[0] - b1[3];
        b[4] = b1[4];
        b[5] = SCALE((b1[6] - b1[5]) * c0);
        b[6] = SCALE((b1[6] + b1[5]) * c0);
        b[7] = b1[7];
        d[0][i] = SCALE((b[0] + b[1]) * c4);
        d[4][i] = SCALE((b[0] - b[1]) * c4);
        d[2][i] = SCALE(b[2] * c6 + b[3] * c2);
        d[6][i] = SCALE(b[3] * c6 - b[2] * c2);
        b1[4] = b[4] + b[5];
        b1[7] = b[7] + b[6];
        b1[5] = b[4] - b[5];
        b1[6] = b[7] - b[6];
        d[1][i] = SCALE(b1[4] * c7 + b1[7] * c1);
        d[5][i] = SCALE(b1[5] * c3 + b1[6] * c5);
        d[7][i] = SCALE(b1[7] * c7 - b1[4] * c1);
        d[3][i] = SCALE(b1[6] * c3 - b1[5] * c5);
    }

    /* store the 2-D array (8*8) into a 1-D array (64) */
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            *(output + i * 8 + j) = d[i][j];
        }
    }
}

Conclusion

DCT expresses an image through coefficients that reflect the properties of the image.

Fast algorithms exist and are suitable for hardware implementation.



Outline

What are motions in videos
The importance of motions
Motion representation
How to find the motion of a block
Block matching
Residual
Fast block matching algorithm
Intra frame and inter frame

Motions in Video Clips

Local motions

Global motions

Background

The Importance of Motions

Compress one frame independently: each pixel has to be compressed.
  DCT, quantization, binary coding
Compress one frame depending on the previous frame: the background can be ignored.
  Only compress moving objects and new objects

Example

1. Compress and in frame 1.

2. Compress the motion of in remaining frames.

Direction and magnitude

[Figure: frames 1, 2, 3, 4]

Motion Representation

Use arrows to represent motions of objects.

Global
Block-based
Pixel-based
Region-based

How to Find The Motion of A Block?

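[Figure: frame i-1 is the reference frame (already existing); frame i is the current frame (to be encoded). Block matching finds, for a block of the current frame, the matched block in the reference frame; the displacement between the two is the motion vector. Occlusion is also marked in the figure.]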

Block Matching (1)

[Figure: the current block]

Block Matching (2)

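Compare the difference between two blocks (one in the current frame, the other in the reference frame):

$$D_p = \sum_{(x,y)\,\in\,\text{block}} \left|\,\text{current block}(x,y) - \text{candidate block}(x,y)\,\right|^{p}$$

p = 1: sum of absolute differences (SAD)
p = 2: sum of squared differences, which is the mean square error (MSE) after division by the number of pixels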

Block Matching (3)

A rectangular array of pixels is selected as a measurement window (block):

 34  54 104 100
 64  52 110 102
 43  34  65  55
 20  51  53  50

The measurement window is compared with a shifted array of pixels in the other frame (integer pixel shift, within the search range) to determine the best match, i.e. the minimum MSE:

 43  56  76  78  89  31  34  54
 44  35  66  75  34  22  35  90
 54  33  45  66  48  37  44  57
 73  76  50  53  50  18  36  43
 49  61  55  65  35  53  32  29
 83 124 100 110  52  64  65  46
 98 101  99 105  55  34  45  13
 75  89  83  72  68  56  44  23
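Block matching as described above can be sketched as an exhaustive search over all integer shifts inside the search range. This is a minimal sketch that uses SAD (p = 1) as the matching cost; the frame size, block size, and the names `block_sad`, `full_search`, and `MotionVector` are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define FRAME_W 16   /* frame width  (assumed for the example) */
#define FRAME_H 16   /* frame height (assumed for the example) */
#define BLOCK   4    /* block size   (assumed for the example) */

typedef struct { int dx, dy; long sad; } MotionVector;

/* SAD between the block at (cx, cy) in cur and the block at (rx, ry) in ref. */
static long block_sad(unsigned char cur[FRAME_H][FRAME_W],
                      unsigned char ref[FRAME_H][FRAME_W],
                      int cx, int cy, int rx, int ry)
{
    long sad = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sad += abs((int)cur[cy + y][cx + x] - (int)ref[ry + y][rx + x]);
    return sad;
}

/* Test every integer shift (dx, dy) with |dx|, |dy| <= range; keep the minimum. */
MotionVector full_search(unsigned char cur[FRAME_H][FRAME_W],
                         unsigned char ref[FRAME_H][FRAME_W],
                         int cx, int cy, int range)
{
    MotionVector best = {0, 0, LONG_MAX};
    assert(range >= 0);
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + BLOCK > FRAME_W || ry + BLOCK > FRAME_H)
                continue;                      /* candidate leaves the frame */
            long sad = block_sad(cur, ref, cx, cy, rx, ry);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    return best;
}
```

With a search range of ±r the loop evaluates up to (2r + 1)² candidates per block, which is what the fast search methods try to reduce.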

Residual (1)

[Figure: motion, residual, and occlusion]

Residual (2)

[Figure: residual coding loop. Blocks: motion compensation using MV = (dx, dy), DCT + Q, iDCT + iQ, previous frame buffer; the steps are numbered ① to ④.]

Encoder (DCT, quantization, binary coding): codes the residual only

Residual (3)

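Decoder

[Figure: coded bitstream → VLD → Q⁻¹ → IDCT → residual; the residual is added to the output of motion compensation, which uses the MV and the previous frame memory, to give the reconstructed frame.]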
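The relation between prediction, residual, and reconstructed block can be sketched as follows. This is a simplified illustration that leaves out the DCT and quantization stages, so the round trip is lossless here; the block size and function names are assumptions.

```c
#define BLOCK 4   /* block size (assumed for the example) */

/* Encoder side: residual = current block - motion-compensated prediction. */
void compute_residual(unsigned char cur[BLOCK][BLOCK],
                      unsigned char pred[BLOCK][BLOCK],
                      int res[BLOCK][BLOCK])
{
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            res[y][x] = (int)cur[y][x] - (int)pred[y][x];
}

/* Clip a value to the 8-bit pixel range. */
static unsigned char clip255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Decoder side: reconstructed block = prediction + decoded residual. */
void reconstruct(unsigned char pred[BLOCK][BLOCK],
                 int res[BLOCK][BLOCK],
                 unsigned char out[BLOCK][BLOCK])
{
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            out[y][x] = clip255((int)pred[y][x] + res[y][x]);
}
```

In a real codec the residual passes through DCT and quantization before it is sent, so the decoded residual differs slightly from the original one; the clipping step keeps the sum inside the valid pixel range in that case.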

Block Matching Algorithm - Full Search Method

[Figure: search area, labeled 15 in each direction]

Block Matching Algorithm - Three Step Method
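The slide for this method shows only a figure, so the following is a sketch of the three-step search as it is commonly described, not necessarily the exact variant presented: start at the center, test the eight neighbors at distance 4, move to the best point, then repeat with distances 2 and 1. Frame size, block size, and all names are assumptions.

```c
#include <limits.h>
#include <stdlib.h>

#define FRAME_W 32   /* assumed frame width  */
#define FRAME_H 32   /* assumed frame height */
#define BLOCK   8    /* assumed block size   */

/* SAD for displacement (dx, dy); LONG_MAX if the candidate leaves the frame. */
static long sad_at(unsigned char cur[FRAME_H][FRAME_W],
                   unsigned char ref[FRAME_H][FRAME_W],
                   int cx, int cy, int dx, int dy)
{
    int rx = cx + dx, ry = cy + dy;
    long sad = 0;
    if (rx < 0 || ry < 0 || rx + BLOCK > FRAME_W || ry + BLOCK > FRAME_H)
        return LONG_MAX;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sad += abs((int)cur[cy + y][cx + x] - (int)ref[ry + y][rx + x]);
    return sad;
}

/* Three steps with distances 4, 2, 1 cover displacements up to +/-7. */
void three_step_search(unsigned char cur[FRAME_H][FRAME_W],
                       unsigned char ref[FRAME_H][FRAME_W],
                       int cx, int cy, int *best_dx, int *best_dy)
{
    int mx = 0, my = 0;
    long best = sad_at(cur, ref, cx, cy, 0, 0);
    for (int step = 4; step >= 1; step /= 2) {
        int center_x = mx, center_y = my;       /* best point of the last step */
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++) {
                if (i == 0 && j == 0)
                    continue;                   /* center already evaluated */
                int dx = center_x + i * step, dy = center_y + j * step;
                long s = sad_at(cur, ref, cx, cy, dx, dy);
                if (s < best) {
                    best = s;
                    mx = dx;
                    my = dy;
                }
            }
    }
    *best_dx = mx;
    *best_dy = my;
}
```

This evaluates 1 + 3 × 8 = 25 candidates instead of the 225 that a full search over ±7 needs, at the risk of stopping in a local minimum of the matching error.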

Block Matching Algorithm - Four Step Method

IEEE Transactions on Circuits and Systems for Video Technology, June 1996

Block Matching Algorithm - Diamond Method

Fractional pixel accuracy

e.g. half-pixel accuracy: (dx, dy) = (1.5, 1)

[Figure: comparison of integer-pixel and half-pixel accuracy. H.263, Foreman, QCIF, SKIP = 2, Q = 4, 5, 7, 10, 15, 25]
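Half-pixel motion vectors need reference samples that lie between the stored pixels. A common way to obtain them is bilinear averaging of the neighboring integer pixels, sketched below; the frame size, the function name, and the convention of passing coordinates in half-pixel units are assumptions for this example.

```c
#include <assert.h>

#define FRAME_W 8   /* assumed frame width  */
#define FRAME_H 8   /* assumed frame height */

/* Sample ref at position (x2 / 2, y2 / 2), where x2 and y2 are given in
 * half-pixel units (x2 = 3 means x = 1.5). Half positions are the rounded
 * average of the two or four surrounding integer pixels. */
int sample_half_pel(unsigned char ref[FRAME_H][FRAME_W], int x2, int y2)
{
    int x = x2 / 2, y = y2 / 2;     /* integer part */
    int fx = x2 % 2, fy = y2 % 2;   /* 1 if the position has a half offset */
    assert(x2 >= 0 && y2 >= 0 && x + fx < FRAME_W && y + fy < FRAME_H);
    int a = ref[y][x];
    int b = ref[y][x + fx];
    int c = ref[y + fy][x];
    int d = ref[y + fy][x + fx];
    /* For integer positions all four terms are equal, so the result is a. */
    return (a + b + c + d + 2) / 4;
}
```

For the motion vector (dx, dy) = (1.5, 1), the sample for pixel (x, y) of the current block would be read as `sample_half_pel(ref, 2 * x + 3, 2 * y + 2)`.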

Encode A Frame with Motions

Intra frame (I-frame): encoded/decoded without using motion information.
Inter frame: encoded/decoded using motion information.
  Prediction frame (P-frame)
  Bi-directional prediction frame (B-frame)
Group of pictures (GOP): starts with an I-frame, followed by a series of inter frames.
  Random access
  Prevention of error propagation

GOP: Intra Inter Inter Inter Inter Inter | Intra ...

I-Frame, P-Frame, and B-Frame

P-frame: find motions from the previous I- or P-frame.
B-frame: find motions from both the previous and the following reference frames (I- and P-frame, or P- and P-frame). Some objects may be found only in the following frame.

Display order:  1 2 3 4 5 6 7
Frame type:     I B B P B B P
Encoding order: 1 4 2 3 7 5 6
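The encoding order follows from one rule: a B-frame can only be coded after the reference frame that follows it in display order. A minimal sketch of that reordering, with an illustrative function name and a string of frame types as input (both assumptions):

```c
#include <assert.h>
#include <string.h>

/* Convert display order to encoding order.
 * types: frame types in display order, e.g. "IBBPBBP".
 * order: receives 1-based display indices in encoding order.
 * Returns the number of frames written. Trailing B-frames with no
 * following reference frame are not emitted. */
int coding_order(const char *types, int *order)
{
    int n = (int)strlen(types);
    int count = 0;
    int pending_start = 0;              /* first B-frame not yet emitted */
    assert(order != 0);
    for (int i = 0; i < n; i++) {
        if (types[i] == 'B')
            continue;                   /* wait for the next reference frame */
        order[count++] = i + 1;         /* I- or P-frame is coded first */
        for (int b = pending_start; b < i; b++)
            order[count++] = b + 1;     /* then the B-frames that precede it */
        pending_start = i + 1;
    }
    return count;
}
```

For the sequence I B B P B B P this yields 1 4 2 3 7 5 6, the order given above.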

Video Encoder

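[Figure: video encoder block diagram. Frame input → subtraction of the prediction (−) → DCT → Q → VLC → bitstream. Reconstruction loop: Q⁻¹ → IDCT → clipping → frame memory (previous frame) → motion estimation and motion compensation. I-frames are coded directly; for P-frames the residual and the motion vectors (MV) are coded.]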

Video Decoder

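[Figure: video decoder block diagram. Coded bitstream → VLD → Q⁻¹ → IDCT; for P-frames the result is added to the output of motion compensation, which reads the previous frame memory; the output is the reconstructed frame. I-frames bypass motion compensation.]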



The Scope of Video Coding Standardization

Only restrictions on the bitstream, syntax, and decoder are standardized:
Permits the optimization of encoding
Permits complexity reduction for implementability
Provides no guarantees on quality

Standards and Applications

International Telecommunication Union – Telecommunication Standardization (ITU-T)

H.261: videophone and video conferencing, p x 64 kbps (p = 1 ... 30)
  Still in use: low complexity, low latency; mostly as a backward-compatibility feature; overtaken by H.263
H.263: PSTN and mobile network, 10 to 24 kbps; 1994: H.263, H.263+
H.264: double the coding efficiency in comparison to any other existing video coding standard

MPEG: Moving Picture Experts Group

MPEG-1: CD-i, (VOD trials), ...
MPEG-2: ... + TV, HDTV
MPEG-3: HDTV, merged into MPEG-2
MPEG-4: Coding of Audiovisual Objects
MPEG-7: MM Description Interface
MPEG-21: Digital Multimedia Framework

Chronological Table of Video Coding Standards

Timeline axis: 1990, 1992, 1994, 1996, 1998, 2000, 2002, 2003

ITU-T VCEG:                    H.261 (1990); H.263 (1995/96); H.263+ (1997/98); H.263++ (2000)
ITU-T VCEG and ISO/IEC MPEG:   MPEG-2 (H.262) (1994/95); H.264 (MPEG-4 Part 10) (2002)
ISO/IEC MPEG:                  MPEG-1 (1993); MPEG-4 v1 (1998/99); MPEG-4 v2 (1999/00); MPEG-4 v3 (2001)

H.261: The Basis of Modern Video Compression

The first widespread practical success
Video format: CIF (352 x 288, above 128 Kbps), QCIF (176 x 144, 64 - 128 Kbps)
Operated at 64 - 2048 Kbps (p x 64 Kbps)
Still in use: low complexity, low latency; mostly as a backward-compatibility feature; overtaken by H.263

MPEG-1: For Storage

Five parts: System, Visual, Audio, Conformance, Reference Software

Applications: VCD, VOD, digital camera
Maximum: 1.856 Mbps, 768x576 pels

Superior quality to H.261 when operated at higher bit rates (≥ 1 Mbps for CIF 352x288 resolution)

Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution

Technical features: Adds bi-directional motion prediction and half-pixel motion to H.261 design

Use is fairly widespread, but mostly overtaken by MPEG-2

MP3 = MPEG-1 layer 3 audio

MPEG-2 / H.262: High Bit Rate, High Quality

MPEG-2 Visual = H.262
Not especially useful below 2 Mbps (range of use normally 2-20 Mbps)
Applications: SDTV (2-5 Mbps), DVD (6-8 Mbps), HDTV (20 Mbps), VOD
Support for interlaced scan pictures
SNR, temporal, and spatial scalability
Consists of various "Profiles" and "Levels"
MPEG-2 audio: supports 5.1 channels
MPEG-2 AAC: requires 30% fewer bits than MP3

H.263: The Next Generation

Goal: improved quality at lower rates
Has overtaken H.261 as the dominant video-conferencing codec
Superior to H.261 at all bit rates; significantly better quality at lower rates
Better video at 18-24 Kbps than H.261 at 64 Kbps
Enables video phone over regular phone lines (28.8 Kbps) or wireless modem
H.263+ (1998): supports all bit rates, more options
H.263++ (2000): more options, emphasizing error resilience and scalability

MPEG-4: H.263 + Additions + Variable Shape Coding

Goal: support for interactive multimedia
Visual Object (VO), Audio Object (AO), and Audiovisual Object (AVO)
18 video coding profiles
Roughly follows the H.263 design and adds all prior features and (most important) shape coding
Includes zero-tree wavelet coding of still textured pictures, segmented coding of shapes, coding of synthetic content
2D and 3D mesh coding, face animation modeling
10-bit and 12-bit video
Contains 9 parts. Part 10 will be H.264

A Note on Terminology of H.264

The following terms are used interchangeably:
H.26L
The work of the JVT or "JVT CODEC"
JM2.x, JM3.x, JM4.x
The thing beyond H.26L
The "AVC" or Advanced Video CODEC

Proper terminology going forward:
MPEG-4 Part 10 (official MPEG term): ISO/IEC 14496-10 AVC
H.264 (official ITU term)

Position of H.264

New Features of H.264

Multi-mode, multi-reference MC
Motion vector can point out of image border
1/4-, 1/8-pixel motion vector precision
B-frame prediction weighting
4×4 integer transform
Multi-mode intra-prediction
In-loop de-blocking filter
UVLC (Universal Variable Length Coding)
NAL (Network Abstraction Layer)
SP-slices

Profiles and Levels

Profiles: Baseline, Main, and Extended (X)
  Baseline: progressive, videoconferencing and wireless
  Main: esp. broadcast
  Extended: mobile network
Baseline profile is the minimum implementation
  No CABAC, 1/8 MC, B-frame, SP-slices
15 levels
  Resolution, capability, bit rate, buffer, number of reference frames
  Built to match popular international production and emission formats
  From QCIF to D-Cinema

Variable Block Sizes

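Various block sizes and shapes

MB types (partition numbers in parentheses): 16x16 (0), 16x8 (0, 1), 8x16 (0, 1), 8x8 (0, 1, 2, 3)
8x8 types (partition numbers in parentheses): 8x8 (0), 8x4 (0, 1), 4x8 (0, 1), 4x4 (0, 1, 2, 3)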

Multiple Reference Frames

Multiple reference frames

Frame to be encoded

Integer Transform (1)

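4×4 and 2×2 integer transforms

Integer transform matrices:

$$H_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}, \quad H_3 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: block order within a macroblock. Luma (Y): 4×4 blocks 0 to 15, plus block −1 for the luma DC coefficients in INTRA_16×16 mode. Chroma: blocks 16 and 17 for the Cb and Cr DC coefficients, blocks 18 to 21 for Cb, blocks 22 to 25 for Cr. In place of the DCT, H1 is applied to the 4×4 blocks, H2 to the luma DC block, and H3 to the chroma DC blocks.]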
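The 4×4 integer transform needs only additions, subtractions, and small integer multiplications. The sketch below applies the matrix H1, with the signs of the H.264 core transform, as Y = H1 · X · H1ᵀ; the function name is illustrative, and the scaling that H.264 folds into quantization is left out.

```c
/* H.264 4x4 forward core transform matrix (H1 in the slides). */
static const int H1[4][4] = {
    {1,  1,  1,  1},
    {2,  1, -1, -2},
    {1, -1, -1,  1},
    {1, -2,  2, -1}
};

/* y = H1 * x * H1^T, computed with integer arithmetic only. */
void forward_core_transform(int x[4][4], int y[4][4])
{
    int tmp[4][4];

    /* tmp = H1 * x (transform along the columns) */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (int k = 0; k < 4; k++)
                tmp[i][j] += H1[i][k] * x[k][j];
        }

    /* y = tmp * H1^T (transform along the rows) */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            y[i][j] = 0;
            for (int k = 0; k < 4; k++)
                y[i][j] += tmp[i][k] * H1[j][k];
        }
}
```

Because every entry is a small integer, encoder and decoder compute exactly the same values, which avoids the drift that differing floating-point DCT implementations can cause.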

In-loop De-blocking Filter

[Figure: decoded picture without the filter compared with H.264/AVC de-blocking]
Highly compressed decoded inter picture
Significantly reduces prediction residuals

Comparison

Summary

Video coding is based on hybrid video coding and is similar in spirit to other standards, but with important differences
New key features are: enhanced motion compensation, small blocks for transform coding, improved de-blocking filter, enhanced entropy coding
Substantial bit-rate savings (up to 50%) relative to other standards for the same quality
The improvement in perceptual quality seems larger than the improvement in PSNR
The encoder is about three times as complex as prior encoders
The decoder is about twice as complex as prior decoders

Applications Examples

Shopping: "try-on" clothes; decorate/furnish rooms.
Interact with sporting events: multiple, simultaneous views; player/game statistics; user-directed replay, freeze, etc.
Field maintenance
Mobile audio-visual terminal: remote audio-visual access.
Security monitoring: region-of-interest (e.g. face) isolation, enhancement.
Auto traffic, harbor traffic management.
Networked video games: actual users as players in the scene
Sign language

From Olivier Avaro

Reference Software

H.264 http://iphome.hhi.de/suehring/tml/

MPEG-4 http://www.xvid.org
