Video coding

Upload: aglaia

Post on 19-Mar-2016


Page 1: Video coding

Video coding


Page 2: Video coding

Video coding

Types of redundancies:
– Spatial: correlation between neighboring pixel values
– Spectral: correlation between different color planes or spectral bands
– Temporal: correlation between different frames in a video sequence

In video coding, temporal correlation is also exploited, typically using motion compensation (predictive coding based on motion estimation).
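The following is a minimal sketch of integer-pel full-search block matching, the simplest form of motion estimation, to make the temporal case concrete. For one block of the current frame, it tries every displacement in a small window of the reference frame. It keeps the displacement with the smallest sum of absolute differences (SAD), and the motion-compensated residual is then what gets transform-coded. Frames are plain lists of lists of luma values. The function names, the 16x16 block size and the ±7 search range are illustrative assumptions, not values taken from any particular standard.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=16, search=7):
    """Exhaustive integer-pel block matching.

    Returns ((dx, dy), cost): the displacement into the reference frame with
    the lowest SAD, and that SAD. Candidate blocks that would fall outside
    the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual(cur, ref, bx, by, mv, n=16):
    """Motion-compensated prediction error for one block."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
```

Suppose the current frame is the reference with its content moved so that cur[y][x] == ref[y + 1][x + 2]. Then the search returns (2, 1) with SAD 0, and the residual is all zeros. Only the motion vector would remain to be sent.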

Page 3: Video coding

Video standards review

Page 4: Video coding

H.261
• For video conferencing/video phone
– Low delay (real-time, interactive)
– Slow motion in general
• For transmission over ISDN
– Fixed bandwidth: p×64 kbps, p = 1, 2, …, 30

Page 5: Video coding

H.261
• Video format:
– CIF (352x288, above 128 kbps)
– QCIF (176x144, 64-128 kbps)
– 4:2:0 color format, progressive scan
• Published in 1990
• Each macroblock can be coded in intra- or inter-mode
• Periodic insertion of intra-mode macroblocks to eliminate error propagation due to network impairments
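The arithmetic below shows how much compression these formats and channel rates demand. It is a small sketch that counts 16x16 macroblocks and computes the uncompressed 4:2:0 bit rate, which it then compares with the p×64 kbps ISDN channel. The 30 frames/s rate and 8-bit samples are assumptions made for round numbers. The function names are hypothetical.

```python
def channel_rate(p):
    """H.261 ISDN channel rate in bit/s: p x 64 kbit/s, with p = 1..30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in 1..30")
    return p * 64_000


def frame_bytes_420(width, height):
    """Bytes per 8-bit 4:2:0 frame: full-resolution luma plus two chroma
    planes subsampled by 2 horizontally and vertically."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def macroblocks(width, height):
    """Number of 16x16 luma macroblocks in the picture."""
    return (width // 16) * (height // 16)


def raw_bitrate(width, height, fps):
    """Uncompressed bit rate in bit/s."""
    return frame_bytes_420(width, height) * 8 * fps


for name, (w, h), p in (("CIF", (352, 288), 2), ("QCIF", (176, 144), 1)):
    raw = raw_bitrate(w, h, 30)  # 30 frames/s is an assumption
    print(f"{name}: {macroblocks(w, h)} MBs, raw {raw / 1e6:.1f} Mbit/s, "
          f"needs ~{raw / channel_rate(p):.0f}:1 at {p} x 64 kbit/s")
```

Under these assumptions, a CIF picture has 396 macroblocks and about 36.5 Mbit/s of raw data. At p = 2 (128 kbps), the coder must therefore compress by roughly 285:1. QCIF has 99 macroblocks and needs roughly 143:1 at 64 kbps.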

Page 6: Video coding

DCT coefficient quantization

• DC coefficient in intra-mode: uniform quantizer
• Other coefficients: uniform quantizer with deadzone (to avoid coding too many small coefficients, which are typically due to noise)
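The sketch below contrasts the two quantizer types. A plain uniform quantizer rounds to the nearest level. A deadzone quantizer truncates toward zero instead, so the bin around zero is twice as wide and small, noise-like coefficients become 0. A fixed step of 10 is assumed for illustration. H.261's actual step sizes and reconstruction rules are not reproduced here.

```python
def _sign(x):
    return -1 if x < 0 else 1


def quantize_uniform(coef, step):
    """Uniform quantizer: round |coef| / step to the nearest integer."""
    return _sign(coef) * int(abs(coef) / step + 0.5)


def quantize_deadzone(coef, step):
    """Uniform quantizer with a deadzone: truncate |coef| / step toward zero.
    Every |coef| < step maps to 0, so the zero bin is twice as wide."""
    return _sign(coef) * int(abs(coef) / step)


def dequantize_uniform(level, step):
    return level * step


def dequantize_deadzone(level, step):
    """Reconstruct non-zero levels at the center of their bin."""
    if level == 0:
        return 0
    return _sign(level) * (abs(level) + 0.5) * step


noisy = [3, -4, 6, -2, 25]
print([quantize_uniform(c, 10) for c in noisy])   # [0, 0, 1, 0, 3]
print([quantize_deadzone(c, 10) for c in noisy])  # [0, 0, 0, 0, 2]
```

The coefficient 6 survives the uniform quantizer but not the deadzone one. This matches the purpose stated above: fewer small coefficients reach the entropy coder.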

• MVs coded differentially (DMV)
• DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
– Symbol: (zero run-length, non-zero value range)
• Other information is also coded using VLC (Huffman coding)
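The run-length step can be sketched as follows. Scan the quantized 8x8 block in zigzag order, from low to high frequency. Then emit one (zero run, level) symbol for each non-zero coefficient. The standard's VLC tables, which map each symbol to a Huffman codeword, are not reproduced. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order, walking the
    anti-diagonals from the DC coefficient toward the highest frequencies."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Convert a quantized block into (zero_run, level) symbols.

    Each symbol gives the number of zeros skipped in zigzag order before a
    non-zero level. Zeros after the last non-zero level are not emitted; a
    real coder signals them with an end-of-block code instead.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, -3, 5
print(run_level_pairs(block))  # [(0, 12), (0, -3), (1, 5)]
```

Because quantization leaves most high-frequency coefficients at zero, a typical block collapses to a handful of symbols.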

Page 7: Video coding

MPEG-1

• Finalized in ~1991
• Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240, 30 fps)
– Maximum: 1.856 Mbps, 768x576 pels
– Progressive frames only
• Prompted an explosion of digital video applications: MPEG-1 Video CD and downloadable video over the Internet
• Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
• MPEG-1 Audio
– Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
– MP3 = MPEG-1 Layer 3 audio

Page 8: Video coding

MPEG-1 vs H.261
• Developed at about the same time
• Must enable random access (fast forward/rewind)
– Using GOP structure with periodic I-pictures and P-pictures
• Not for interactive applications
– Does not have as stringent a delay requirement
• Fixed rate (1.5 Mbps), good quality (VHS equivalent)
– SIF video format (similar to CIF)
• CIF: 352x288, SIF: 352x240
– Using more advanced motion compensation
• Half-pel accuracy motion estimation, range up to +/- 64
– Using bi-directional temporal prediction
• Important for handling uncovered regions
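Half-pel motion compensation needs reference samples between the integer pixel positions. The sketch below interpolates them bilinearly. Integer positions are copied. A half-pel position takes the rounded average of its 2 (horizontal or vertical) or 4 (diagonal) integer neighbors. Motion vectors are given in half-pel units, so (3, 0) means 1.5 pels to the right. The rounding convention (add 1 before halving, 2 before quartering) is an assumption to check against the standard. The function names are hypothetical.

```python
def half_pel_sample(ref, x2, y2):
    """Reference sample at (x2 / 2, y2 / 2), with x2 and y2 in half-pel units.

    Integer positions are copied; half-pel positions are the rounded average
    of the 2 or 4 nearest integer samples. Callers must stay inside the frame.
    """
    x0, y0 = x2 // 2, y2 // 2
    half_x, half_y = x2 % 2, y2 % 2
    a = ref[y0][x0]
    if not half_x and not half_y:
        return a
    if half_x and not half_y:
        return (a + ref[y0][x0 + 1] + 1) // 2
    if half_y and not half_x:
        return (a + ref[y0 + 1][x0] + 1) // 2
    return (a + ref[y0][x0 + 1] + ref[y0 + 1][x0] + ref[y0 + 1][x0 + 1] + 2) // 4


def predict_block(ref, bx, by, mv2, n):
    """n x n prediction for the block at integer position (bx, by), displaced
    by the motion vector mv2 = (mvx, mvy) given in half-pel units."""
    mvx, mvy = mv2
    return [
        [half_pel_sample(ref, 2 * (bx + x) + mvx, 2 * (by + y) + mvy) for x in range(n)]
        for y in range(n)
    ]
```

As an example, take ref = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]. A 2x2 block at the origin with a half-pel motion vector of (1, 0) is then predicted as [[5, 15], [35, 45]]. Each value is the average of two horizontal neighbors.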

Page 9: Video coding

MPEG-1 GOP

Encoding order: 1 4 2 3 8 5 6 7
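The GOP figure itself is missing from this transcript. The picture types I B B P B B B P in display order are one pattern consistent with the encoding order listed; that pattern is an assumption here. A B-picture is predicted from the anchors (I or P) on both sides, so each anchor must be coded before the B-pictures that precede it in display order. This minimal sketch applies that rule. The function name is hypothetical.

```python
def coding_order(picture_types):
    """Convert display order to coding (bitstream) order.

    picture_types is a string such as "IBBPBBBP", one letter per picture in
    display order; pictures are numbered from 1. Each anchor (I or P) is
    emitted before the B-pictures that precede it in display order.
    Trailing B-pictures with no later anchor are simply appended (a
    simplification).
    """
    order, pending_b = [], []
    for number, ptype in enumerate(picture_types, start=1):
        if ptype == "B":
            pending_b.append(number)
        else:
            order.append(number)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b


print(coding_order("IBBPBBBP"))  # [1, 4, 2, 3, 8, 5, 6, 7]
```

The reordering is what lets the decoder hold both anchors in memory before it reconstructs the B-pictures between them. It also adds delay, which is one reason this structure does not suit interactive use.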

Page 10: Video coding

MPEG-1 coder

Page 11: Video coding

H.263
• Targeted for visual telephony over PSTN or the Internet
• Enables video phone over regular phone lines (28.8 kbps) or wireless modem
• Developed later than H.261, so it can accommodate computationally more intensive options
– Initial version (H.263 baseline): 1995
– H.263+: 1997
– H.263++: 2000
• Result: significantly better quality at lower rates
– Better video at 18-24 kbps than H.261 at 64 kbps

Page 12: Video coding

H.263

Page 13: Video coding

(Some of the) H.263 improvements over H.261
• Better motion estimation
– half-pel accuracy motion estimation with bilinear interpolation filter
– larger motion search range [-31.5, 31], and unrestricted MVs at boundary blocks
– more efficient predictive coding for MVs (median prediction using three neighbors)
– overlapped block motion compensation (option)
– variable block size: 16x16 -> 8x8, 4 MVs per MB (option)
– bidirectional temporal prediction (PB picture) (option)
• 3-D VLC for DCT coefficients (run length, value, EOB)
• Syntax-based arithmetic coding (option; at 50% more computation)
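Median prediction of motion vectors can be sketched in a few lines. The predictor is the component-wise median of three neighboring MVs: left, above and above-right, which is the H.263 neighbor set. Only the difference between the actual MV and this prediction is coded. The special cases at picture and GOB boundaries and for the 4-MV mode are omitted. The function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(left, above, above_right):
    """Component-wise median of the three neighboring motion vectors."""
    return (
        median3(left[0], above[0], above_right[0]),
        median3(left[1], above[1], above_right[1]),
    )


def mv_difference(mv, left, above, above_right):
    """The value actually coded: current MV minus its median prediction."""
    px, py = predict_mv(left, above, above_right)
    return (mv[0] - px, mv[1] - py)


# MVs in half-pel units for the left, above and above-right neighbors.
print(predict_mv((2, -1), (4, 0), (3, 6)))              # (3, 0)
print(mv_difference((4, 1), (2, -1), (4, 0), (3, 6)))  # (1, 1)
```

The median discards one outlier neighbor, for example the (3, 6) above, which a single-neighbor predictor would not. In smoothly moving regions, this keeps the coded differences small.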

Page 14: Video coding

H.263 and beyond
- Aimed particularly at video coding for low bit rates (typically 20-30 kbps and above).
- Coding scheme similar to that of H.261, with some improvements and changes to improve performance and error recovery.
- Main differences:
  - Half-pixel precision is used for motion compensation
  - Four negotiable options:
    - Unrestricted motion vectors
    - Syntax-based arithmetic coding
    - Advanced prediction
    - Forward and backward frame prediction (similar to MPEG, called PB-frames)
  - Five resolutions instead of two
- Further improvements in H.263+ and H.264

Page 16: Video coding

MPEG-2

MPEG-2: finalized in 1994
» Field-interlaced video
» Levels and profiles
• Profiles: define bit stream scalability and color space resolutions
• Levels: define image resolutions and maximum bit rate per profile

Page 17: Video coding

MPEG-2

• A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Internet/intranet) as well as DVD video
• 4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
• 18-45 Mbps for HDTV applications
– MPEG-2 video main profile at high level (MP@HL) is the video coding standard used in HDTV
• Tests in 11/91, Committee Draft 11/93
• Consists of various profiles and levels
• Backward compatible with MPEG-1
• MPEG-2 Audio
– Supports 5.1 channels
– MPEG-2 AAC: requires 30% fewer bits than MPEG-1 Layer 3

Page 18: Video coding

MPEG-2 vs MPEG-1

• MPEG-1 only handles progressive sequences (SIF).
• MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF).
• More sophisticated motion estimation methods (frame/field prediction modes) are developed to improve estimation accuracy for interlaced sequences.

- Frame Motion Vectors: one motion vector is generated per MB in each direction, which corresponds to a 16x16 pels luminance area.

- Field Motion Vectors: two motion vectors per MB are generated for each direction, one for each of the fields. Each vector corresponds to a 16x8 pels luminance area.

• Different DCT modes and scanning methods are developed for interlaced sequences.

• MPEG-2 has various scalability modes.
• MPEG-2 has various profiles and levels, each combination targeted at a different application.
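Field prediction starts by splitting a macroblock into its two fields: even lines form the top field and odd lines the bottom field. Each field block is 16 pels wide and 8 lines high, which is exactly the area one field motion vector covers. The minimal sketch below shows only this partitioning. The motion search on each field (for example, block matching in the reference fields) is not shown. The function names are hypothetical.

```python
def split_fields(mb):
    """Split a 16x16 frame macroblock (list of 16 rows) into its two fields.

    Even lines form the top field and odd lines the bottom field; each
    field block is 16 pels wide and 8 lines high.
    """
    return mb[0::2], mb[1::2]


def merge_fields(top, bottom):
    """Re-interleave two 16x8 field blocks into a 16x16 frame macroblock."""
    mb = []
    for top_line, bottom_line in zip(top, bottom):
        mb.extend([top_line, bottom_line])
    return mb
```

In an interlaced frame, the two fields are captured at different instants. When there is motion, lines from the same field therefore correlate better with each other than adjacent frame lines do. This is why a separate vector per 16x8 field block can predict more accurately than one vector for the whole 16x16 block.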

Page 19: Video coding

MPEG-2 scalability

• Data partition
– All headers, MVs, and the first few DCT coefficients in the base layer
– Can be implemented at the bit stream level
– Simple
• SNR scalability
– Base layer includes coarsely quantized DCT coefficients
– Enhancement layer further quantizes the base layer quantization error
– Relatively simple
• Spatial scalability
– Complex
• Temporal scalability
– Simple
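The following sketch applies two-layer SNR scalability to a few DCT coefficients. The base layer quantizes each coefficient with a coarse step. The enhancement layer then quantizes the base-layer reconstruction error with a finer step. A decoder that receives only the base layer still gets a usable, coarser picture. The step sizes 16 and 4 and the rounding quantizer are assumptions for illustration. The MPEG-2 syntax and exact quantization rules are not reproduced.

```python
def quantize(x, step):
    """Round x / step to the nearest integer (halves rounded away from zero)."""
    sign = -1 if x < 0 else 1
    return sign * int(abs(x) / step + 0.5)


def encode_snr_layers(coeffs, base_step, enh_step):
    """Return (base_levels, enhancement_levels)."""
    base = [quantize(c, base_step) for c in coeffs]
    error = [c - b * base_step for c, b in zip(coeffs, base)]
    enhancement = [quantize(e, enh_step) for e in error]
    return base, enhancement


def decode_snr_layers(base, enhancement, base_step, enh_step):
    """Decode the base layer alone if enhancement is None, else both layers."""
    rec = [b * base_step for b in base]
    if enhancement is not None:
        rec = [r + e * enh_step for r, e in zip(rec, enhancement)]
    return rec


coeffs = [100, -37, 12, 3]
base, enh = encode_snr_layers(coeffs, 16, 4)
print(decode_snr_layers(base, None, 16, 4))  # [96, -32, 16, 0]
print(decode_snr_layers(base, enh, 16, 4))   # [100, -36, 12, 4]
```

The base-only error is bounded by half the coarse step (8 here). With the enhancement layer added, the error is bounded by half the fine step (2 here).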

Page 20: Video coding

SNR scalability

Page 21: Video coding

Spatial scalability

Page 22: Video coding

Temporal scalability

Page 23: Video coding

MPEG-2 profiles and levels

Profiles: tools
Levels: parameter range for a given profile

Main profile at main level (mp@ml) is the most popular, used for digital TV

Main profile at high level (mp@hl): HDTV

4:2:2 at main level (4:2:2@ml) is used for studio production
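Profiles and levels act as conformance points: a decoder built for a given profile and level must handle any stream within that level's limits. The sketch below picks the lowest Main-profile level whose picture-size and bit-rate limits a stream fits. The limits in the table are the commonly quoted Main-profile figures, included here as an assumption for illustration. The normative table in ISO/IEC 13818-2 also constrains frame rate, sample rate and buffer size, which this sketch ignores. The names are hypothetical.

```python
# (level, max width, max height, max bit rate in bit/s) for Main profile.
# Commonly quoted values, assumed for illustration; not the full normative table.
MAIN_PROFILE_LEVELS = [
    ("LL", 352, 288, 4_000_000),
    ("ML", 720, 576, 15_000_000),
    ("H-14", 1440, 1152, 60_000_000),
    ("HL", 1920, 1152, 80_000_000),
]


def lowest_main_profile_level(width, height, bitrate):
    """Name of the lowest level whose limits the stream fits, or None."""
    for name, max_w, max_h, max_rate in MAIN_PROFILE_LEVELS:
        if width <= max_w and height <= max_h and bitrate <= max_rate:
            return name
    return None


print(lowest_main_profile_level(720, 576, 6_000_000))     # ML
print(lowest_main_profile_level(1920, 1080, 19_000_000))  # HL
```

Under these figures, a standard-definition TV stream lands at MP@ML and a 1920-wide HDTV stream needs MP@HL. This agrees with the two uses listed above.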

Page 24: Video coding

MPEG-4

New features
» Provides technologies to view, access and manipulate objects rather than pixels
» Entire scene is decomposed into multiple objects
– Object segmentation is the most difficult task!
– But this does not need to be standardized ☺
» Each object is specified by its shape, motion, and texture (color)
– Shape and texture both change over time (specified by motion)
– Texture encoding is done with DCT (8x8 pixel blocks) or wavelets
» MPEG-4 assumes the encoder has a segmentation map available, and specifies how to code (actually, decode!) shape, motion and texture
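Object-based decoding ends with scene composition: each decoded object carries a texture and a shape. Here, the shape is taken to be a binary alpha mask in which 1 marks pixels that belong to the object. The decoder pastes the object into the scene wherever the mask is set. The placement (ox, oy) is a hypothetical parameter. Changing it rearranges the scene without re-encoding anything, which is the kind of decoder-side manipulation described above. Gray-scale alpha blending is not shown.

```python
def composite(background, texture, alpha, ox, oy):
    """Paste a decoded object into a copy of the background.

    texture and alpha have the same size; alpha is the binary shape mask
    (1 = pixel belongs to the object). (ox, oy) is where the object's
    top-left corner is placed; the object must lie inside the background.
    """
    out = [row[:] for row in background]
    for y, (tex_row, alpha_row) in enumerate(zip(texture, alpha)):
        for x, (value, inside) in enumerate(zip(tex_row, alpha_row)):
            if inside:
                out[oy + y][ox + x] = value
    return out


background = [[0] * 4 for _ in range(3)]
texture = [[9, 9], [9, 9]]
alpha = [[1, 0], [1, 1]]
for row in composite(background, texture, alpha, 1, 1):
    print(row)
# [0, 0, 0, 0]
# [0, 9, 0, 0]
# [0, 9, 9, 0]
```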

Page 25: Video coding

MPEG-4

Page 26: Video coding

Example of Scene Composition

Page 27: Video coding

Object-Based Coding

Page 28: Video coding

MPEG-4

MPEG-4 block diagram

Page 29: Video coding

MPEG-4

MPEG-4
– Coding tools
» Shape coding: binary or gray scale
» Motion compensation: similar to H.263; overlapped mode is supported
» Texture coding: block-based DCT, and wavelets for static texture
– Types of Video Object Planes (VOPs)
» I-VOP: VOP is encoded independently of any other VOPs
» P-VOP: predicted VOP using another previous VOP and motion compensation
» B-VOP: bidirectionally interpolated VOP using other I-VOPs or P-VOPs
» Similar concept to MPEG-2

Page 30: Video coding

Mesh Animation

• An object can be described by an initial mesh and MVs of the nodes in the following frames

• MPEG-4 defines coding of mesh geometry, but not mesh generation
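The sketch below shows how a mesh could drive animation. Node positions are propagated frame by frame from the initial mesh. The texture inside each triangle then follows its three nodes through an affine (barycentric) mapping. Two points are assumptions here: that each frame's node MVs are displacements from the previous frame, and that piecewise-affine mapping is used per triangle. The function names are hypothetical. Consistent with the slide, mesh generation is not part of the sketch.

```python
def track_nodes(initial_nodes, node_mvs_per_frame):
    """Node positions in every frame.

    Assumes each frame's motion vectors are displacements from the node
    positions in the previous frame.
    """
    frames = [list(initial_nodes)]
    for mvs in node_mvs_per_frame:
        frames.append([(x + dx, y + dy) for (x, y), (dx, dy) in zip(frames[-1], mvs)])
    return frames


def warp_point(p, tri_src, tri_dst):
    """Map point p inside a source triangle to the destination triangle with
    the same barycentric weights (an affine mapping per triangle)."""
    (x1, y1), (x2, y2), (x3, y3) = tri_src
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    l3 = 1.0 - l1 - l2
    (u1, v1), (u2, v2), (u3, v3) = tri_dst
    return (l1 * u1 + l2 * u2 + l3 * u3, l1 * v1 + l2 * v2 + l3 * v3)


frames = track_nodes([(0, 0), (4, 0), (0, 4)], [[(2, 1), (2, 1), (2, 1)]])
print(frames[1])                                 # [(2, 1), (6, 1), (2, 5)]
print(warp_point((1, 1), frames[0], frames[1]))  # (3.0, 2.0)
```

Only the node motion has to be transmitted for each frame. Every pixel inside a triangle gets its motion implicitly from the three nodes.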

Page 31: Video coding

Body and Face Animation

• MPEG-4 defines a default 3-D body model (including its geometry and possible motion) through body definition parameters (BDP)

• The body can be animated using the body animation parameters (BAP)

• Similarly, facial definition parameters (FDP) and facial animation parameters (FAP) are specified for a face model and its animation

• E.g. eye blink (FAP19)

Page 32: Video coding

Text-to-Speech Synthesis with Face Animation

Page 33: Video coding

Others…

• Sprite
– Code a large background at the beginning of the sequence, plus affine mappings that map parts of the background to the displayed scene at different time instances
– Decoder can vary the mapping to zoom in/out and pan left/right
• Global motion compensation
– Using 8-parameter projective mapping
– Effective for sequences with large global motion
• Quarter-pixel motion estimation

• DivX:
– based on MPEG-4
– can reduce an MPEG-2 video (the same format used for DVD and pay-per-view) to 10 percent of its original size (so that a DVD movie can be recorded on a CD)
– audio is normally coded using MP3
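Sprite coding and global motion compensation both map display coordinates through a parametric mapping. Sprites use an affine mapping (6 parameters) and global motion compensation a projective one (8 parameters). The sketch below shows both mappings and renders a view from a sprite by sampling it at the mapped positions. The parameter order is an assumption, and so is nearest-neighbor sampling, a simplification of the interpolation a real decoder would use. How MPEG-4 actually codes these parameters in the bitstream is not shown. The function names are hypothetical.

```python
def affine_map(params, x, y):
    """6-parameter affine mapping with params (a0, a1, a2, b0, b1, b2)."""
    a0, a1, a2, b0, b1, b2 = params
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def projective_map(params, x, y):
    """8-parameter projective mapping with params (a0, a1, a2, b0, b1, b2, c1, c2)."""
    a0, a1, a2, b0, b1, b2, c1, c2 = params
    d = 1.0 + c1 * x + c2 * y
    return (a0 + a1 * x + a2 * y) / d, (b0 + b1 * x + b2 * y) / d


def render_from_sprite(sprite, params, width, height, mapping=affine_map):
    """Build a width x height view by fetching, for each display pixel, the
    sprite sample at the mapped position (nearest neighbor; mapped positions
    must fall inside the sprite)."""
    view = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = mapping(params, x, y)
            row.append(sprite[round(sy)][round(sx)])
        view.append(row)
    return view


sprite = [[10 * y + x for x in range(6)] for y in range(4)]
pan = (2, 1, 0, 1, 0, 1)  # view window shifted 2 pels right, 1 line down
print(render_from_sprite(sprite, pan, 2, 2))  # [[12, 13], [22, 23]]
```

Panning only changes a0 and b0, and zooming changes a1 and b2. The decoder can therefore move around a large background that was sent once, as the sprite bullet describes. With c1 = c2 = 0, the projective mapping reduces to the affine one.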

Page 34: Video coding

MPEG-7
MPEG-1/2/4 make content available, whereas MPEG-7 allows you to find the content you need!
– A content description standard
» Video/images: shape, size, texture, color, movements and positions, etc.
» Audio: key, mood, tempo, changes, position in sound space, etc.
– Applications:
» Digital libraries
» Multimedia directory services
» Broadcast media selection
» Editing, etc.
Example: draw an object and be able to find objects with similar characteristics; play a note of music and be able to find a similar type of music.
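MPEG-7 standardizes the descriptions, not the search engine that compares them. The query-by-example idea can still be illustrated generically. Each item is described by a coarse RGB color histogram, and the database is ranked by L1 distance to the query's histogram. The histogram here is a stand-in, not one of MPEG-7's normative color descriptors. The names and data are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with bins**3 cells; pixels are (r, g, b) in 0..255."""
    counts = [0] * bins ** 3
    for r, g, b in pixels:
        cell = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
        counts[cell] += 1
    return [c / len(pixels) for c in counts]


def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rank_by_similarity(query_pixels, database):
    """Return database keys ordered from most to least similar to the query."""
    q = color_histogram(query_pixels)
    return sorted(database, key=lambda k: l1_distance(q, color_histogram(database[k])))


red, white, blue, green = (250, 10, 10), (255, 255, 255), (10, 10, 250), (10, 250, 10)
database = {
    "sky": [blue] * 10,
    "red_car": [red] * 9 + [white],
    "lawn": [green] * 10,
}
print(rank_by_similarity([red] * 10, database)[0])  # red_car
```

In a real system, the descriptors would be extracted once and stored alongside the content. Searching then compares compact descriptions rather than the media itself.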

Page 35: Video coding

MPEG-21

Aims at standardizing interfaces and tools to facilitate the exchange of multimedia resources across heterogeneous devices, networks and users.

More specifically, it standardizes requisite elements for packaging, identifying, adapting and processing these resources as well as managing their usage rights.

This framework will benefit the entire consumption chain from creators and rights holders to service providers and consumers.

Basic unit of transaction in the MPEG-21 Multimedia Framework: the Digital Item, which packages resources along with identifiers, metadata, licenses and methods that enable interaction with the Digital Item.

Another key concept: the User, i.e., any entity that interacts in the MPEG-21 environment or makes use of Digital Items.

Page 36: Video coding

MPEG-21

MPEG-21 can be seen as providing a framework in which one User interacts with another User and the object of that interaction is a Digital Item.

Some example interactions include content creation, management, protection, archiving, adaptation, delivery and consumption.

Page 37: Video coding

MPEG-A

MPEG’s Multimedia Application Formats (MAF) provide the framework for integration of elements from several MPEG standards into a single specification that is suitable for specific, but widely usable applications.

Typically, MAFs specify how to combine metadata with timed media information for a presentation in a well-defined format that facilitates interchange, management, editing, and presentation of the media. The presentation may be ‘local’ to the system or may be via a network or other stream delivery mechanism. 

Page 38: Video coding

MPEG-A

MAF specifications shall integrate elements from different MPEG standards into a single specification that is useful for specific but very widely used applications. Examples are delivering music, pictures or home videos. MAF specifications may use elements from MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21. Typically, MAF specifications include:

- The ISO File Format family for storage
- A simple MPEG-7 tool set for metadata
- One or more coding profiles for representing the media
- Tools for encoding metadata in either binary or XML form

Page 39: Video coding

MPEG-A

MAFs may specify use of:
- MPEG-21 Digital Item Declaration Language for representing the structure of the media and the metadata
- Other MPEG-21 tools
- Non-MPEG coding tools (e.g., JPEG) for representation of "non-MPEG" media
- Elements from non-MPEG standards that are required to achieve full interoperability

Page 40: Video coding

MPEG-A: 2 examples

3on4:
- MP3 is one of the most widely used MPEG standards.
- Currently, ID3 simply appends simple metadata tags such as Artist, Album, Song Title, etc.
- MPEG-4 specifies what MPEG expects to be another very successful specification, the MPEG-4 File Format, while MPEG-7 specifies not only signal-derived metadata but also archival metadata such as Artist, Album and Song Title.
- As such, MPEG-4 and MPEG-7 represent an ideal environment to support the current “MP3 music library” user experience and, moreover, to extend that experience in new directions.
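To show how limited the appended metadata is, the sketch below parses the original ID3v1 tag. This is a fixed 128-byte block at the end of an MP3 file: "TAG", then 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre number. Latin-1 text is an assumption. Later ID3 versions, which use a different layout, are not covered. The function names are hypothetical.

```python
def parse_id3v1(data):
    """Parse an ID3v1 tag from the raw bytes of an MP3 file.

    Layout of the final 128 bytes: b"TAG", title (30), artist (30),
    album (30), year (4), comment (30), genre (1). Returns None if no tag
    is present. Text is assumed to be Latin-1, padded with NUL bytes.
    """
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(field):
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],
    }


def make_field(s, size):
    return s.encode("latin-1").ljust(size, b"\x00")


tag = (b"TAG" + make_field("Song", 30) + make_field("Artist", 30)
       + make_field("Album", 30) + make_field("2016", 4) + make_field("", 30) + bytes([13]))
print(parse_id3v1(b"\xff\xfb" * 100 + tag)["artist"])  # Artist
```

Fixed 30-character text fields and a single genre byte leave no room for signal-derived or richer archival descriptions. That gap is what the MPEG-7 plus MPEG-4 File Format combination is meant to fill.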

Page 41: Video coding

MPEG-A: 2 examples

Jon4:
- Digital cameras -> libraries with thousands of digital photos
- Searching for photographs of interest can be difficult -> need for suitable metadata: photo content (e.g., the subject being photographed), author, shooting location, imaging parameters, etc., stored in a standardized format
- The EXIF standard (commonly adopted by camera manufacturers) does not support advanced metadata.
- MPEG-7 defines rich metadata descriptions for still images and audio, and also provides associated systems tools (file formats, etc.)
- As such, MPEG-7 and the MPEG-4 file format represent an ideal environment to support the current “Digital Photos Library” user experience

Page 42: Video coding

Summary (1/2)
• H.261:
– First video coding standard, targeted for video conferencing over ISDN
– Uses block-based hybrid coding framework with integer-pel MC
• H.263, H.264…
– Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems or Internet access, desktop conferencing); half-pixel MC in H.263
• MPEG-1 video
– Video on CD and video on the Internet (good quality at 1.5 Mbps)
– Half-pixel MC and bidirectional MC
• MPEG-2 video
– TV/HDTV/DVD (4-15 Mbps)
– Extended from MPEG-1, considering interlaced video

Page 43: Video coding

Summary (2/2)

• MPEG-4
– To enable object manipulation and scene composition at the decoder -> interactive TV/virtual reality
– Object-based video coding: shape coding
– Coding of synthetic video and audio: animation
• MPEG-7
– To enable search and browsing of multimedia documents
– Defines the syntax for describing the structural and conceptual content
• MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.
• MPEG-A: integration of elements from different MPEG standards into a single specification that is useful for specific but very widely used applications