
MPEG-4 & MHEG-5 (UK)

Aleksi Lindblad, Mika Linnanoja

Marko Luukkainen, Zhenbo Zhang

22.11.2005

MPEG-4 & MHEG-5

Basics, objects, BIFS (Zhenbo)

XMT (Aleksi)

Delivery (Mika)

MHEG5 (Marko)

MPEG-4 Overview

Definition: A family of open international standards that provide tools for the delivery of multimedia

Tools:
- codecs for compressing conventional audio and video
- a framework for rich multimedia, i.e. a combination of audio, video, graphics and interactive features

Excellent Conventional Codecs

- Highest quality and compression efficiency
- Foundation of many new media products and services

Latest video codec: Advanced Video Coding (AVC)
- compression rate half that of MPEG-2 for similar perceived quality
- the new standard for video transmission
- new HDTV, satellite broadcasting, DSL video services, the Sony PlayStation Portable and the Apple QuickTime 7 Player will utilize AVC

Framework for Rich Interactive Media

Rich media tools: combining audio and video with text, still images, animations, and 2D & 3D vector graphics into interactive and personalized media experiences

MPEG-4 includes:
- a scripting language for simple interaction
- MPEG-J for more elaborate programming

Why manufacturers and operators have chosen MPEG-4

- Excellent performance
- Open, collaborative development to select the best technologies
- Competitive but compatible implementations
- Lack of strategic control by a single supplier
- Public, known development roadmap
- Encode once, play anywhere
- Flexible integration with transport networks
- Established terms and venues for patent licensing

Object Description

Object description: enumerates only the streams in a presentation and specifies how they relate to media objects

Scene description: assembles those media objects into a specific audiovisual scene

Object descriptor: a container aggregating all the useful information about the corresponding object
- Information is structured in a hierarchical manner through a set of sub-descriptors

Synchronization of streams

Time: the most natural thing in the world, yet a lot of thought has to be dedicated to it in the context of multimedia streaming

Time in MPEG-4 is always relative, so a simple temporal reference point must be found

Example: playback from a local file or unicast streaming
- The presentation is processed from its start
- The start of the presentation makes a great reference point

In the case of broadcast or multicast playback
- The client may not be aware of the start of the presentation
- The only known point: when the client tunes into the broadcast
- This point is different for each client and unknown to the sender
- The point when a portion of scene description data is received by the terminal is taken as the reference

Time stamps and access units

How do we know that two events in two different streams are supposed to happen at the same time? Time stamps.
- Discrete portions of data related to a specific point in time exist in all stream types
- These portions of data are called Access Units (AUs)
- Each ES is actually modeled as a sequence of Access Units
- The size and contents of AUs depend on the media coder used
- AUs are the data elements to which time stamps can be attached

Time stamps

Two different types:
- Decoding time: indicates the point in time at which all of an AU's data has to be available in the receiver and ideally be decoded at once
- Composition time: indicates the time at which the decoded AU becomes available for composition and subsequent presentation
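The interplay of the two stamp types can be sketched in a few lines of Python; the AccessUnit fields and the I/P/B pattern below are an illustrative model, not the actual sync-layer syntax:

```python
from dataclasses import dataclass

@dataclass
class AccessUnit:
    name: str
    dts: int  # decoding time stamp (ms)
    cts: int  # composition time stamp (ms)

# A typical I/P/B pattern: the B frames are decoded after the P frame
# they reference, but composed (displayed) before it.
aus = [
    AccessUnit("I0", dts=0,   cts=0),
    AccessUnit("P1", dts=40,  cts=120),
    AccessUnit("B1", dts=80,  cts=40),
    AccessUnit("B2", dts=120, cts=80),
]

decode_order = [au.name for au in sorted(aus, key=lambda a: a.dts)]
display_order = [au.name for au in sorted(aus, key=lambda a: a.cts)]
# decode_order:  ['I0', 'P1', 'B1', 'B2']
# display_order: ['I0', 'B1', 'B2', 'P1']
```

The same AU thus carries two stamps precisely because decoding order and presentation order need not coincide.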

BIFS

Acronym for BInary Format for Scenes
- Provides a complete framework for the presentation engine of MPEG-4 terminals
- Enables mixing various MPEG-4 media together with 2D and 3D graphics, and handling interactivity
- Designed as an extension of the VRML 2.0 (Virtual Reality Modeling Language) specification in a binary form

Scene and Nodes

The scene is what the user of the MPEG-4 terminal sees and hears

It is beneficial to build the scene as a hierarchical structure, or scene tree
- Visible or audible objects are leaf nodes
- Multiple references to the same node are allowed => the scene is not really a tree but a directed acyclic graph
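A minimal sketch of why shared references turn the scene into a DAG (the Node class and scene contents here are invented for illustration):

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

# A texture node shared by two shapes: legal in BIFS, so the scene
# forms a directed acyclic graph rather than a strict tree.
shared = Node("WoodTexture")
left = Node("LeftShape", [shared])
right = Node("RightShape", [shared])
root = Node("Group", [left, right])

def collect(node, seen):
    """Walk the graph, visiting each distinct node object once."""
    if id(node) in seen:
        return
    seen[id(node)] = node
    for child in node.children:
        collect(child, seen)

seen = {}
collect(root, seen)
unique_nodes = len(seen)  # 4, although a pure tree would need 5 nodes
```

Sharing the node means a change to WoodTexture is seen by both shapes at once.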

Simplified scene tree

Fields and Routes

Fields: the attributes and interface of the nodes; each field has
- a value
- a type of the value
- a type of behavior
- a name

Routes
- Events are usually generated by sensor nodes
- An event must be connected to an event listener in order to modify the scene
- This connection is called a route

MPEG-4 & MHEG-5

Basics, objects, BIFS (Zhenbo)

XMT (Aleksi)

Delivery (Mika)

MHEG5 (Marko)

XMT

Overview

XMT-Ω

How it works

XMT-Ω and SMIL

XMT-A

How it works

XMT-A and X3D

What is XMT?

Extensible MPEG-4 Textual Format

- XML-based coding language for MPEG-4 systems
- No explicit way to use the more elaborate video or audio tools defined in MPEG-4

Designed for human- or computer-generated content creation and representation

What is XMT? (contd.)

Compatible with other XML-based multimedia languages: SMIL, X3D

Can also contain MPEG-J (Java) code

Divided into two formats
- High-level XMT-Ω
- Low-level XMT-A

XMT-Ω

Easy-to-use and clear high-level language for content creation
- Divided into modules that realize certain functionalities, for example animation and layout
- Can also contain XMT-A nodes
- No one-to-one mapping to MPEG-4 systems or XMT-A

XMT-Ω and SMIL

XMT-Ω is based on SMIL
- Self-describing, extensible and familiar to content producers
- However, some of SMIL's modules are not appropriate for MPEG-4 systems, for example layout
- These are redesigned for, or added to, XMT-Ω "in the spirit of" SMIL

XMT-Ω functionality

Timing, synchronization and time manipulation
- Time containers <par> and <seq> play their contents in parallel or in sequence
- Elements have time attributes such as duration, beginning time and ending time
- Timing can also be tied to an event
- Time can be sped up or slowed down

Events
- Basic input events (mouse click, mouse over, ...)
- More elaborate events such as object collisions
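The two time containers can be modelled with a small scheduler; the tuple-based timeline structure below is invented for illustration, only the par/seq semantics come from the slide:

```python
def schedule(container, start=0.0):
    """Return {element_name: absolute_start_time} for a nested
    ('par'|'seq', [children]) tree whose leaves are (name, duration)."""
    kind, children = container
    starts, t = {}, start
    for child in children:
        if isinstance(child[1], list):           # nested time container
            starts.update(schedule(child, t))
            dur = duration(child)
        else:                                    # leaf element
            name, dur = child
            starts[name] = t
        if kind == "seq":                        # <seq> chains children,
            t += dur                             # <par> starts them together
    return starts

def duration(container):
    kind, children = container
    durs = [duration(c) if isinstance(c[1], list) else c[1] for c in children]
    return sum(durs) if kind == "seq" else max(durs, default=0.0)

timeline = ("seq", [("intro", 5.0),
                    ("par", [("video", 30.0), ("subtitle", 30.0)])])
starts = schedule(timeline)
# intro starts at 0.0; video and subtitle both start at 5.0
```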

XMT-Ω functionality (contd.)

Animation
- The <set> element simply changes the values of fields
- Different <animate> elements can be used for gradual changes

Spatial layout
- The <transform> element can be used to place elements
- A layout module which works in a similar way as in SMIL can also be used

XMT-Ω code example

…<head>
  <layout metrics="pixel" type="xmt/xmt-basic-layout">
    <topLayout width="300" height="300" backgroundColor="white">
      <region id="video_region">
        <region id="watermark_region" translation="100 -90" size="91 27"/>
      </region>
    </topLayout>
  </layout>
</head>
<body>
  <par>
    <video src="rainier_hike.mp4#video" region="video_region" begin="0s" dur="indefinite"/>
    <audio src="rainier_hike.mp4#audio" begin="0s" dur="indefinite"/>
    <img src="emedia_icon91x27.jpg" id="sm_mark" region="watermark_region" begin="0s" dur="indefinite">…

XMT-A

More powerful low-level language
- A direct textual representation of MPEG-4 systems and BIFS
- XMT-Ω code can be mapped to XMT-A in several different ways

XMT-A and X3D

XMT-A is based on X3D
- X3D is an XML representation of VRML, on which MPEG-4 systems is based
- Therefore XMT-A and X3D are highly similar and interoperable, with only small syntactic differences
- The object descriptor framework is unique to MPEG-4 and XMT-A

Some XMT-A elements

Routes
- Bind the values of two fields together

BIFS-Commands
- Insert, Delete, Replace
- Can be used on fields, nodes or routes

Object descriptors
- Describe Elementary Streams that contain media such as video or audio

XMT-A code example

…
<Transform2D DEF="Transformation">
  <children>
    <TouchSensor DEF="Button"/>
    <Shape>
      <geometry>
        <Rectangle size="50 40"/>
      </geometry>
    </Shape>
  </children>
</Transform2D>
<Conditional DEF="ButtonPressed">
  <buffer>
    <Replace atNode="Mover" atField="key" position="1" value="0.2"/>
  </buffer>
</Conditional>
<PositionInterpolator2D DEF="Mover" key="0 0.5 1" keyValue="-100 0 100 0 -100 0"/>
<TimeSensor DEF="AnimationTimer" cycleInterval="2" loop="TRUE"/>
…
<ROUTE fromNode="Button" fromField="isActive" toNode="ButtonPressed" toField="activate"/>
<ROUTE fromNode="AnimationTimer" fromField="fraction_changed" toNode="Mover" toField="set_fraction"/>
<ROUTE fromNode="Mover" fromField="value_changed" toNode="Transformation" toField="translation"/>
…

Overview of XMT

[Diagram: XMT-Ω is based on SMIL; XMT-A is based on X3D, the XML representation of VRML; XMT-A maps directly to MPEG-4 systems and BIFS]

Node Types

Shape nodes
- Geometry field: contains a geometry node, e.g. Rectangle, Circle, Box, Bitmap
- Appearance field: contains an Appearance node

Interpolator nodes

Conditional nodes
- Further expand the possibilities of interaction

Script nodes, PROTO nodes, etc.

Scene Changes

BIFS-Commands
- Single changes to the scene, e.g. of color or position
- e.g. insert, delete, replace
- Packaged in AUs of the scene description ES

BIFS-Anim streams
- Separate streams containing structured changes to a scene
- A framework with three elements: Animation Mask, Animation Frames, AnimationStream
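A toy model of the three command types applied to a scene held as a plain dictionary; the scene contents and command encoding are invented for illustration (real BIFS-Commands are binary-coded and target fields, nodes and routes):

```python
def apply_command(scene, cmd):
    """Apply one insert/delete/replace command to a toy scene,
    represented here as {node_name: {field: value}}."""
    op = cmd["op"]
    if op == "insert":
        scene[cmd["node"]] = dict(cmd["fields"])
    elif op == "delete":
        del scene[cmd["node"]]
    elif op == "replace":
        scene[cmd["node"]][cmd["field"]] = cmd["value"]
    return scene

scene = {"Ball": {"position": (0, 0), "color": "red"}}
commands = [
    {"op": "replace", "node": "Ball", "field": "position", "value": (10, 5)},
    {"op": "insert", "node": "Label", "fields": {"text": "Goal!"}},
]
for cmd in commands:
    apply_command(scene, cmd)
# scene now has Ball at (10, 5) plus a new Label node
```

Each such command would arrive packaged in an AU of the scene description stream, time-stamped like any other AU.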

MPEG-4 & MHEG-5

Basics, objects, BIFS (Zhenbo)

XMT (Aleksi)

Delivery (Mika)

MHEG5 (Marko)

MPEG-4 Delivery & misc

Topics
- MPEG-4 content delivery
- MP4 file format
- Interoperability: profiles & levels
- Video coding (if time allows)

MPEG-4 Content Delivery

Delivery: storing and transporting of MPEG-4 compositions

MPEG-4 content must be delivered to many and very different audiences

Interworking with current delivery mechanisms
- Internet (MPEG-4 over IP)
- Broadcasting (MPEG-4 over MPEG-2 Transport & Program Stream)

Abstraction of content delivery in the MPEG-4 standard
- Delivery Multimedia Integration Framework (DMIF)
- MPEG-4 File Format based on Apple's QuickTime design

Delivery Multimedia Integration Framework, DMIF

An OSI session-layer service providing a mechanism for hiding technology details from upper-layer applications

DMIF concepts
- Users (applications)
- Sessions (presentation level)
- Channels (stream level)

DMIF instance: an implementation of the delivery layer

Basically, different MPEG-4 Elementary Streams (ES) are multiplexed with timing information onto the delivery network

Stack ideology with multiple layers

Illustration

[Diagram: MPEG-4 delivery structure (User Plane). Elementary streams pass through the DMIF Application Interface into the Synchronization Layer, producing SL-packetized streams; the FlexMux tool combines them into FlexMux streams and channels; the Delivery Layer maps these onto TransMux streams and channels over transports such as UDP, MPEG-2 TS or ATM]

DMIF functionality

In principle works like FTP
- The application opens a session
- Decides which ES need to be transported (or saved)
- Creates channels for the streams
- Channels also carry instructions for interactivity (play, pause, stop)

Quality of Service parameters can be assigned to the delivery channels and monitored, although advanced QoS handling is not included in the standard

DMIF Application Interface

Defines the functions offered by DMIF

DAI primitives, only semantics defined
- Service (create, destroy)
- Channel (create, destroy)
- QoS monitoring (setup, control)
- User commands (user interaction)
- Data (actual media content)

The DMIF user calls these "functions" to establish a connection and convey media and interaction
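A hypothetical, much-simplified sketch of those primitive groups as a Python interface; the method names and bookkeeping are invented, only the five primitive groups come from the DAI description above:

```python
class DMIFSession:
    """Toy stand-in for a DMIF user's view of the DAI (illustrative only)."""
    def __init__(self):
        self.channels = {}
        self.active = False

    def service_create(self, url):               # Service primitive
        self.active = True
        self.url = url

    def channel_create(self, es_id, qos=None):   # Channel + QoS primitives
        if not self.active:
            raise RuntimeError("no service session")
        self.channels[es_id] = {"qos": qos, "data": []}

    def user_command(self, command):             # User command primitive
        return f"forwarded: {command}"           # e.g. play, pause, stop

    def data_send(self, es_id, payload):         # Data primitive
        self.channels[es_id]["data"].append(payload)

session = DMIFSession()
session.service_create("x-dmif://example/presentation")
session.channel_create(es_id=3, qos={"max_delay_ms": 100})
session.data_send(3, b"\x00\x01")
```

The point of the DAI is exactly this shape: the application sees sessions, channels and commands, never the underlying network.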

DMIF Network Interface

Used for determining and sharing the needed information between DAI peers over a transmission channel

Multiplexes many DAI sessions onto a single TransMux (ATM/UDP/MPEG-2) channel

Does not define "bits on the wire" itself

Concepts borrowed from other peer-to-peer protocols

Similar primitives as in the DAI
- Session
- Service
- TransMux
- Channel
- User commands

DMIF implementations

Mappings to real existing transport protocols
- ATM Q.2931: no changes needed to the ATM protocol
- ITU-T H.245: additions in H.245 v6
- Real Time Streaming Protocol (RTSP): does not support all MPEG-4 functionality

MPEG-4 over MPEG-2 (broadcasting and authoring)
- Offers better quality via established transport means (MPEG-2 TS used in DVB and PS used in DVD); "alternative codec" thinking
- Special amendment in the MPEG-2 Systems standard
- Transfer either scene-based or stream-based

MPEG-4 over IP
- Uses the Real-time Transport Protocol (RTP), which already encompasses timing information
- MPEG-4 as RTP payload, specified in RFC 3016
- Special care with packet alignment, so that dropped (single) RTP packets do not cause problems
- Mainly work-in-progress in 2001/2002; commercial solutions available now

MPEG-4 File Format, mp4

Based on Apple Computer's QuickTime format
- Not just a stream ready to be delivered, as with MPEG-1 and MPEG-2
- Editing and reuse possible without quality reductions (lossy decoding-recoding process eliminated)
- Life-cycle file format, used in capturing, editing and combining

File includes stream data (video/audio) separately from the metadata describing it
- Relative timing, frame sizes et cetera in structural tables
- Non-framing format

Hints to help fragment the frames for streaming

Possibly many tracks of video and audio
- Sample descriptors in tracks identify the required decoder

Handy tool for composing mp4 files: GPAC/MP4Box
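The structural-table idea rests on the ISO base media box layout: a 4-byte big-endian size followed by a 4-byte type code. A minimal top-level box reader (ignoring 64-bit sizes for brevity) might look like:

```python
import struct

def list_boxes(data):
    """List (type, size) of top-level boxes in an ISO base media buffer."""
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, = struct.unpack(">I", data[offset:offset + 4])
        box_type = data[offset + 4:offset + 8].decode("ascii")
        boxes.append((box_type, size))
        if size < 8:          # malformed or 64-bit size; stop for simplicity
            break
        offset += size
    return boxes

# A hand-built two-box buffer: a 20-byte 'ftyp' box, then an empty 'moov'.
sample = (struct.pack(">I", 20) + b"ftyp" + b"mp42" + b"\x00\x00\x00\x00" + b"mp42"
          + struct.pack(">I", 8) + b"moov")
boxes = list_boxes(sample)    # [('ftyp', 20), ('moov', 8)]
```

Real files nest further boxes (tracks, sample tables, hint tracks) inside 'moov', which is where the timing and sample-descriptor tables mentioned above live.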

MPEG-4 Profiles 1/2

Ideas
- Ensuring interoperability: allow manufacturers to implement only a subset of the available tools
- Conformance to the standard is testable

Profiles available for video, audio, graphics, scene description, MPEG-J and object descriptors

Levels defined within each profile for further discrete parameter limitations (bitrates etc.)

Restrictions
- Encoder: bitstream complexity not exceeded at the defined profile@level
- Decoder: able to handle the most complex bitstream at a certain profile@level
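The encoder/decoder restrictions amount to a simple conformance check; the numeric limits below are invented placeholders except for the 8 Mbit/s Advanced Simple level 5 bound the deck itself cites:

```python
# Hypothetical profile@level limit tables; values are illustrative,
# not taken from the standard (apart from ASP@L5's 8 Mbit/s bitrate).
LIMITS = {
    ("Advanced Simple", 5): {"max_bitrate_kbps": 8000, "max_objects": 4},
    ("Simple", 1): {"max_bitrate_kbps": 64, "max_objects": 4},
}

def conforms(profile, level, bitrate_kbps, num_objects):
    """True if a bitstream stays within the declared profile@level."""
    limits = LIMITS[(profile, level)]
    return (bitrate_kbps <= limits["max_bitrate_kbps"]
            and num_objects <= limits["max_objects"])

ok = conforms("Advanced Simple", 5, bitrate_kbps=6000, num_objects=2)   # True
too_fast = conforms("Simple", 1, bitrate_kbps=500, num_objects=1)       # False
```

An encoder must only emit bitstreams for which such a check passes; a decoder claiming the profile@level must handle any bitstream that passes it.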

MPEG-4 Profiles 2/2

Object-based approach
- How many objects must be decoded simultaneously at a given time greatly affects the decoder's required performance

Audio / video profiles
- List of allowed techniques and object types
- Advanced Simple (video) profile: I-VOP, P-VOP, B-VOP, GMC, QPEL, up to 8 Mbit/s at level 5

Graphics profiles
- Allowed BIFS nodes ('tags' in the XMT realization)
- Simple 2D profile: Appearance, Bitmap, Shape

Development
- New technologies introduced in new profiles, old ones unchanged => interoperability
- Only new profiles/levels if they provide major changes

MPEG-4 Video Coding

Main goal: to provide superb quality and innovative video compression techniques that produce content requiring less storage space

Old coding and compression techniques such as MPEG-2 only use rectangular frame models

Handled in the MPEG-4 Visual standard
- Arbitrarily shaped objects
- Wide range of bitrates (handhelds vs. studio)
- Spatial, temporal and quality scalability
- Robustness for error-prone transmission
- Only the decoder and bitstreams are specified; encoders are left to industry
- Profiles and levels defined to limit implementation difficulties, a "use what you need" mentality
- Both video and still images (textures)

Video shapes

MPEG-4 video scenes are composed of Visual Objects (VO), which are sequences of Video Object Planes (VOP); a VOP can be thought of as a frame

For each VOP an alpha plane is also defined, making it possible to have transparent parts of the video and therefore arbitrary shapes to be coded

Each object has a bounding box that includes the object
- Bounding boxes consist of macroblocks (16x16 pixels)
- Macroblocks can be either transparent, opaque or border type
- Opaque blocks are coded with hybrid DCT/motion compensation techniques as in MPEG-2
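The three macroblock types follow directly from the alpha plane; a sketch, using a 2x2 "macroblock" instead of 16x16 for readability:

```python
def classify_macroblock(alpha_block):
    """Classify a macroblock from its alpha-plane samples:
    all zero -> transparent, all nonzero -> opaque, mixed -> border."""
    flat = [a for row in alpha_block for a in row]
    if all(a == 0 for a in flat):
        return "transparent"
    if all(a != 0 for a in flat):
        return "opaque"
    return "border"

inside  = [[255, 255], [255, 255]]   # fully inside the object
outside = [[0, 0], [0, 0]]           # fully outside
edge    = [[255, 0], [255, 0]]       # straddles the object boundary

kinds = [classify_macroblock(b) for b in (inside, outside, edge)]
# ['opaque', 'transparent', 'border']
```

Only opaque and border blocks need texture coding; transparent blocks are skipped entirely, which is where the shape-coding savings come from.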

Rectangular video coding

Hybrid, block-based compression scheme

Basic principles
- Motion compensation: only changes are saved, to reduce the required storage or transmission capacity
- Discrete Cosine Transform (DCT) to remove content that is indistinguishable by humans
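Both principles can be shown on tiny one-dimensional "frames": a motion-compensated residual that is mostly zeros, then a quantization step that discards small detail. Real codecs do this on 2-D blocks via a DCT; this only illustrates the idea:

```python
def residual(reference, current, motion):
    """Difference between the current frame and the reference shifted
    by `motion` samples (with wraparound, for simplicity)."""
    n = len(current)
    shifted = [reference[(i - motion) % n] for i in range(n)]
    return [c - s for c, s in zip(current, shifted)]

def quantize(values, step):
    """Coarsen values to multiples of `step`, discarding fine detail."""
    return [round(v / step) * step for v in values]

reference = [10, 20, 30, 40]
current   = [12, 10, 20, 30]          # scene shifted right by one sample
res = residual(reference, current, motion=1)   # [-28, 0, 0, 0]
coarse = quantize(res, step=8)                 # [-32, 0, 0, 0]
```

The residual is mostly zeros, so it compresses far better than the raw frame; quantization then trades a little fidelity for further savings.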

New inventions
- Quarter-pixel motion compensation: motion vector resolution increased to decrease prediction errors
- Global motion compensation: motion data for a complete VOP (frame) instead of macroblocks only; also viewed as "dynamic sprite coding"
- Direct-mode bidirectional prediction: motion vectors of neighbor blocks used

These innovations are realized in the new Advanced Video Coding (AVC) codec, also known as H.264 (ITU-T term)

An open-source alternative encoder is available at http://developers.videolan.org/x264.html

MPEG-4 Video Coding Tools

Special tools intended for certain specific uses of video

Interlaced coding
- For TV broadcasting needs, also HDTV formats like 1080i
- Frame/field DCT: transforms on fields rather than frames for better quality
- Field motion compensation using 16x8 top and bottom fields

Error-resilient coding
- The goal is to reduce the overhead of introducing redundant data
- Packet-based periodic resynchronization, data partitioning, NEWPRED

Reduced-resolution coding

Sprite coding
- Unchangeable parts of video content coded separately as static sprites (textures)

Texture coding for studio applications
- Higher precision and lossless ability
- Uncompressed PCM coding

MPEG-4 & MHEG-5

Basics, objects, BIFS (Zhenbo)

XMT (Aleksi)

Delivery (Mika)

MHEG5 (Marko)

4th Part: Digital Terrestrial Television, MHEG-5 Specification

Multimedia and Hypermedia information coding Experts Group

MHEG-5 DTT UK
- Object model of multimedia presentation: audio, video, text and graphics
- Broadcasting of applications and their data into TV networks
- Optional return channel

MHEG-5 Engine profile

Based on ISO/IEC 13522-5
- Some features modified, some added, some optional or removed

Defines the set of classes that the profile must implement
- Examples: Variable, Slider, Video

Features: caching, cloning, video scaling and stacking of applications

The User Experience

Visual appearances
- Conventional TV
- TV with a visual prompt of available information
- TV with information overlaid
- Information with video or picture inset
- Just information

Visual appearances

Remote controller

MHEG-5 Graphics Model

720 x 576 pixels with 256 colors
- 632 x 518 safe area due to overscan
- 64 colors defined by the DVB subtitle stream
- 4 colors defined by the receiver manufacturer
- 188 colors defined by the MHEG-5 application

Three levels of transparency required
- 0% (opaque), 30% and 100% (fully transparent)

Bitmaps
- Full PNG 1.0 support
- MPEG I-frames

Text and Interactibles

Character encoding standards: ISO 10646-1 and UTF-8
- The supported set of characters is defined
- The Tiresias (DTG/RNIB) font must be supported
- The current profile doesn't support font downloading

Interactibles

EntryField: input of text and numbers

HyperText: links in text

Slider: adjusting a value

Application life-cycle

Only one application running at a time

An application may launch another application
- The original application is destroyed in the process

Auto-boot application
- Launched when the service is selected or when other applications have quit

Applications are loaded from a DSM-CC Object Carousel

MHEG-5 System Overview

[Diagram: an information server feeds carousel generation & transmission into the broadcast file system; the receiver's MHEG engine renders to the TV and takes input from the remote control, with an optional return channel back to the server]

MHEG-5 Summary

Offers lower-cost interactive TV than MHP
- Low hardware requirements

Coexistence with and migration to MHP possible

Applications

- Digital teletext
- Program guides
- Interactive advertising
- Educational
- Games

References

F. Pereira and T. Ebrahimi. The MPEG-4 Book. Prentice Hall, Upper Saddle River (NJ), 2002.

Digital TV Group (DTG). Digital terrestrial television MHEG-5 specification. v1.06, May 2003.

D. Cuttis, Strategy & Technology Ltd. Solutions for Interactive Digital Broadcasting using MHEG-5. V1.0, September 2003.