MPEG-4 & MHEG-5 (UK). Aleksi Lindblad, Mika Linnanoja, Marko Luukkainen, Zhenbo Zhang. 22.11.2005
TRANSCRIPT
MPEG-4 Overview
Definition: A family of open international standards that provide tools for the delivery of multimedia
Tools: codecs for compressing conventional audio and video, which form a framework for rich multimedia, i.e. a combination of audio, video, graphics and interactive features
Excellent Conventional Codecs
Highest quality and compression efficiency
Foundation of many new media products and services
Latest video codec: Advanced Video Coding (AVC)
- about half the bit rate of MPEG-2 for similar perceived quality
- a new standard for video transmission
- new HDTV, satellite broadcasting and DSL video services, the Sony PlayStation Portable and the Apple QuickTime 7 Player will utilize AVC
Framework for Rich Interactive Media
Rich media tools: combining audio and video with text, still images, animations, and 2D & 3D vector graphics into interactive and personalized media experiences
MPEG-4 includes:
- a scripting language for simple interaction
- MPEG-J for more elaborate programming
Why manufacturers and operators have chosen MPEG-4
Excellent Performance
Open, Collaborative Development to Select the Best Technologies
Competitive but Compatible Implementations
Lack of Strategic Control by a Supplier
Public, Known Development Roadmap
Encode Once, Play Anywhere
Flexible Integration with Transport Networks
Established Terms and Venues for Patent Licensing
Object Description
Object description: enumerates only the streams in a presentation and specifies how they relate to media objects
Scene description: assembles those media objects into a specific audiovisual scene
Object descriptor: a container aggregating all the useful information about the corresponding object
Information is structured in a hierarchical manner through a set of sub-descriptors
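For illustration, a hedged sketch of an object descriptor aggregating sub-descriptors for one elementary stream, written in the textual XMT-A notation introduced later in this talk; the element names follow common XMT-A conventions but are approximations, and the IDs and numeric values are hypothetical:
<ObjectDescriptor objectDescriptorID="10">
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="3">
        <!-- identifies the stream type and the decoder it needs (illustrative values) -->
        <decConfigDescr><DecoderConfigDescriptor streamType="4" objectTypeIndication="32"/></decConfigDescr>
        <!-- sync layer configuration, e.g. time stamp resolution -->
        <slConfigDescr><SLConfigDescriptor/></slConfigDescr>
      </ES_Descriptor>
    </esDescr>
  </Descr>
</ObjectDescriptor>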
Synchronization of streams
Time: the most natural thing in the world, yet a lot of thought has to be dedicated to it in the context of multimedia streaming
Time in MPEG-4 is always relative
Finding a simple temporal reference point
Example: playback from a local file or unicast streaming
- The presentation is processed from its start
- The start of the presentation makes a great reference point
In the case of broadcast or multicast playback
- The client may not be aware of the start of the presentation
- The only known point: when the client tunes into the broadcast
- This point is different for each client and unknown to the sender
- The point when a portion of scene description data is received by the terminal is taken as the reference
Time stamps and access units
How do we know that two events in two different streams are supposed to happen at the same time? Time stamps
Discrete portions of data related to a specific point in time exist in all stream types
These portions of data are called Access Units (AU)
Each ES is actually modeled as a sequence of Access Units
The size and contents of AUs depend on the media coder used
AUs are the data elements to which time stamps can be attached
Time stamps
Two different types
- Decoding time stamp: indicates the point in time at which all the AU's data has to be available in the receiver and, ideally, be decoded at once
- Composition time stamp: indicates the time at which the decoded AU becomes available for composition and subsequent presentation
- Example: with bidirectional prediction, frames are decoded in a different order than they are presented, so an AU's decoding time can precede its composition time
BIFS
Acronym for BInary Format for Scenes
Provides a complete framework for the presentation engine of MPEG-4 terminals
Enables mixing various MPEG-4 media together with 2D and 3D graphics and handling interactivity
Designed as an extension of the VRML 2.0 (Virtual Reality Modeling Language) specification in a binary form
Scene and Nodes
Scene is what the user of the MPEG-4 terminal sees and hears
Beneficial to build the scene as a hierarchical structure or scene tree
Visible or audible objects are leaf nodes
Multiple references to the same node are allowed => the scene is not really a tree but a directed acyclic graph
Fields and Routes
Fields: the attributes and interface of the nodes
- a value
- a type of the value
- a type of behavior
- a name
Routes
- Events are usually generated by sensor nodes
- They must be connected to an event listener in order to modify the scene
- This connection is called a route
What is XMT?
Extensible MPEG-4 Textual Format
XML-based coding language for MPEG-4 Systems
No explicit way to use the more elaborate video or audio tools defined in MPEG-4
Designed for human- or computer-generated content creation and representation
What is XMT? (contd.)
Compatible with other XML-based multimedia languages: SMIL, X3D
Can also contain MPEG-J (Java) code
Divided into two formats:
- High-level XMT-Ω
- Low-level XMT-A
XMT-Ω
Easy to use and clear high-level language for content creation
Divided into modules that realize certain functionalities, for example animation and layout
Can also contain XMT-A nodes
No one-to-one mapping to MPEG-4 Systems or XMT-A
XMT-Ω and SMIL
XMT-Ω is based on SMIL
Self-describing, extensible and familiar to content producers
However, some of SMIL's modules are not appropriate for MPEG-4 Systems, for example layout
These are redesigned for, or added to, XMT-Ω "in the spirit of" SMIL
XMT-Ω functionality
Timing, synchronization and time manipulation
- Time containers <par> and <seq> play their contents in parallel or in sequence (see the sketch after this list)
- Elements have time attributes such as duration, beginning time and ending time
- Timing can also be tied to an event
- Time can be sped up or slowed down
Events
- Basic input events (mouse click, mouse over…)
- More elaborate events such as object collisions
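A minimal sketch of the time containers, assuming the same SMIL-style timing attributes as the XMT-Ω code example further below; the media file names are hypothetical:
<seq>
  <!-- shown first, for five seconds -->
  <img src="intro.jpg" begin="0s" dur="5s"/>
  <!-- then video and audio play together in parallel for twenty seconds -->
  <par>
    <video src="clip.mp4#video" begin="0s" dur="20s"/>
    <audio src="clip.mp4#audio" begin="0s" dur="20s"/>
  </par>
</seq>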
XMT-Ω functionality (contd.)
Animation
- <set> element simply changes the values of fields (a small sketch follows after this list)
- Different <animate> elements can be used for smooth, interpolated changes
Spatial layout
- <transform> element can be used to place elements
- A layout module which works in a similar way as in SMIL can also be used
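A hedged sketch of the animation elements; the attribute syntax follows SMIL 2.0 animation conventions and the target names are hypothetical, so the exact XMT-Ω spelling may differ:
<!-- jump change: recolor the element "caption" at 2 s -->
<set targetElement="caption" attributeName="color" to="1 1 0" begin="2s"/>
<!-- smooth change: slide the element "logo" to the right over 3 s -->
<animate targetElement="logo" attributeName="translation" from="0 0" to="100 0" begin="0s" dur="3s"/>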
XMT-Ω code example
…<head>
  <layout metrics="pixel" type="xmt/xmt-basic-layout">
    <topLayout width="300" height="300" backgroundColor="white">
      <region id="video_region">
        <region id="watermark_region" translation="100 -90" size="91 27"/>
      </region>
    </topLayout>
  </layout>
</head>
<body>
  <par>
    <video src="rainier_hike.mp4#video" region="video_region" begin="0s" dur="indefinite"/>
    <audio src="rainier_hike.mp4#audio" begin="0s" dur="indefinite"/>
    <img src="emedia_icon91x27.jpg" id="sm_mark" region="watermark_region" begin="0s" dur="indefinite">…
XMT-A
More powerful low-level language
A direct textual representation of MPEG-4 Systems and BIFS
XMT-Ω code can be mapped to XMT-A in several different ways
XMT-A and X3D
XMT-A is based on X3D
X3D is an XML representation of VRML, on which MPEG-4 Systems is based
Therefore XMT-A and X3D are highly similar and interoperable, with only small syntactic differences
The object descriptor framework is unique to MPEG-4 and XMT-A
Some XMT-A elements
Routes
- Bind the values of two fields together
BIFS-Commands
- Insert, Delete, Replace
- Can be used on fields, nodes or routes
Object descriptors
- Describe Elementary Streams that contain media such as video or audio
XMT-A code example
…
<Transform2D DEF="Transformation">
  <children>
    <TouchSensor DEF="Button"/>
    <Shape>
      <geometry>
        <Rectangle size="50 40"/>
      </geometry>
    </Shape>
  </children>
</Transform2D>
<Conditional DEF="ButtonPressed">
  <buffer>
    <Replace atNode="Mover" atField="key" position="1" value="0.2"/>
  </buffer>
</Conditional>
<PositionInterpolator2D DEF="Mover" key="0 0.5 1" keyValue="-100 0 100 0 -100 0"/>
<TimeSensor DEF="AnimationTimer" cycleInterval="2" loop="TRUE"/>
…
<ROUTE fromNode="Button" fromField="isActive" toNode="ButtonPressed" toField="activate"/>
<ROUTE fromNode="AnimationTimer" fromField="fraction_changed" toNode="Mover" toField="set_fraction"/>
<ROUTE fromNode="Mover" fromField="value_changed" toNode="Transformation" toField="translation"/>
…
Node Types
Shape nodes (see the sketch after this list)
- Geometry field: contains a geometry node, e.g. Rectangle, Circle, Box, Bitmap
- Appearance field: contains an Appearance node
Interpolator nodes
Conditional nodes
- Further expand the possibilities of interaction
Script nodes, PROTO nodes, etc.
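A minimal XMT-A sketch of a Shape node, following the field-wrapping convention of the XMT-A code example above; the color and radius values are purely illustrative:
<Shape>
  <appearance>
    <Appearance>
      <!-- Material2D gives the filled color of the 2D geometry -->
      <material><Material2D emissiveColor="0.8 0.1 0.1" filled="TRUE"/></material>
    </Appearance>
  </appearance>
  <geometry>
    <!-- the geometry field holds one geometry node -->
    <Circle radius="30"/>
  </geometry>
</Shape>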
Scene Changes
BIFS-Commands
- Single changes to the scene, e.g. of color or position
- Packaged in AUs of the scene description ES
- e.g. Insert, Delete, Replace (a small sketch follows below)
BIFS-Anim streams
- Separate streams containing structured changes to a scene
- Framework of three elements: Animation Mask, Animation Frames, AnimationStream
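A hedged XMT-A sketch of BIFS-Commands as they might appear in a scene description access unit; the node names (Logo, OldCaption) are hypothetical, the Replace form matches the earlier XMT-A code example, and the Delete form is an approximation of the textual syntax:
<!-- replace the translation field of the node named Logo -->
<Replace atNode="Logo" atField="translation" value="120 40"/>
<!-- remove the node named OldCaption from the scene -->
<Delete atNode="OldCaption"/>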
MPEG-4 Delivery & misc
Topics: MPEG-4 content delivery, MP4 file format, interoperability (profiles & levels), video coding (if time allows)
MPEG-4 Content Delivery
Delivery: storing and transporting of MPEG-4 compositions
MPEG-4 content must be delivered to many and very different audiences
Interworking with current delivery mechanisms
- Internet (MPEG-4 over IP)
- Broadcasting (MPEG-4 over MPEG-2 Transport & Program Stream)
Abstraction of content delivery in an MPEG-4 part: Delivery Multimedia Integration Framework (DMIF)
MPEG-4 File Format based on Apple's QuickTime design
Delivery Multimedia Integration Framework, DMIF
OSI session layer service providing a mechanism for hiding technology details from upper layer applications
DMIF concepts: Users (applications), Sessions (presentation level), Channels (stream level)
DMIF instance: an implementation of the delivery layer
Basically, the different MPEG-4 Elementary Streams (ES) are multiplexed with timing information onto the delivery network
Stack ideology with multiple layers
Illustration: MPEG-4 delivery structure (User Plane)
Diagram: Elementary Streams are packetized by the Synchronization Layer into SL-packetized streams, pass through the DMIF Application Interface, are multiplexed by the FlexMux tool into FlexMux streams and channels, and are carried by the Delivery Layer as TransMux streams and channels over UDP, MPEG-2 TS, ATM etc.
DMIF functionality
In principle works like FTP
- The application opens a session
- Decides which ES need to be transported (or saved)
- Creates channels for the streams
Channels also carry instructions for interactivity (play, pause, stop)
Quality of Service parameters can be assigned to the delivery channels and monitored, although advanced QoS handling is not included in the standard
DMIF Application Interface
Defines the functions offered by DMIF
DAI primitives, only the semantics are defined:
- Service (create, destroy)
- Channel (create, destroy)
- QoS monitoring (setup, control)
- User commands (user interaction)
- Data (actual media content)
The DMIF user calls these "functions" to establish a connection and convey media and interaction
DMIF Network Interface
Used for determining and sharing the needed information between DAI peers over a transmission channel
Multiplexing of many DAI sessions to single TransMux (ATM/UDP/MPEG-2) channel
Does not define "bits on the wire" itself
Concepts from other peer-to-peer protocols
Similar primitives as in the DAI: Session, Service, TransMux, Channel, User commands
DMIF implementations
Mappings to real existing transport protocols
- ATM Q.2931: no changes needed to the ATM protocol
- ITU-T H.245: additions in H.245 v6
- Real Time Streaming Protocol (RTSP): does not support all MPEG-4 functionality
MPEG-4 over MPEG-2 (broadcasting and authoring)
- Offering better quality via established transport means (MPEG-2 TS used in DVB and PS used in DVD), "alternative codec" thinking
- Special amendment in the MPEG-2 Systems standard
- Transfer either scene-based or stream-based
MPEG-4 over IP
- Uses the Real-time Transport Protocol (RTP), which already encompasses timing information
- MPEG-4 as payload in RTP, specified in RFC 3016
- Special care with packet alignment, so that dropped (single) RTP packets do not cause problems
- Mainly work-in-progress in 2001/2002; commercial solutions available now
MPEG-4 File Format, mp4
Based on Apple Computer's QuickTime format
Not just a stream ready to be delivered, as with MPEG-1 and MPEG-2
Editing and reuse possible without quality reductions (lossy decoding-recoding process eliminated)
Life-cycle file format, used in capturing, editing and combining
File includes stream data (video/audio) separately from the metadata describing it
- Relative timing, frame sizes et cetera in structural tables
- Non-framing format
Hints to help fragment the frames for streaming
Possibly many tracks of video and audio
Sample descriptors in tracks to identify the required decoder
Handy tool for composing mp4 files: GPAC/MP4Box
MPEG-4 Profiles 1/2
Ideas
- Ensuring interoperability: allow manufacturers to implement only a subset of the available tools
- Conformance to the standard is testable
Profiles available for video, audio, graphics, scene description, MPEG-J, object descriptor
Levels defined within each profile for further discrete parameter limitations (bitrates etc.)
Restrictions
- Encoder: bitstream complexity not exceeded at the defined profile@level
- Decoder: able to handle the most complex bitstream at a certain profile@level
MPEG-4 Profiles 2/2
Object-based approach
- How many objects must be decoded simultaneously at a given time greatly affects the decoder's required performance
Audio/Video profiles
- List of allowed techniques and object types
- Advanced Simple (video) profile: I-VOP, P-VOP, B-VOP, GMC, QPEL, up to 8 Mbit/s at level 5
Graphics profiles
- Allowed BIFS nodes ('tags' in the XMT realization)
- Simple 2D profile: Appearance, Bitmap, Shape
Development
- New technologies are introduced in new profiles, old ones are left unchanged => interoperability
- New profiles/levels only if they provide major changes
MPEG-4 Video Coding
Main goal to provide superb quality and innovative video compression techniques that produce content requiring less storage space
Old coding and compression techniques such as MPEG-2 only use rectangular frame models
Handled in the MPEG-4 Visual standard:
- Arbitrarily shaped objects
- Wide range of bitrates (handhelds vs. studio)
- Spatial, temporal and quality scalability
- Error resilience for transmission over error-prone channels
- Only the decoder and bitstreams are specified; encoders are left to industry
- Profiles and Levels defined to limit implementation difficulties, "use what you need" mentality
- Both video and still images (textures)
Video shapes
MPEG-4 video scenes are composed of Visual Objects (VO), which are sequences of Video Object Planes (VOP); a VOP can be thought of as a frame
For each VOP an alpha plane is also defined, making it possible to have transparent parts of the video and therefore arbitrary shapes to be coded
Each object has a bounding box that includes the object
Bounding boxes consist of macroblocks (16x16 pixels)
Macroblocks can be transparent, opaque or border type
Opaque blocks are coded with hybrid DCT/motion compensation techniques as in MPEG-2
Rectangular video coding
Hybrid, block-based compression scheme
Basic principles
- Motion compensation: only changes are saved, to reduce the required storage or transmission capacity
- Discrete Cosine Transform (DCT): transformed coefficients are quantized to discard detail that is largely indistinguishable to humans
New inventions
- Quarter-pixel motion compensation: motion vector resolution increased to decrease prediction errors
- Global motion compensation: motion data for a complete VOP (frame) instead of macroblocks only, also viewed as "dynamic sprite coding"
- Direct mode bidirectional prediction: motion vectors of neighbor blocks used
Innovations realized in the new Advanced Video Coding (AVC) codec, also known as H.264 (ITU-T term)
Open-source alternative encoder available at http://developers.videolan.org/x264.html
MPEG-4 Video Coding Tools
Special tools intended for certain specific uses of video
Interlaced coding
- For TV broadcasting needs, also HDTV formats like 1080i
- Frame/field DCT: transforms on fields rather than frames for better quality
- Field motion compensation using 16x8 top and bottom fields
Error-resilient coding
- Goal is to minimize the overhead of introducing redundant data
- Packet-based periodic resynchronization, data partitioning, NEWPRED
Reduced resolution coding
Sprite coding
- Unchanging parts of the video content coded separately as static sprites (textures)
Texture coding for studio applications
- Higher precision and lossless ability
- Uncompressed PCM coding
4th Part: Digital Terrestrial Television, MHEG-5 Specification
Multimedia and Hypermedia information coding Experts Group
MHEG-5 DTT UK
- Object model of multimedia presentations: audio, video, text and graphics
- Broadcasting applications and their data into TV networks
- Optional return channel
MHEG-5 Engine profile
Based on ISO/IEC 13522-5
Some features modified, some added, some made optional or removed
Defines the set of classes that the profile must implement
- Examples: Variable, Slider, Video
Features: caching, cloning, video scaling and stacking of applications
The User Experience
Visual appearances
- Conventional TV
- TV with a visual prompt of available information
- TV with information overlaid
- Information with a video or picture inset
- Just information
MHEG-5 Graphics Model
720 x 576 pixels with 256 colors
- 632 x 518 safe area due to overscan
- 64 colors defined by the DVB subtitle stream
- 4 colors defined by the receiver manufacturer
- 188 colors defined by the MHEG-5 application
Three levels of transparency required: 0% (opaque), 30% and 100% (fully transparent)
Bitmaps
- Full PNG 1.0 support
- MPEG I-frames
Text and Interactibles
Character encoding standards: ISO 10646-1 and UTF-8
The supported set of characters is defined
The Tiresias (DTG/RNIB) font must be supported
The current profile doesn't support font downloading
Application life-cycle
Only one application running at a time
An application may launch another application
- The original application is destroyed in the process
Auto-boot application
- Launched when the service is selected or when other applications have quit
Applications are loaded from the DSM-CC object carousel
MHEG-5 System Overview
Diagram: carousel generation & transmission and an information server feed the broadcast file system; the receiver's MHEG engine presents on the TV and is controlled with the remote; an optional return channel provides a path back to the information server.
MHEG-5 Summary
Offers lower-cost interactive TV than MHP
Low hardware requirements
Coexistence with and migration to MHP possible
References
F. Pereira and T. Ebrahimi. The MPEG-4 Book. Prentice Hall, Upper Saddle River (NJ), 2002.
Digital TV Group (DTG). Digital terrestrial television MHEG-5 specification. v1.06, May 2003.
D. Cuttis, Strategy & Technology Ltd. Solutions for Interactive Digital Broadcasting using MHEG-5. V1.0, September 2003.