Interoperable Adaptive Multimedia Communication


The desire to gain access to rich, multimedia-based information anywhere, anytime, is growing enormously. Web sites without multimedia-enriched information (including text, images, audio, and video samples) might no longer be appropriate to satisfy today's users. The appetite for accessing multimedia-based content via almost any type of terminal, seamlessly over dynamic and heterogeneous networks, independently of location or time, is similarly growing. To achieve such access, the research and standardization communities have launched an initiative called Universal Multimedia Access (UMA) [1, 2]. However, the UMA tools and specifications that have so far emerged concentrate mostly on constraints imposed by terminals and networks along the multimedia delivery chain; users who consume the content are rarely considered. Pereira and Burnett [3] have presented some initial thoughts on how to incorporate user preferences and demands in today's multimedia infrastructures.

In practice, the main issues are as follows. On the one hand, end users want rich multimedia content available to them anytime, anywhere, on nearly any kind of device. Because each device accessing the content has different capabilities (for example, display resolution or color depth), the content must be adapted according to these capabilities. On the other hand, content providers aspire to offer multimedia-based information of the best possible quality with respect to the consumer's context, without neglecting economic principles. The many problems providers face include supporting the various coding formats (without wasting disk space) and providing appropriate adaptation modules.

In recent years, researchers have developed a plethora of technologies and standards to address some of these issues. However, the big picture of how these different technologies and standards fit together is missing. Thus, the Moving Picture Experts Group (MPEG) decided to standardize the MPEG-21 Multimedia Framework [4] with the ultimate goal of supporting users during the exchange, access, consumption, trade, or other manipulation of so-called Digital Items in an efficient, transparent, and interoperable way.

Device and coding-format-independent multimedia content adaptation

Standardization committees such as MPEG, the Internet Engineering Task Force (IETF), and the World Wide Web Consortium (W3C) are exploring device and coding-format independence issues. Here, however, we focus on how we can use the tools specified within MPEG-21 for interoperable multimedia communication.

1070-986X/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society (IEEE MultiMedia, January–March 2005).

Standards (Editor: John R. Smith, IBM)

Christian Timmerer and Hermann Hellwagner, Klagenfurt University

Editor's Note: The drive for innovation in multimedia technology often results in a diversity of ideas and approaches. However, this diversity is introducing new challenges for accessing multimedia content anytime, anywhere, any way.

The richness of multimedia content and the increasing heterogeneity of networks and user devices are making interoperable multimedia communications difficult. Intelligent solutions are needed to enable multimedia content access under a wide range of delivery conditions and usage environments. The MPEG-21 standard promises to fill this need by standardizing descriptors for multimedia content access and allowing standards-compatible technologies to be used for adapting multimedia content.

In this article, Christian Timmerer and Hermann Hellwagner tell us how MPEG-21 seeks to achieve interoperable multimedia communication across networks and devices. They describe how MPEG-21 addresses device and coding-format independence by standardizing descriptors of usage environment and bit-stream syntax. Through detailed examples involving streaming of audio–video resources and adapting of images according to terminal capabilities, the article illustrates how MPEG-21 solves the challenging and important problem of universal multimedia access.

—John R. Smith


Device independence

Device independence, usually referred to as the ability to play (multimedia) content independently of the presentation device, requires an interoperable description format for the device's capabilities. The usage environment description (UED), part of the MPEG-21 Digital Item Adaptation (DIA) specification [5-7], specifies how to describe such devices in terms of their codec capabilities, I/O capabilities, and device properties; furthermore, the UED defines description formats for the networks through which the content is accessed. The UED also provides a means for describing user characteristics and preferences, such as the user's current geographical location, and for describing the natural environment, such as the illumination of the user's room. By facilitating descriptions of the user's environment, where the multimedia content is likely to be consumed, the UED contributes to maximizing the user's overall experience.
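To make this concrete, the sketch below parses a much-simplified usage environment description and extracts the capability values an adaptation engine would act on. The element names and attributes are illustrative stand-ins, not the normative, namespace-qualified DIA schema.

```python
import xml.etree.ElementTree as ET

# Illustrative, simplified stand-in for an MPEG-21 DIA usage environment
# description (the real schema is far richer and namespace-qualified).
UED_XML = """
<UsageEnvironment>
  <Terminal>
    <Display width="176" height="144" colorCapable="false"/>
    <Decoder format="MPEG-4 Visual"/>
  </Terminal>
  <Network maxBandwidth="64000"/>
  <User location="Klagenfurt"/>
</UsageEnvironment>
"""

def parse_ued(xml_text):
    """Extract the capability values an adaptation engine would need."""
    root = ET.fromstring(xml_text)
    display = root.find("./Terminal/Display")
    network = root.find("./Network")
    return {
        "width": int(display.get("width")),
        "height": int(display.get("height")),
        "color": display.get("colorCapable") == "true",
        "max_bandwidth": int(network.get("maxBandwidth")),
    }

caps = parse_ued(UED_XML)
print(caps)
```

Whatever the concrete schema, the point is interoperability: any adaptation engine that understands the standardized description format can serve this terminal.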

Coding-format independence

So far, we've described only one aspect of interoperable multimedia communication: the consumer side of a media delivery and adaptation chain. To provide multimedia-based adaptation services, we must also consider the provider side's needs. The MPEG-21 Digital Item Declaration (DID) [8] provides a generic container format for associating metadata with multimedia content. DIDs containing references to the actual multimedia resources, and their associated metadata, supply the provider-side input to an adaptation engine. Here, we'll focus on the metadata in the DIDs that enables coding-format-independent multimedia adaptation.

To deal with the diversity of existing scalable coding formats (MPEG-4 or JPEG2000, say), it's desirable to have a generic approach for adapting media bitstreams. Equally important, such a generic approach lets us address emerging formats, such as MPEG-21 Scalable Video Coding. Within DIA, bitstream syntax description (BSD) [9] lets us describe a bitstream's high-level structure (for example, its organization into frames, layers, or packets) using the Extensible Markup Language (XML). In particular, the generic BSD (gBSD), as described elsewhere [10], allows us to perform coding-format-independent bitstream adaptations. According to this concept, we modify the gBSD of a media bitstream, for example, by means of an Extensible Stylesheet Language Transformations (XSLT) style sheet. Such modifications let us discard gBSD portions corresponding to specific frame types, or update certain layer information. The transformed gBSD is then input to a universal adaptation module that generates an adapted bitstream based on the gBSD information. With this approach, we can use a single generic adaptation module rather than a specific module for each existing or future coding format. Figure 1 shows a multimedia publishing framework using MPEG-21 Digital Item Adaptation's gBSD and XSLT to adapt multimedia resources according to different user environments.
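The transformation step can be pictured with a toy example. The gBSD below lists bitstream units with a frame-type marker and a byte range; dropping all B-frame units is exactly the kind of structural edit an XSLT style sheet would perform. A few lines of Python stand in for the style sheet here, and the element names are simplified stand-ins for the normative gBS schema.

```python
import xml.etree.ElementTree as ET

# Toy gBSD: each gBSDUnit points into the bitstream via start/length
# and carries a marker (here, the frame type). Element names are
# simplified stand-ins for the normative gBS schema.
GBSD_XML = """
<gBSD>
  <gBSDUnit marker="I" start="0" length="4"/>
  <gBSDUnit marker="B" start="4" length="2"/>
  <gBSDUnit marker="P" start="6" length="3"/>
  <gBSDUnit marker="B" start="9" length="2"/>
</gBSD>
"""

def transform_gbsd(xml_text, drop_marker):
    """Stand-in for the XSLT step: drop every unit with the given marker."""
    root = ET.fromstring(xml_text)
    for unit in list(root):          # iterate over a copy while removing
        if unit.get("marker") == drop_marker:
            root.remove(unit)
    return root

transformed = transform_gbsd(GBSD_XML, drop_marker="B")
print([u.get("marker") for u in transformed])  # prints ['I', 'P']
```

Note that the transformation never touches the media bytes themselves; it only edits the description, which is what makes the approach coding-format independent.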


Figure 1. Multimedia publishing using MPEG-21 DIA according to different usage contexts. (The diagram shows a bitstream and the gBSD that describes it; an XSLT processor, parameterized by an XSLT style sheet and context information from the usage environment description, transforms the gBSD; the gBSDtoBin processor then produces adapted bitstreams for different end devices. Legend: gBSD = generic Bitstream Syntax Description; DIA = Digital Item Adaptation; gBSDtoBin = generic Bitstream Syntax Description-to-Binary; XSLT = Extensible Stylesheet Language Transformation.)


The gBSD describing the high-level structure of the bitstream in Figure 1 is transformed according to the context information that different end devices provide. In particular, this context information sets the parameters of the XSLT style sheet responsible for transforming the gBSD. The transformed gBSD and the source bitstream are input to the generic Bitstream Syntax Description-to-Binary (gBSDtoBin) processor, which performs the actual bitstream adaptation. This adaptation module parses and interprets the transformed gBSD, which describes the adapted bitstream by referencing the corresponding portions of the source bitstream. The gBSDtoBin processor simply copies these portions from the source bitstream to the target (that is, adapted) bitstream transmitted to the end device or user. Parameters that may have been updated during the transformation process are encoded into the adapted bitstream using the type information from the transformed gBSD. As such, the gBSD-based adaptation framework enables coding-format-independent adaptations.
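A minimal sketch of the copying step gBSDtoBin performs is shown below. It consumes a transformed gBSD that references only the surviving byte ranges of the source bitstream; the real processor additionally re-encodes updated parameters, which this sketch omits. Element names are, again, simplified stand-ins for the normative schema.

```python
import xml.etree.ElementTree as ET

# Transformed gBSD: only the portions of the source bitstream that
# survive adaptation are still referenced (simplified element names).
TRANSFORMED_GBSD = """
<gBSD>
  <gBSDUnit marker="I" start="0" length="4"/>
  <gBSDUnit marker="P" start="6" length="3"/>
</gBSD>
"""

def gbsd_to_bin(transformed_gbsd, source_bitstream):
    """Copy every byte range the transformed gBSD still references."""
    root = ET.fromstring(transformed_gbsd)
    adapted = bytearray()
    for unit in root:
        start = int(unit.get("start"))
        length = int(unit.get("length"))
        adapted += source_bitstream[start:start + length]
    return bytes(adapted)

source = b"IIIIBBPPPBB"  # toy 11-byte "bitstream"
print(gbsd_to_bin(TRANSFORMED_GBSD, source))
```

Because the copier only follows start/length references, the same module serves any coding format whose structure a gBSD can describe.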

However, the gBSD's transformation requires the optimum adaptation parameters with respect to the UED, taking into account quality-of-service (QoS) information. Two tools within DIA let us meet this requirement, namely the AdaptationQoS (AQoS) tool and the universal constraints description (UCD) tool, as Mukherjee et al. explain [11]. AQoS specifies the relationship between, for example, device constraints, feasible adaptation operations satisfying these constraints, and the associated utilities (or qualities) of the multimedia content. The UCD, on the other hand, lets us specify further constraints on the user environment and on the use of a Digital Item by means of limitation and optimization constraints. An example is the information that the application's window size is smaller than the actual display resolution: the UED might describe a 1,400 × 1,050-pixel resolution, and the UCD constrains this further by informing the adaptation engine that only 70 percent of this is available.
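The decision step can be pictured as a small constrained optimization: choose the feasible adaptation operation with the highest utility, subject to the UCD-limited display budget. The AQoS entries and the 70 percent limit below are illustrative numbers, not normative descriptions.

```python
# Illustrative AQoS look-up: each feasible adaptation operation maps an
# output width (pixels) to a utility value; higher utility is better.
AQOS = [
    {"width": 1400, "utility": 1.00},
    {"width": 1024, "utility": 0.85},
    {"width": 700,  "utility": 0.60},
    {"width": 350,  "utility": 0.30},
]

def decide(aqos, display_width, available_fraction):
    """Pick the highest-utility operation whose output fits the
    UCD-constrained portion of the display."""
    limit = display_width * available_fraction  # UCD: only part is usable
    feasible = [op for op in aqos if op["width"] <= limit]
    return max(feasible, key=lambda op: op["utility"])

# UED: 1,400-pixel-wide display; UCD: only 70 percent of it is available.
best = decide(AQOS, display_width=1400, available_fraction=0.7)
print(best)
```

With the 70 percent constraint, the 1,024-pixel version no longer fits (the budget is 980 pixels), so the engine selects the 700-pixel version.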

Therefore, metadata such as the gBSD, AQoS, and UCD descriptions associated with media resources are packaged into DIDs, which form the provider-side input to an adaptation module that enables interoperable multimedia communication.

Adaptation architecture

Figure 2 shows our proposed high-level architecture for device- and coding-format-independent multimedia content adaptation. The DID, input to an adaptation engine from the provider side, contains the AQoS, the gBSD, and the media resource. The UED, which might be further constrained by UCDs, is the input from the consumer side. The description adaptation engine performs adaptation decision-making by processing the AQoS description together with the UED and UCD. This engine then makes an optimal decision for the subsequent resource adaptation.

Within the resource adaptation engine, XSLT style sheets, using the parameters obtained from the decision-making process, transform the gBSD. Thereafter, the adaptation engine uses the transformed gBSD to generate the adapted media resource via the generic adaptation module, as specified within MPEG-21 DIA. Finally, the adaptation engine transmits the adapted resource to the user, where it's consumed during a multimedia experience. To allow for further adaptation steps, we package the updated gBSD and AQoS descriptions into a DID together with the adapted resource.

Because of its open interfaces, we can use this kind of adaptation engine anywhere within the multimedia delivery chain: traditionally on the server, or as an adaptation service within intermediary network nodes.

Usage scenarios

Next, we present two usage scenarios that may benefit from such adaptation services:

❚ streaming of audio–video (AV) resources and

Figure 2. High-level architecture for device- and coding-format-independent multimedia content adaptation using MPEG technologies. (The diagram shows the provider-side DID, containing the AQoS, the gBSD, and the resource the gBSD describes, and the consumer-side UED/UCD; both feed the description adaptation engine within the adaptation engine. The resource adaptation engine emits an adapted DID′ containing AQoS′, gBSD′, and the adapted resource. Legend: AQoS = Adaptation quality of service; DIA = Digital Item Adaptation; DID = Digital Item Declaration; gBSD = generic Bitstream Syntax Description; UED/UCD = usage environment description/universal constraints description.)


❚ adapting images according to terminal capabilities.

Streaming of AV resources

In today's ever-more-popular streaming applications, AV resources are usually preconfigured for various types of network conditions. In some cases, they're adapted within the rendering device according to its capabilities, such as those concerning color depth or spatial resolution. However, this approach raises two issues:

❚ Storage capacity is wasted because multiple copies of the same content must be maintained for different classes of networks. Additionally, users are forced to configure their connectivity information to obtain an optimal user experience.

❚ Network bandwidth may still be wasted in cases where a high-quality bitstream is delivered to a device with limited color capabilities or spatial resolution. Also, central processing unit power is wasted in processing such a high-quality stream, which in turn drains battery power.

Overcoming these problems requires that we store a single high-quality AV resource, adapted on the fly according to the capabilities of the requesting device and the characteristics of the networks traversed. Furthermore, with the tools specified within MPEG-21 DIA, we can consider user preferences as well as the natural environment, such as noise level or illumination.

Because MPEG-21 DIA enables coding-format independence, we can describe the structure of the AV resources using gBSD and formulate the possible adaptation operations using AQoS. In some cases, we need to further constrain the user environment, which can be accomplished by a UCD specifying, for instance, the application's window size.

Figure 3 illustrates multimedia adaptation as a service in intermediary network nodes. The consumer requests a resource from the provider, describing the usage context in the request. The provider redirects this request, together with the resource (packaged into a DID), to an MPEG-21 DIA-compliant adaptation service, which performs the adaptation and forwards the adapted resource to the consumer. Note that, according to MPEG-21, Digital Items are exchanged between the involved parties (a provider and a consumer). A Digital Item comprises the request and the gBSD, AQoS, UED, and UCD information, and the adaptation service provides the user (consumer) with a Digital Item containing metadata and resources.

Adapting images according to terminal capabilities

A growing variety of mobile terminals will access multimedia resources of different formats. Especially in the Web community, the usage and richness of such resources will continue to increase.

Consider images as an example: they're already an integral part of almost every Web page, including private ones, now that digital cameras are available at reasonable prices. Providing an image tailored for each access device, network, and codec would multiply the images' storage and offline transcoding requirements. This solution is clearly neither economically feasible nor "future proof," because future image formats can't be considered. Therefore, we propose to exploit scalable image formats such as JPEG2000 together with the MPEG-21 tools introduced earlier. Our adaptation methodology, as Figures 1, 2, and 3 show, results in several versions of the same high-quality image satisfying the different devices' capabilities, which Figure 4 depicts.

Figure 3. Multimedia adaptation as a service in intermediary nodes. (The diagram shows user A's request plus UED/UCD being redirected by the provider, together with the DID (AQoS, gBSD, resource), to a multimedia adaptation service; the service responds to user B with the adapted DID (AQoS′, gBSD′, resource′). All parties exchange Digital Items.)

In Figure 4, different mobile devices access the same high-quality JPEG2000-encoded image and indicate their display capabilities by means of MPEG-21 DIA descriptions. The optimal adaptation parameters for the images are provided by the associated AQoS descriptions; that is, the adaptation decision-taking engine (ADTE) determines how many decomposition levels and color components need to be dropped, given the display's spatial resolution and color properties. The actual bitstream adaptation is performed by means of the gBSD-based adaptation framework. (For additional information, see the "Further Reading" sidebar.)
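For the spatial dimension, the ADTE's reasoning reduces to simple arithmetic: discarding one wavelet decomposition level halves the image's width and height, so the engine keeps dropping levels until the result fits the display. A sketch, with illustrative resolutions:

```python
def levels_to_drop(src_w, src_h, disp_w, disp_h, max_levels=5):
    """Return how many JPEG2000 decomposition levels to discard so the
    decoded image fits the display (each level halves both axes)."""
    dropped = 0
    w, h = src_w, src_h
    while (w > disp_w or h > disp_h) and dropped < max_levels:
        w, h = w // 2, h // 2
        dropped += 1
    return dropped, (w, h)

# Illustrative: a 2,048 x 1,536 source image on a 176 x 144 phone display.
dropped, size = levels_to_drop(2048, 1536, 176, 144)
print(dropped, size)
```

Here four levels are dropped, yielding a 128 × 96-pixel image; dropping color components for a monochrome display follows the same pattern of matching stream structure to device properties.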

Communication protocols

We've thus far focused on device- and coding-format-independent bitstream adaptation. In practice, the information assets need to be communicated among the participants in the MPEG-21 multimedia framework. MPEG-21 itself doesn't provide any standardized protocols for this task, but other standardization bodies like the IETF and W3C do. Some protocols that can be used for this purpose follow.

Real-time Transport Protocol (RTP)

This protocol is especially suitable for real-time streaming of AV resources. Associated metadata could be transported together with the resource within a header extension in each RTP packet. Alternatively, a separate RTP stream could be set up for the metadata, but this increases synchronization efforts at the involved nodes (see RFC 3550 for further details, http://www.ietf.org/rfc/rfc3550.txt).
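The in-band option can be sketched by packing the fixed 12-byte RTP header (RFC 3550) with the extension bit (X) set, followed by a header extension carrying the metadata. The profile identifier and the metadata payload below are made-up values for the sketch; real deployments would register a profile-specific extension.

```python
import struct

def rtp_packet_with_extension(seq, timestamp, ssrc, payload_type, metadata):
    """Build an RTP packet (RFC 3550) whose X bit signals a header
    extension carrying `metadata` (padded to a 32-bit boundary)."""
    version, padding, extension, csrc_count = 2, 0, 1, 0
    marker = 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (marker << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

    pad = (-len(metadata)) % 4        # extension length counts 32-bit words
    ext_data = metadata + b"\x00" * pad
    profile = 0x1000                  # hypothetical "defined by profile" value
    ext_header = struct.pack("!HH", profile, len(ext_data) // 4)
    return header + ext_header + ext_data

pkt = rtp_packet_with_extension(seq=1, timestamp=0, ssrc=0x1234,
                                payload_type=96, metadata=b"<gBSD/>")
print(len(pkt), hex(pkt[0]))
```

Receivers that don't understand the extension can skip it using the length field, which is what makes in-band metadata transport backward compatible.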

Real-Time Streaming Protocol (RTSP)

RTSP (see RFC 2326 for further details, http://www.ietf.org/rfc/rfc2326.txt) provides primitives (Setup, Play, Pause, Describe, and so on) for negotiating AV resources over IP-based networks. RTSP often uses the Session Description Protocol (SDP; see RFC 2327, http://www.ietf.org/rfc/rfc2327.txt) for providing information relevant to the current session, such as a video's current frame rate or the network bandwidth. This information is, however, proprietary and protocol-specific and doesn't use MPEG-21 DIA UEDs, which, being based on XML, are designed to be protocol independent. To close this gap, the next generation of SDP (SDPng), which is under development, is designed to transport XML and, therefore, UEDs during the negotiation phase.

Session Initiation Protocol (SIP)

SIP is a text-based protocol for initiating interactive communication sessions between users. Such sessions include voice, video, chat, interactive games, and virtual reality. In its use of SDP for session description, SIP could benefit from the SDPng development as well (see RFC 3261 for further details, http://www.ietf.org/rfc/rfc3261.txt).

Hypertext Transfer Protocol (HTTP)

HTTP, the Web's core transfer protocol, targets more or less nonstreaming resources because of its TCP-based architecture. However, extensions to HTTP allow the inclusion of capability descriptions, which could be used to carry UEDs within a resource request. Metadata associated with the requested resource can then be handled like the resource itself (see RFC 2616 for further details, http://www.ietf.org/rfc/rfc2616.txt).

Figure 4. Different mobile devices access the same high-quality JPEG2000-encoded image resource, using MPEG-21 concepts.

Further Reading

For more information about MPEG-21, including overviews, FAQs, and working documents, see http://www.chiariglione.org/mpeg. Additional MPEG-21 DIA material can be found at http://mpeg-21.itec.uni-klu.ac.at/.

Future work

MPEG-21 DIA provides tools for guaranteeing interoperability for media adaptation purposes along the entire multimedia delivery chain. The main focus is on enabling device- and coding-format-independent adaptation engines that various industries and businesses can construct. Therefore, the interfaces to such adaptation engines have been standardized, and it's up to the industry to compete with implementations compliant with MPEG-21 DIA.

However, MPEG-21 DIA doesn't standardize any transport or messaging mechanisms enabling the communication of such descriptions. This task needs to be accomplished one level below the application layer on which MPEG-21 operates, that is, by means of communication protocols such as those we've outlined. It's conceivable that bindings to the various protocols will need to be defined within other standardization bodies. Furthermore, the transport of media and associated metadata raises several synchronization issues that need to be solved; in doing so, researchers need to keep efficiency clearly in mind. In particular, the metadata's significant size and XML's verbosity need to be reduced. An alternative (binary) serialization of XML data is one step toward this goal, and MPEG-7 Systems [12] seems to be a promising candidate. The W3C's XML Binary Characterization Working Group (http://w3.org/XML/Binary/) has also recently started investigating this issue but hasn't produced a recommendation so far.
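MPEG-7's binary format itself is beyond our scope, but the underlying motivation, that a textual description carries far more bytes than the information it encodes, is easy to demonstrate with generic compression. Here zlib merely stands in for a purpose-built binary XML serialization, and the description is an illustrative fragment, not a normative UED.

```python
import zlib

# A small, illustrative textual description (not a normative UED).
ued = (b"<UsageEnvironment>"
       b"<Terminal><Display width='176' height='144' colorCapable='false'/>"
       b"</Terminal>"
       b"<Network maxCapacity='64000' minGuaranteed='32000'/>"
       b"<User location='Klagenfurt'/>"
       b"</UsageEnvironment>")

packed = zlib.compress(ued, level=9)
print(len(ued), len(packed))  # the binary form is noticeably smaller
```

A schema-aware binary serialization such as MPEG-7 BiM can do considerably better than generic compression, and, unlike gzip, still allows streamed, partial decoding of the description.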

Acknowledgments

The authors wish to thank Jörg Heuer, Gabriel Panis, and Andreas Hutter from Siemens AG; Eric Delfosse from IMEC; Debargha Mukherjee from HP Labs; Sylvain Devillers from France Telecom R&D; and Harald Kosch from Klagenfurt University for contributing various parts to the standards we've discussed.

References

1. J. Smith, R. Mohan, and C.-S. Lee, "Scalable Multimedia Delivery for Pervasive Computing," Proc. ACM Multimedia, ACM Press, 1999, pp. 131-140.
2. A. Vetro, C. Christopoulos, and T. Ebrahimi, eds., IEEE Signal Processing Magazine, special issue on Universal Multimedia Access, vol. 20, no. 2, 2003.
3. F. Pereira and I. Burnett, "Universal Multimedia Experiences for Tomorrow," IEEE Signal Processing Magazine, vol. 20, no. 2, 2003, pp. 63-73.
4. I. Burnett et al., "MPEG-21: Goals and Achievements," IEEE MultiMedia, vol. 10, no. 6, Oct.-Dec. 2003, pp. 60-70.
5. A. Vetro and C. Timmerer, "Digital Item Adaptation: Overview of Standardization and Research Activities," IEEE Trans. Multimedia, vol. 7, no. 2, Apr. 2005.
6. A. Vetro, C. Timmerer, and S. Devillers, "Digital Item Adaptation," The MPEG-21 Book, R. Van de Walle et al., eds., John Wiley & Sons, 2005 (to appear).
7. Information Technology—Multimedia Framework (MPEG-21)—Part 7: Digital Item Adaptation, ISO/IEC 21000-7, 2004. For information, see http://www.iso.org/ and http://www.chiariglione.org/mpeg.
8. F. De Keukelaere and R. Van de Walle, "Digital Item Declaration and Identification," The MPEG-21 Book, R. Van de Walle et al., eds., John Wiley & Sons, 2005 (to appear).
9. G. Panis et al., "Bitstream Syntax Description: A Tool for Multimedia Resource Adaptation within MPEG-21," EURASIP Signal Processing: Image Communication J., vol. 18, no. 8, 2003, pp. 721-747.
10. C. Timmerer et al., "Coding Format Independent Multimedia Content Adaptation Using XML," Proc. SPIE Int'l Symp. (ITCom 03), vol. 5242, SPIE, 2003, pp. 92-103.
11. D. Mukherjee et al., "Terminal and Network Quality of Service," IEEE Trans. Multimedia, vol. 7, no. 2, Apr. 2005.
12. J. Heuer, C. Thienot, and M. Wollborn, "Binary Format," Introduction to MPEG-7: Multimedia Content Description Interface, B.S. Manjunath, P. Salembier, and T. Sikora, eds., John Wiley & Sons, 2002.

Readers may contact Christian Timmerer and Hermann Hellwagner at {christian.timmerer, hermann.hellwagner}@itec.uni-klu.ac.at.

Contact Standards editor John R. Smith at [email protected].


Acronyms

ADTE: Adaptation decision-taking engine
AQoS: Adaptation QoS
BSD: Bitstream syntax description
DIA: Digital Item Adaptation
DID: Digital Item Declaration
gBSD: generic Bitstream Syntax Description
gBSDtoBin: generic Bitstream Syntax Description-to-Binary
IETF: Internet Engineering Task Force
MPEG: Moving Picture Experts Group
QoS: Quality of service
SDP: Session Description Protocol
SDPng: Session Description Protocol, next generation
UCD: Universal constraints description
UED: Usage environment description
UMA: Universal Multimedia Access
W3C: World Wide Web Consortium
XML: Extensible Markup Language
XSLT: Extensible Stylesheet Language Transformation