
Service compatible efficient 3D-HDTV Delivery

Manuel Gorius, Jochen Miroll, Thorsten Herfet

Telecommunications Lab
Saarland University
Saarbrücken, Germany
{gorius, miroll, herfet}@nt.uni-saarland.de

Abstract: 3D-HDTV is considered to be the next milestone in the evolution of digital video storage and transmission. Providing two views of a scene instead of just one may only be the beginning of the era of multi-view video services. Yet, the major challenge in designing multi-view transport and rendering architectures is to prevent them from scaling linearly in resource consumption. With respect to network usage, real-time media distribution can benefit from novel, scalable transport protocols providing Predictable Reliability under Predictable Delivery time (PRPD). This paper proposes a comprehensive architecture under this paradigm for the efficient, service compatible delivery and rendering of 3D-HDTV via Internet Protocol. The system provides high flexibility for different distribution and representation strategies of 3D or even multi-view video content and offers well-defined backward compatibility to conventional 2D receiver designs.

Keywords: 3D-HDTV, Future 3D Internet, Service Compatible 3D, Predictable Reliability/Predictable Delay

I. INTRODUCTION

Recent stereoscopic movie productions have significantly raised the popularity of 3D content in cinema. For the latest releases, the 3D version produced more than half of the revenue [5]. At the same time, hardware manufacturers are on the verge of turning 3D-TV technology into an affordable medium for home entertainment: Maximum refresh rates of flat-screen panels are increasing towards the mid 3-digit range, enabling interpolation techniques for smoother picture rendering. As a nice side effect, those displays are certified to be “3D-ready” if they include an HDMI 1.4 connector, which provides sufficient bandwidth to deliver a 3D-HDTV signal to the display.

Along with the dimensional enhancement, TV as such is supposed to become “smarter” in the near future. A strong convergence with high-end video games as well as added-value web services is expected to enrich the traditional broadcast application, unveiling new demands on the rendering hardware¹. Additionally, a clear trend towards the deployment of General Purpose GPUs (GPGPU) in HD video decoding and post-processing is visible²,³. Foresighted over-provisioning in the hardware specification might be the enabler for the stereoscopic video experience: The chipsets as well as the corresponding APIs support the decoding, post-processing, and rendering of at least two HDTV streams simultaneously.

¹ http://www.intel.com/inside/smarttv/
² Video Acceleration API (VA-API), http://www.freedesktop.org/wiki/Software/vaapi
³ Video Decoding and Presentation API for Unix (VDPAU), http://http.download.nvidia.com/XFree86/vdpau/doxygen/html/index.html

It is hardly predictable whether 3D-HDTV is going to address a mass market or whether it will end up as a niche service. Flexibility in transport and delivery strategies is thus desirable in order to keep the service distribution scalable. The stereoscopic source coding “toolbox”, considering work conducted in this field, contains formats such as MPEG Multiview Coding (MVC) [8] and the “2D + depth map” representation [3]. These lead to roughly 50% overhead for a 3D service while preserving the possibility of rendering only the 2D image and ignoring the stereo enhancement information.

The Internet Protocol (IP) is the exclusive provider of smart, interactive, 2D TV services and it will be an essential option for 3D-HDTV. Via HTML5⁴ and XML3D⁵ the audio-visual entertainment and the World Wide Web environment converge. However, substantial innovation is required in the packetized transport of audio-visual and interactive media. A predictable delivery time is mandatory for those applications, whereas a certain amount of residual packet loss does not affect the viewing experience. Therefore, we propose a novel protocol operating at Predictable Reliability under Predictable Delay (PRPD) for 3D-HDTV. Driven by a well-founded statistical channel analysis, it is a valuable basis for smart overlay media distribution strategies that optimize the overall network utilization in the unmanaged Internet.

This paper proposes a comprehensive architecture for service compatible delivery and efficient rendering of 3D-HDTV via IP networks. It is organized as follows: The overall system architecture is introduced in Section II, which also addresses media synchronization issues in the hierarchical delivery scheme. Section III focuses on the benefit of GPGPU-based decoding and rendering of 3D video. Section IV gives a short-term outlook on potential applications based on the proposed architecture.

II. SERVICE COMPATIBLE TRANSPORT ARCHITECTURE

Predictions about the impact of 3D-HDTV in live media broadcast are still rather vague. Current assumptions formulate a strong dependence between the type of content and the consumer’s preference to experience it in 3D. For IP-based delivery, bottlenecks in network bandwidth might, at any time, require the service to be downgraded to a single view. Obviously there is a strong requirement for seamless coexistence of 2D and 3D services that should be served by a flexible transport strategy.

⁴ http://www.w3.org/TR/html5/
⁵ http://graphics.cs.uni-sb.de/489/

A. Seamless Backward Compatibility

In our opinion, seamless and transparent backward compatibility is essential for the consumer acceptance of 3D-HDTV. Switching between the 2D and the 3D-enhanced version of a service has to be offered without interrupting the service. The proposed architecture (cf. figure 1) provides the required flexibility in terms of the delivery strategy as well as the decoding facilities. Due to the bidirectional features of the IP network, the 3D experience can be requested and cancelled on demand.

This flexibility is achieved by transporting the monocular and binocular data separately in independent packet streams, emitted by the same source or potentially by different sources. An efficient setup should comprise an ordinary 2D representation of the video as well as an ancillary 3D enhancement stream. Using advanced source coding methods such as MVC [8], but explicitly provided via at least two individual streams, the complete set of views may still be transmitted with an overhead of significantly less than 50% as compared to the 2D video bandwidth.

B. Optimal Internet Transport

The broad availability of the Internet Protocol (IP) on various platforms and architectures leads media broadcasters to consider it a reasonable alternative to the established delivery networks. Media distribution over the unmanaged Internet recently became a serious topic within the DVB group [2]. However, the available protocols as such turn out to be hardly suitable for the requirements of live media broadcast: Whereas TCP does not offer predictability of delivery time, UDP, even in combination with different RTP profiles, cannot provide efficient and sufficient reliability for the dynamic behavior of heterogeneous IP networks [10], [7].

We propose a media-oriented transport protocol operating according to the PRPD paradigm, i.e. providing predictable reliability under predictable delay specifically according to the application’s requirements [4]. The core of our novel protocol is an Adaptive Hybrid Error Correction (AHEC) scheme [11]. Adaptive channel coding consistently exploits the bidirectional characteristic of available IP networks. The highly flexible composition of NACK-based ARQ and adaptive packet-level FEC leads to near-optimal coding efficiency in any scenario. The scheme is controlled by analytical parameter derivation based on a statistical channel model fed with timely receiver feedback. The protocol has been implemented as a Linux kernel module⁶.

⁶ http://www.nt.uni-saarland.de/projects/prrt/

In many scenarios, said enhancement stream(s) for 3D-HDTV might be considered optional. The MVC profile for RTP specifies the assignment of different priority values in the NAL header of different views. RFC 3984 specifies Media Aware Network Elements (MANE) [12] that perform a forwarding decision for each RTP packet depending on its priority and the congestion state in the network. The priority feature was also used similarly in Scalable Video Coding (SVC) in order to apply Unequal Error Protection (UEP) [9] schemes to the enhancement layers.
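The per-packet forwarding decision of such a network element can be sketched as follows. This is a minimal illustration only: the `Packet` type, the convention that smaller priority values are more important, the number of priority levels, and the linear mapping from a congestion estimate to a priority threshold are all assumptions of this sketch and are not taken from RFC 3984 or the MVC payload format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    view_id: int   # 0 = 2D base view, 1.. = enhancement views (hypothetical)
    priority: int  # assumption: smaller value = more important


def max_forwarded_priority(congestion: float, levels: int) -> int:
    """Map a congestion estimate in [0, 1] to the least important
    priority value that is still forwarded."""
    congestion = min(max(congestion, 0.0), 1.0)
    # No congestion: every level passes. Full congestion: only priority 0.
    return int((1.0 - congestion) * (levels - 1))


def forward(packet: Packet, congestion: float, levels: int = 4) -> bool:
    """Forwarding decision for a single packet."""
    return packet.priority <= max_forwarded_priority(congestion, levels)


base = Packet(view_id=0, priority=0)
enhancement = Packet(view_id=1, priority=3)
for load in (0.0, 0.5, 1.0):
    print(load, forward(base, load), forward(enhancement, load))
```

With this rule the base view is forwarded under any load, while the enhancement view is dropped first as congestion rises, which degrades the service to 2D instead of interrupting it.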

The aspect of predictable reliability in our protocol is highly beneficial in this regard: The expected residual packet erasure rate is an input parameter of our protocol stack. If the different enhancement views are transported in different protocol sessions, it is simple to discriminate them during transmission, and via unequal protection, less coding overhead for streams of lower priority is achieved.
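How a target residual erasure rate translates into coding overhead can be illustrated for the FEC part alone. The sketch below is not the AHEC parameter derivation of [11]; it assumes independent packet losses, an ideal systematic (k + m, k) erasure code, and hypothetical function names, block size, and loss rates.

```python
from math import comb


def residual_loss(k: int, m: int, p: float) -> float:
    """Expected fraction of source packets still missing after decoding an
    ideal systematic (k + m, k) erasure code on an i.i.d. channel with
    packet loss rate p."""
    n = k + m
    total = 0.0
    for e in range(m + 1, n + 1):
        # More than m erasures: the block cannot be repaired, and on
        # average a fraction e / n of the source packets is lost.
        total += comb(n, e) * p**e * (1.0 - p) ** (n - e) * (e / n)
    return total


def min_parity(k: int, p: float, target: float, m_max: int = 64) -> int:
    """Smallest number of parity packets per block that meets the target."""
    for m in range(m_max + 1):
        if residual_loss(k, m, p) <= target:
            return m
    raise ValueError("target not reachable within m_max parity packets")


K, P = 20, 0.02  # hypothetical block size and channel loss rate
base_parity = min_parity(K, P, target=1e-6)         # strict target, base view
enhancement_parity = min_parity(K, P, target=1e-3)  # relaxed target
print(base_parity, enhancement_parity)
```

Running one session per view with its own target is what allows the lower-priority stream to carry less redundancy than the base view.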

C. Overlay Transport Reliability

For our media-oriented protocol, the ability to meet given delay and reliability constraints even allows the error correction parameters to be optimized for scopes other than end-to-end connections [4]. For example, wired and wireless networks differ significantly in terms of packet loss characteristics. In addition, wireless network segments usually operate at much lower round trip delay because of their limited range. Obviously, pure end-to-end error correction schemes are not efficient in such heterogeneous network environments. Therefore, our AHEC scheme offers a link-level operation mode, which relieves reliable link segments from the coding overhead required for comparably less reliable links on the same path. A virtual link differentiation provides opportunities for smart segmentation of the delivery network into separate error correction domains, subject to the application’s overall time and reliability budget.
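The effect of the shorter round trip time on a lossy segment can be made concrete with a deliberately simplified ARQ-only model. The sketch assumes independent losses, loss-free feedback, one retransmission round per round trip time, and hypothetical loss rates, delays, and budget; it ignores the FEC component of AHEC entirely.

```python
def arq_rounds(budget_ms: float, rtt_ms: float) -> int:
    """Number of retransmission rounds that fit into the delay budget."""
    return int(budget_ms // rtt_ms)


def residual_loss_arq(loss_rate: float, rounds: int) -> float:
    """Residual loss after the first transmission plus `rounds`
    retransmissions, assuming i.i.d. loss."""
    return loss_rate ** (rounds + 1)


def path_loss(*segment_loss: float) -> float:
    """Loss rate of a path made of independent segments."""
    delivered = 1.0
    for p in segment_loss:
        delivered *= 1.0 - p
    return 1.0 - delivered


BUDGET_MS = 150.0                            # hypothetical delay budget
WIRED = {"loss": 0.001, "rtt_ms": 80.0}      # hypothetical wired segment
WIRELESS = {"loss": 0.05, "rtt_ms": 5.0}     # hypothetical wireless hop

# End-to-end: every repair has to cross both segments.
e2e_rtt = WIRED["rtt_ms"] + WIRELESS["rtt_ms"]
e2e = residual_loss_arq(
    path_loss(WIRED["loss"], WIRELESS["loss"]),
    arq_rounds(BUDGET_MS, e2e_rtt),
)

# Link level: the wireless hop repairs locally with the budget that is
# left after one traversal of the wired segment (left uncorrected here).
wireless_budget = BUDGET_MS - WIRED["rtt_ms"] / 2
local = residual_loss_arq(
    WIRELESS["loss"], arq_rounds(wireless_budget, WIRELESS["rtt_ms"])
)
link_level = path_loss(WIRED["loss"], local)

print(f"end-to-end residual loss: {e2e:.2e}")
print(f"link-level residual loss: {link_level:.2e}")
```

Under these assumptions the local error correction domain gets many more repair opportunities within the same budget, and no retransmission ever loads the reliable wired segment.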

IP multicast is a valuable basis for flexible, efficient, and scalable data distribution to large receiver groups. Actually, two considerations justify the design of a fine-grained group hierarchy:

• For a significant amount of time the consumption of 3D-HDTV may be rather the exception. The 3D enhancement should only be received upon joining an additional multicast group.

• Each multicast group in the hierarchy sets “natural” limits for the error correction domain, i.e. it enables local error correction.
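On the receiver side, requesting and cancelling the 3D enhancement then reduces to joining and leaving one additional group. The following sketch uses plain UDP sockets from the Python standard library rather than our transport stack; the group addresses, the port, and the function names are hypothetical.

```python
import socket

BASE_GROUP = "239.1.1.1"         # hypothetical group of the 2D base view
ENHANCEMENT_GROUP = "239.1.1.2"  # hypothetical group of the 3D enhancement
PORT = 5004                      # hypothetical port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure (group address followed by the local
    interface address) used by IP_ADD_MEMBERSHIP / IP_DROP_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(port: int = PORT) -> socket.socket:
    """Open a UDP socket and join the base view group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(BASE_GROUP))
    return sock


def set_3d(sock: socket.socket, enabled: bool) -> None:
    """Join or leave the enhancement group without touching the base view."""
    option = socket.IP_ADD_MEMBERSHIP if enabled else socket.IP_DROP_MEMBERSHIP
    sock.setsockopt(socket.IPPROTO_IP, option,
                    membership_request(ENHANCEMENT_GROUP))
```

A receiver would call `open_receiver()` once and `set_3d(sock, True)` or `set_3d(sock, False)` whenever the viewer toggles the 3D mode; the base view keeps flowing in either case.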

As usual for overlay architectures, the proposed approach shifts workload from end hosts to smart intermediate network nodes. Nevertheless, the transition from end-to-end-optimal to virtual-link-optimal operation is evolutionary, so it is neither required nor immediately feasible to upgrade all network nodes at once. It has been shown that there is a certain saturation point limiting the number of segmentations into separate error correction domains [6]. However, the deployment of a smart intermediate node is generally sensible wherever the network architectures or characteristics are inhomogeneous.

D. Inter-stream Synchronization

Real-time multimedia rendering in broadcast quality relies on properly synchronized end devices. It is obvious that in a typical one-to-many live media broadcast scenario the streaming source has to be the anchor for any timing since it determines the play-out speed. In case of our multi-stream as well as multi-source 3D-HDTV broadcast environment the situation is significantly more challenging. The synchronization algorithm has to tackle differences in the propagation delay given that there is route diversity. Moreover, a perfect re-assembly of adjacent images, or images and corresponding enhancement frames, respectively, is necessary.

Figure 1. Proposed architecture: Base view and enhancement stream may originate from different synchronized sources. The AHEC transport stack provides the required reliability. We use libxine (http://www.xine-project.org) for simultaneously decoding both views. E.g. an Intel Core i5 CPU with integrated GPU offers the H.264 acceleration. The OpenGL mixer back-end provides various stereo output formats. (Pictures from “Skull Rock” by Benjamin Smith)

The Program Clock Reference (PCR) in the MPEG-2 Transport Stream (MTS) is a widely deployed method to lock a group of receivers to a single reference clock [1]. Traditionally, receiver devices comprise a digital phase locked loop (DPLL) controlling an adjustable oscillator. Receiving the sender’s system clock samples via the PCR in sufficiently short intervals along with the data stream enables this circuit to perform precise clock recovery.

We propose an end-to-end synchronization mechanism utilizing the MPEG-2 PCR in a master-slave fashion (cf. figure 1). The provider of the 2D base view is the basic clock source. Any play-out server providing an additional view is locked to its master clock. Since all views are assumed to have equal frame rates, data are sent into the network depending on the frames’ presentation timestamps (PTS).

The receiver is supposed to lock to the PCR of the base view. It requires a sufficiently stable DPLL (or software PLL) in order to remove network jitter from the incoming reference clock signal.
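A software PLL of this kind can be sketched as a proportional-integral loop that is updated with every received PCR sample. The class name, the loop gains, the sample interval, and the simulated clock offset are assumptions of this sketch; the gains are illustrative and not tuned for any particular jitter profile.

```python
import random


class SoftwarePLL:
    """Recovers the sender clock from PCR samples (times in seconds)."""

    def __init__(self, kp: float = 0.02, ki: float = 0.0002):
        self.kp = kp            # proportional gain: phase correction
        self.ki = ki            # integral gain: frequency correction
        self.rate = 1.0         # recovered sender ticks per local tick
        self.clock = None       # recovered sender clock
        self.last_local = None  # local time of the previous sample

    def update(self, local_time: float, pcr: float) -> float:
        if self.clock is None:
            # First sample: adopt the sender time as the initial phase.
            self.clock, self.last_local = pcr, local_time
            return self.clock
        dt = local_time - self.last_local
        self.last_local = local_time
        self.clock += self.rate * dt        # free-run since last sample
        error = pcr - self.clock            # phase error including jitter
        self.rate += self.ki * error / dt   # slow frequency adjustment
        self.clock += self.kp * error       # small phase adjustment
        return self.clock


def simulate(offset_ppm=50.0, jitter_s=0.0, samples=3000,
             interval_s=0.04, seed=1):
    """Feed the PLL with PCR samples of a sender whose clock runs
    offset_ppm faster than the local clock, plus uniform network jitter."""
    rng = random.Random(seed)
    pll = SoftwarePLL()
    sender = 0.0
    for i in range(samples):
        local = i * interval_s
        sender = local * (1.0 + offset_ppm * 1e-6)
        pll.update(local, sender + rng.uniform(-jitter_s, jitter_s))
    return pll, sender


pll, sender_time = simulate()
print(f"recovered rate offset: {(pll.rate - 1.0) * 1e6:.3f} ppm")
```

Small gains make the loop slow to lock but let it average over many samples, which is the property that removes network jitter from the recovered clock.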

In order to align the second or any additional view with the base view, sufficient buffer space has to be allocated at the receiver to compensate for the difference in propagation delay between all streams. Dynamic FIFO queues adapt automatically in terms of storage space. The recovered clock is again shared between the base view and the enhancement views in a master-slave model. Finally, all decoders have to be enabled to recognize adjacent frames from several views by comparing their PTS to the same system clock phase.
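The matching of adjacent frames by PTS can be sketched with one FIFO queue per view. The class below is a hypothetical illustration that assumes in-order arrival within each view and identical PTS values for frames that belong together; it simply discards frames whose partner never arrives, whereas a real receiver would rather fall back to 2D presentation of the base view.

```python
from collections import deque


class ViewAligner:
    """Pairs frames of several views that carry the same PTS."""

    def __init__(self, views: int = 2):
        self.queues = [deque() for _ in range(views)]

    def push(self, view: int, pts: int, frame):
        """Queue a frame and return all frame sets that are now complete."""
        self.queues[view].append((pts, frame))
        return self._pop_aligned()

    def _pop_aligned(self):
        aligned = []
        while all(self.queues):  # every view has at least one frame queued
            newest_head = max(q[0][0] for q in self.queues)
            # Frames older than the newest head can no longer find a
            # partner, because each view delivers its frames in order.
            for q in self.queues:
                while q and q[0][0] < newest_head:
                    q.popleft()
            if all(q and q[0][0] == newest_head for q in self.queues):
                frames = [q.popleft()[1] for q in self.queues]
                aligned.append((newest_head, frames))
        return aligned


aligner = ViewAligner()
aligner.push(0, 0, "L0")            # base view arrives first
aligner.push(0, 3600, "L1")         # 3600 ticks = one frame at 25 fps, 90 kHz
print(aligner.push(1, 0, "R0"))     # enhancement view catches up
print(aligner.push(1, 3600, "R1"))
```

The queues grow only by the difference in propagation delay between the streams, which corresponds to the buffer space discussed above.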

III. 3D DECODING AND PRESENTATION

Since source coding, post-processing, and rendering of multimedia data make massive use of linear algebra, it is a sensible step to shift the related algorithms from the CPU to the graphics unit, which was actually designed to handle large vectors and textures. In the following sections we propose a GPGPU-based decoding and presentation pipeline for stereo or even multi-view video (cf. figure 1).

A. GPGPU Stereo Decoding

Leading designers of computer graphics hardware answered the demands for GPU-based multimedia processing by developing specialized programming APIs on top of their General Purpose Graphics Processing Units (GPGPU) or even using dedicated co-processing units. In recent versions they allow the programmer to feed raw source coded video (e.g. H.264 slices) frame by frame into their GPU-based decoder. The decoders operate entirely stateless, which makes them highly flexible:

A software wrapper is the only thing to be implemented by the application developer, taking care of the peripheral functionality of decoding, such as stream parsing and decoded picture buffer management. After applying the actual decoding routines to a picture frame, the decoded image data reside on a so-called surface in video memory. All post-processing steps are performed on the GPU. Finally, a dedicated presentation queue implemented within the graphics chipset ensures that uncompressed pictures stay in the graphics pipeline, reducing main memory bandwidth usage. The presentation queue is driven by a high-resolution timer and provides access to the vertical synchronization of the display.

Fortunately, the media processing facilities of current GPGPUs are fairly over-provisioned: On all of our test systems it was possible to decode two 1080p25 AVC HDTV streams simultaneously, which corresponds to the worst case 3D setup with 100% overhead (left and right image simulcast). In fact, it enables their deployment in various stereo decoding setups. Due to the stateless behavior of the decoder API, implementing MVC support is deemed possible without major effort since modifications are necessary only in the external software wrapper.

B. OpenGL Mixing Backend

Unfortunately, the exchange of surface data between two independent decoder pipelines is not yet specified in the available APIs. Therefore, we have chosen OpenGL⁷ for merging the uncompressed image data of the different views into a single output to the display. The presentation queues of the GPGPU decoders are able to render the decoded pictures into an OpenGL texture. This process avoids CPU usage for the expensive copying of uncompressed video between different memory domains.

Being a platform-independent graphics API, OpenGL provides a collection of useful features for stereo video rendering. Via efficient texture operations it is possible to offer any display, frame or even HD frame compatible output such as: color anaglyph, top/bottom or side-by-side, line-interleaved, time-interleaved.
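The pixel layouts of these output formats can be illustrated independently of OpenGL. The sketch below operates on the CPU with frames given as lists of rows of (R, G, B) tuples, whereas the mixer itself works on textures on the GPU; the function names and the simple decimation by dropping every second column or row are assumptions of this sketch.

```python
def side_by_side(left, right):
    """Frame compatible: each view keeps every second column."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def top_bottom(left, right):
    """Frame compatible: each view keeps every second row."""
    return left[::2] + right[::2]


def line_interleaved(left, right):
    """Even lines from the left view, odd lines from the right view."""
    return [l_row if y % 2 == 0 else r_row
            for y, (l_row, r_row) in enumerate(zip(left, right))]


def anaglyph(left, right):
    """Color anaglyph: red from the left view, green and blue from the
    right view."""
    return [[(lp[0], rp[1], rp[2]) for lp, rp in zip(l_row, r_row)]
            for l_row, r_row in zip(left, right)]


# Two tiny 4x2 test frames: left view pure red, right view pure cyan.
left = [[(255, 0, 0)] * 4 for _ in range(2)]
right = [[(0, 255, 255)] * 4 for _ in range(2)]
for mix in (side_by_side, top_bottom, line_interleaved, anaglyph):
    print(mix.__name__, mix(left, right))
```

All four functions return a frame of the input size, which is what makes these formats compatible with a conventional single-stream display interface.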

The OpenGL API provides the means to trigger shutter glasses in case of time-interleaved active stereo representation. The mixing back-end is either fed by full images of the different views or it takes a 2D texture and the corresponding depth map as an input.

IV. CONCLUSION & OUTLOOK

The Future Internet will be faced with a significantly increased amount of multimedia data. This fact is obvious for upcoming multi-view video services, but surely not limited to those applications. IP networks are expected to enrich or even replace the classical media broadcast in certain areas. Efficiency and scalability of content delivery will therefore be a serious issue. So far, however, solutions still largely consist of over-provisioning of hardware and infrastructure. It may just be a question of time until this method can no longer keep up with the increased user demands.

At the same time, services also gain complexity. Beyond the rendering of 3D-HDTV, the convergence of classical TV with web-based entertainment and high quality 3D gaming into smart TV platforms is on the roadmap of the leading consumer electronics manufacturers. Here, the challenge is mainly to keep power consumption low, even though the applications become more complex.

In this paper we proposed an overall system for service compatible transmission and rendering of 3D-HDTV. The architecture is based on a novel transport paradigm, which suits the transport requirements of audiovisual data. The focus is set on the delivery of such data with predictable reliability under predictable delay. The support of multicast and the definition of smart overlay structures make it a scalable and efficient way to deliver high data rate multimedia content in real-time. Decoding and rendering in the end devices make use of GPGPU architectures, which are supposed to be integrated into upcoming smart consumer electronics platforms. Their capability of decoding multiple HDTV streams simultaneously as well as their efficient handling of large textures makes them a valuable basis for a flexible and power-saving multi-view rendering architecture.

⁷ http://www.opengl.org/

Our architecture is supposed to offer efficient delivery and rendering of 3D-HDTV content over IP-based infrastructure, but it is by no means restricted to this scenario. 3D services will have a high impact on web-based entertainment. They could be a reasonable enrichment for various applications all over the web including marketing pages, social networks, remote games as well as messenger services.

The specification of XML3D⁸ enables the seamless rendering of 3D scenes into HTML pages. However, it relies on the graphics performance of the receiver device. On most mobile devices in particular, AVC video decoding is already a common feature, whereas high graphics performance is not. Server-based rendering could be a solution to serve those devices with high quality 3D content as well. Furthermore, provision of any kind of enhancement via separate streams enables separate billing of the respective services, e.g. 2D and 3D.

ACKNOWLEDGMENT

The presented research results have been obtained with financial support of the Intel Visual Computing Institute Saarbrücken.

REFERENCES

[1] ISO/IEC 13818-1. Generic coding of moving pictures and associated audio information: Systems. Technical report, 2000.

[2] DVB Study Mission 2009. Digital Video Broadcasting (DVB): Internet TV Content Delivery Study Mission Report. DVB Document A145, December 2009.

[3] A. Bourge, J. Gobert, and F. Bruls. MPEG-C Part 3: Enabling the introduction of video plus depth contents. Content Generation and Coding for 3D-Television Workshop, 2006.

[4] Manuel Gorius, Michael Karl, and Thorsten Herfet. Optimized Link-level Error Correction for Internet Media Transmission. In Optimization, Saint-Malo, 2009.

[5] Charlotte Jones. The business case for 3D theatrical - statistics and trends. 3D Summit 2009, Los Angeles, September 2009.

[6] Michael Karl, Manuel Gorius, and Thorsten Herfet. Routing: Why less intelligence sometimes is more clever. 2010 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pages 1–6, March 2010.

[7] A. Li. RTP Payload Format for Generic Forward Error Correction. 2007.

[8] ISO/IEC JTC 1/SC 29/WG 11 N9965. Coding of moving pictures and audio: MPEG-4 multi-view coding. Technical report, 2008.

[9] A. Naghdinezhad, M. R. Hashemi, and O. Fatemi. A novel adaptive unequal error protection method for scalable video over wireless networks. IEEE Inter. Symp. on Consumer Electronics, June 2007.

[10] J. Ott, S. Wenger, N. Sato, C. Burmeister, and J. Rey. Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF). 2006.

[11] G. Tan and T. Herfet. On the architecture of erasure error recovery under strict delay constraints. European Wireless (EW2008), June 2008.

[12] S. Wenger, M. M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer. RTP payload format for H.264 video. RFC 3984, 2005.

⁸ http://graphics.cs.uni-sb.de/489/
