performance evaluation of wavelet-based distributed video coding schemes

SIViP (2011) 5:49–60, DOI 10.1007/s11760-009-0141-4

ORIGINAL PAPER

Performance evaluation of wavelet-based distributed video coding schemes

Riccardo Bernardini · Roberto Rinaldo · Andrea Vitali · Pamela Zontone

Received: 23 February 2009 / Revised: 30 September 2009 / Accepted: 30 September 2009 / Published online: 23 October 2009 © Springer-Verlag London Limited 2009

Abstract In this paper we propose and compare different distributed video coding (DVC) schemes based on the use of the wavelet transform, which naturally allows for spatial and other forms of scalability. In particular, we propose a hybrid encoder which utilizes channel codes, and evaluate its performance in the absence of a feedback channel. The proposed scheme uses statistical models for the estimation of the required bitrate at the encoder. We also propose a scheme that is based on a modulo reduction procedure and does not use channel codes at the receiver/transmitter. These schemes are compared with more conventional coders that do not or only partially exploit the distributed coding paradigm. Experimental results show that the considered schemes have good performance when compared with similar asymmetric video compression schemes, and that DVC can be an interesting option in appropriate scenarios.

Keywords Distributed video coding · Error correcting codes · Wavelet transform · Modulo reduction

1 Introduction

Distributed source coding (DSC) refers to the compression of multiple correlated sources that do not communicate with each other. These sources send their compressed outputs to a common decoder that performs joint decoding. The encoders can compress the sources without loss by using a rate that is no less than the entropy of the sources, and this is an evident loss of efficiency with respect to an encoder that jointly compresses the sources, since in this case a bitrate equal to the joint entropy is sufficient to code the correlated sources. In this situation, the challenging problem is to achieve the same efficiency of a joint coder without requiring that the sources communicate with each other. Let $\{(X_i, Y_i)\}$ be a sequence of independent and identically distributed drawings of a pair of correlated discrete random variables $X$ and $Y$. In [1], Slepian and Wolf showed that $R = H(X, Y)$, i.e., the minimum rate required when $X$ and $Y$ are coded jointly, is also sufficient when $X$ and $Y$ are coded in a distributed scenario, where the coders for $X$ and $Y$ operate separately. This remarkable result of Information Theory has been extended to lossy source coding by Wyner and Ziv (WZ) in [2].

R. Bernardini · R. Rinaldo (B) · P. Zontone
Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica, Università degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy
e-mail: [email protected]

A. Vitali
ST Microelectronics, via C. Olivetti 2, 20041 Agrate Brianza, Italy

Recently, several schemes based on the DSC principle have been proposed for distributed video coding (DVC). Comprehensive surveys of early DVC schemes and recent developments are presented in [3,4]. As originally proposed by Wyner, these schemes are based on the use of channel codes [5–7].

Two basic architectures have been proposed for DVC. In [8], Pradhan and Ramchandran presented a syndrome-based coding/decoding procedure, subsequently extended to a video coding system in [9]. In particular, the encoder is based on frame differences and classifies image blocks as uncoded, intra-coded, and WZ coded. In the last case, the low-frequency coefficients of the DCT-transformed block are quantized and encoded using a trellis channel code. At the receiver, different motion-compensated reference blocks can be used as side information to recover the WZ block on the basis of the trellis code bits.

The DVC architecture considered in this paper is based on the solution proposed in [10] and developed in [11]. In particular, in [10], Aaron et al. apply Wyner–Ziv coding to the pixel values of a video sequence. The reference scheme of [10], with two separate coders and a joint decoder, assumes that the video sequence is divided into key frames (i.e., the odd frames of the sequence), and Wyner–Ziv (WZ) frames (the even frames). The authors suppose that the original key frames are available at the receiver. The decoder computes, by interpolating the key frames, a prediction of the WZ frames that will be used as side information in the distributed coding paradigm. The bitplanes of the pixels of the side information frames are modeled as a noisy version of the corresponding bitplanes of the WZ frames. Thus, the WZ frames can be compressed without access to the key frames but only with knowledge of the joint distribution between the WZ and the key frames, by sending the parity bits of a systematic turbo coder. The joint decoder will use these parity bits to correct the bits in which the WZ and the side information differ (see Fig. 1). By modelling the correlation between the length-$K$ bitplanes $Y^K$ and $X^K$ as a memoryless binary symmetric channel with cross-over probability $p$ and input $Y^K$, optimal coding/decoding can be performed by using an ideal channel coder achieving capacity. In this case, the redundancy rate to recover $X^K$ from $Y^K$, with vanishing error probability as $K$ increases, would be equal to $H(p) = -p \log_2 p - (1 - p) \log_2(1 - p) = H(X \mid Y)$ bit/symbol. If we describe the side information $Y^K$ with $H(Y)$ bit/symbol, the described distributed scheme would require $H(X, Y) = H(Y) + H(X \mid Y)$ bit/symbol, as predicted by the Slepian–Wolf result.
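As a worked illustration of the redundancy rate above, the binary entropy can be evaluated for a sample cross-over probability. The value $p = 0.1$ is an arbitrary choice made here for illustration and is not taken from the paper.

```latex
\begin{aligned}
H(0.1) &= -0.1 \log_2 0.1 - 0.9 \log_2 0.9 \\
       &\approx 0.332 + 0.137 \\
       &\approx 0.469 \ \text{bit/symbol}.
\end{aligned}
```

Under the binary symmetric channel assumption, an ideal code would therefore need roughly 0.47 parity bits per bitplane bit in this case, compared with 1 bit per bitplane bit when the bitplane is sent uncoded.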

Such an approach is extended to the transform domain in [11], where the bitplanes of the Discrete Cosine Transform (DCT) of the WZ frames are compressed as described earlier. The DCT enables the coder to exploit the statistical dependencies within a frame and so better rate-distortion performance can be achieved. In [10,11] simple frame interpolation or extrapolation is used to compute the side information. In [12] motion-compensated temporal filtering is used to improve the quality of the side information. In [13] Ascenso et al. consider a motion-compensated refinement method for Pixel Domain DSC schemes; in [14] a spatial motion smoothing algorithm, using a weighted median vector filter, is also proposed to refine the side information.

Fig. 1 DVC: 1. The WZ coder transmits the parity bits for the bitplane $X^K$; 2. The decoder computes the side information, i.e., a noisy version $Y^K$ of $X^K$, and recovers $X^K$ using a systematic turbo decoder with input $Y^K$ and the parity bits

The algorithms used to generate the side information at the decoder significantly influence the rate-distortion performance of the Wyner–Ziv video coding schemes. The techniques described in [14,15] were also selected for the DISCOVER mono-view codec [16]. The architecture of this codec is based on the scheme proposed in [11], but many improvements have been added in order to enhance the performance of the basic building blocks. However, as in the original scheme, a feedback channel is still used to request more parity bits until the decoder reconstruction is successful.

Other possible schemes, derived from the one described in [10], have been presented. In [17] the pixels of a frame are divided into two sub frames: the key sub frame, consisting of the odd vertical pixel lines, is conventionally encoded and is used at the decoder to compute the side information that will be used to reconstruct the Wyner–Ziv sub frame (the even vertical pixel lines of the original frame). In [18] Tagliasacchi et al. propose another WZ sub frame coding scheme. The WZ frames are split into two parts: the first part is decoded using the side information only (obtained from the key frames). The second part is instead decoded using the side information and the previously decoded WZ sub frame.

Wavelet-based coding schemes have the potential advantage of naturally allowing multiresolution and embedded coding. A wavelet domain DVC scheme has been proposed in [19]. The authors use a pair of lattice vector quantizers (LVQ) to subtract the dependence between wavelet coefficients. They also extend the motion compensation refinement concept from the pixel domain to the wavelet domain and propose a new search strategy for vector reconstruction. In [20], a wavelet domain DVC scheme based on zero-tree entropy (ZTE) coding is presented. The wavelet coefficients are quantized using scalar quantization and reorganized in terms of the zero tree structure, in order to identify the significant and insignificant coefficients. Only the significant coefficients are encoded with a turbo coder and the punctured parity bits are transmitted. In [21], the authors exploit the multiresolution properties of the wavelet decomposition to refine motion estimation at the receiver, in order to improve the quality of the side information.

In many implementations, the rate allocated to the WZ frames is adapted by puncturing the output of a turbo code with different patterns. Recently, LDPC-based rate-adaptive codes with accumulated syndromes have been proposed [22]. In [11,19,20,23], the authors suppose that the decoder can request, via a feedback channel, additional bits until turbo-decoding gives correct reconstruction, within a small probability of error. However, the use of a feedback channel may not be convenient, for instance, in interactive applications or multicast transmission.


In this paper we substantially extend the work in [24], including a detailed analysis of rate estimation and distortion performance, as well as a comprehensive set of experiments.

The main contributions and innovative aspects of this paper are the following. First, we analyze and propose a hybrid DVC scheme based on the wavelet transform and the use of channel codes at the encoder, but we do not use a feedback channel, as opposed to the majority of the schemes proposed in the literature. We discuss and validate, by means of experimental results, the use of statistical models to adapt the WZ bitrate at the encoder, and we explicitly evaluate the performance loss deriving from the absence of the feedback channel. Note that, if feedback from the decoder is allowed, the proposed wavelet domain scheme has comparable or better performance than the reference schemes described in [10,11].

Second, we propose a scheme based on a modulo reduction procedure of the wavelet coefficients before quantization and transmission, with joint decoding at the receiver. This scheme uses neither channel codes nor a feedback channel, and has comparable results to those obtained with the scheme based on channel codes, with a great decoder simplification. As a matter of fact, this solution does not require the iterative decoding procedure necessary with turbo or LDPC channel codes. When the modulo reduction parameter $M$ is large enough, this scheme reduces to intra-coding with maximum likelihood estimation at the decoder. The modulo reduction scheme is similar in spirit to the one presented in [25], where scalar coset codes are adopted. For a quantization step $\Delta$, we observe that the procedure in [25] is equivalent to setting the modulo parameter to $M = 2^k \Delta$, so that it is constrained by the number $k$ of transmitted bits for each pixel. In our scheme, the modulo reduction parameter $M$ can be chosen arbitrarily and can be derived by taking into account the expected distortion of the information frame at the decoder (see Sect. 3.2).

Finally, we compare the rate-distortion performance of the schemes considered above with more conventional coders that do not or only partially exploit the distributed coding paradigm.

In the following, we consider two scenarios (see Fig. 2). In the first, the WZ frames are encoded independently of the key frames, and the key frames are encoded and decoded using a conventional intraframe codec. This is the original framework considered for Wyner–Ziv coding, e.g., in [10,11]. In the second scenario, all frames (key frames and WZ frames) are available at the encoder. This scenario is interesting for the design of a low-complexity video coder, with no motion compensation, and where half of the frames (the WZ frames) are coded using DSC techniques. This framework is considered, for example, in [26]. Note that, when all the frames are available at the encoder, the use of H.263/H.264 coding, with motion compensation, outperforms the DVC schemes considered here, as already observed in [10,11,16] (and as we will see in the experimental results section).

The paper is organized as follows: in Sect. 2 we describe the proposed wavelet-based hybrid DVC scheme and analyze the statistical models for bitrate adaptation. In Sect. 3 we analyze in detail the modulo reduction scheme, preliminarily proposed in [27], and introduce a simple procedure to calculate the appropriate modulo parameter $M$ at the transmitter. Experimental results are presented in Sect. 4. Finally, in Sect. 5, we draw the conclusions.

2 Wavelet domain DVC

In the following, we will describe the procedures that we use to code the WZ frames, since we suppose that the decoded key frames are available at the receiver. As we will explain in the experimental section, the key frames are actually coded in intra-mode.

2.1 Wyner–Ziv wavelet domain scheme

This scheme is directly inspired by the one described in [11] and constitutes the basis of the proposed DVC procedures that we will consider later.

We operate on the Wavelet Transform of the WZ frames. A three-level, ten-band wavelet transform is considered for QCIF sequences, as shown in Fig. 3a. The spatial transform enables the coder to exploit the statistical dependencies within a frame, and so better rate-distortion performance can be obtained. At the encoder, the wavelet transform coefficients are grouped together to form coefficient subbands. Each subband is then quantized using a midtread uniform quantizer where the quantization step is set to be equal for all the subbands (this is the optimal solution for orthogonal transforms). Bits are assigned according to a modified sign/module labeling procedure. As an example, Fig. 4 shows the intervals and label assignment for a three bit quantizer.
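The quantization step described above can be sketched as follows. This is a minimal illustration of a midtread uniform quantizer with a single step shared by all subbands; the function names are hypothetical, and the sign/module bit labeling of Fig. 4 is not reproduced because it cannot be recovered from the text.

```python
import math


def midtread_index(x: float, step: float) -> int:
    """Index of the midtread cell containing x; cell 0 is centred on zero."""
    return math.floor(x / step + 0.5)


def midtread_value(index: int, step: float) -> float:
    """Centre of the cell with the given index."""
    return index * step


def quantize_subbands(subbands, step):
    """Apply the same quantization step to every coefficient of every subband."""
    return [[midtread_index(c, step) for c in band] for band in subbands]


# Two toy "subbands" quantized with the same step (values are illustrative).
indices = quantize_subbands([[0.4, 47.9, 48.1], [-17.0, 100.0]], 32.0)
```

A midtread quantizer has a reconstruction level at zero, so small coefficients map to index 0 and the quantization error never exceeds half a step.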

For each subband, the bitplanes are then independently coded using a Rate Compatible Punctured Turbo (RCPT) coder (as we will see, similar results are obtained using LDPC codes). In particular, we transmit, for each bitplane, the corresponding parity bits.

At the decoder, the side information is generated from the key frames using temporal interpolation based on Motion Compensated (MC) interpolation with symmetric motion vectors [28]. As explained earlier, the parity bits are used to recover the bitplanes of the wavelet transform of the WZ frames from those of the side information. Puncturing makes it possible to transmit only a subset of the parity bits generated at the encoder.

Fig. 2 The considered scenarios (panels: scenario 1, scenario 2; block labels: WZ frames, Key frames, WZ encoder, Intra encoder, Low-complexity encoder)

Fig. 3 a Index of the wavelet subbands (0 to 9). b Bits required for each bitplane of all subbands for one frame of the Teeny sequence (horizontal axis: bitplane index; vertical axis: bits; curves: $H(p)$, $H(x_k \mid x_{k-1}, \ldots, x_1, Y)$, and bits required via the feedback channel)

Fig. 4 Intervals and label assignment for a three bit quantizer

To assess the performance of the schemes we propose later, the scheme described in this section uses a feedback channel as in [11,23]. In particular, the decoder is allowed to request additional parity bits until correct decoding is possible (within a small probability of error). While the original scheme of [11] did not consider the problem of a practical implementation of this procedure, the problem has been addressed, for instance, in [23,29]. In our implementation, we consider the transmission of a 16-bit CRC code for each bitplane. If the transmitted CRC does not match the decoded bitplane, the decoder requests additional parity bits from the encoder buffer until the reconstructed bitplane matches the CRC and the decoding is declared to be successful.
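The CRC-driven request loop can be sketched as below. This is only an illustration of the control flow: `toy_decode` is a hypothetical stand-in for the turbo decoder (each received chunk simply overwrites one segment of the side information bitplane), and the CRC-CCITT polynomial provided by Python's `binascii.crc_hqx` is an assumption, since the text only states that a 16-bit CRC is used.

```python
import binascii


def crc16(bits):
    """16-bit CRC (CRC-CCITT via binascii.crc_hqx) of a list of 0/1 values."""
    return binascii.crc_hqx(bytes(bits), 0xFFFF)


def toy_decode(side_info, chunks):
    """Stand-in for the turbo decoder: each chunk overwrites one segment."""
    out = list(side_info)
    pos = 0
    for chunk in chunks:
        out[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return out


def decode_with_feedback(side_info, encoder_buffer, sent_crc, decode):
    """Request chunks from the encoder buffer until the CRC matches.

    Returns (decoded_bits, chunks_requested); decoded_bits is None if the
    buffer is exhausted without a CRC match.
    """
    received = []
    for chunk in encoder_buffer:
        candidate = decode(side_info, received)
        if crc16(candidate) == sent_crc:
            return candidate, len(received)
        received.append(chunk)  # feedback: ask the encoder for one more chunk
    candidate = decode(side_info, received)
    if crc16(candidate) == sent_crc:
        return candidate, len(received)
    return None, len(received)


x = [1, 0, 1, 1, 0, 0, 1, 0]            # WZ bitplane (encoder side)
y = [1, 0, 1, 0, 0, 0, 1, 0]            # side information, one flipped bit
buffer = [x[0:4], x[4:8]]               # toy "parity" chunks
decoded, requests = decode_with_feedback(y, buffer, crc16(x), toy_decode)
```

In a real system the chunks would be punctured parity bits and the decoder an iterative turbo or LDPC decoder; only the stopping rule based on the CRC is meant to carry over.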

As in [11], the iterative turbo decoder uses information about already decoded bitplanes to improve a-priori knowledge while decoding the next bitplane. Moreover, since the WZ frames are typically quantized more coarsely than the key frames, the decoder implements a Maximum Likelihood reconstruction strategy, where the WZ wavelet coefficient is reconstructed as the value in the quantization interval which is closest to the value of the side information. The scheme considered in this section has performance similar to the one of [11], with the possible advantage that the use of the wavelet transform naturally allows for various forms of scalability.
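The reconstruction rule just described amounts to clamping the side information value to the decoded quantization interval. A minimal sketch follows, assuming midtread cells of width `step` centred on `index * step`; the function names are illustrative and clipping at the outermost cells is ignored.

```python
def cell_bounds(index, step):
    """Lower and upper edge of the midtread cell with the given index."""
    return (index - 0.5) * step, (index + 0.5) * step


def ml_reconstruct(side_value, index, step):
    """Point of the decoded cell that is closest to the side information."""
    lo, hi = cell_bounds(index, step)
    return min(max(side_value, lo), hi)
```

When the side information already falls inside the decoded cell it is kept unchanged; otherwise the reconstruction saturates at the nearest cell edge.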

2.2 Hybrid wavelet domain Wyner–Ziv scheme with rate estimation

The scheme proposed in this section does not use a feedback channel, and includes a procedure to estimate the required bitrate for WZ frames at the encoder. As a matter of fact, since the decoder cannot make requests to the WZ encoder, it is necessary that the latter estimates the required parity bits for each wavelet coefficient bitplane. Therefore, at the encoder we need to estimate, for each subband, the rate required to transmit the WZ frame. In this section we explain how such an estimation can be carried out. We will consider two possible approaches to the estimation of the required rate: the first one is based on a Laplacian model and the second one on a symmetric binary channel model. After describing the two approaches we will discuss their efficiency in predicting the required bitrate.


2.2.1 Laplacian model

We will model the subband $s$ of frame $t$ (both in WZ and side information frames) as a sequence of iid random variables. Let $X^n$ be the random variable associated with the $n$-th wavelet coefficient of subband $s$ and frame $t$ in the WZ frame and let $Y^n$ be the corresponding wavelet coefficient in the side information frame. We will suppose that

$$X^n = Y^n + e^n \tag{1}$$

with $e^n$ independent of $Y^n$. A very common model for $e^n$ (see, for instance, [11]) assumes a Laplacian distribution,¹ namely,

$$f_e(a) = \frac{\alpha}{2} e^{-\alpha |a|}. \tag{2}$$

We will denote with $X^n_Q = Q[X^n]$ the $Q$-bit quantized version of $X^n$ and with $x^n_k$ the $k$-th bit of the binary word associated with $X^n_Q$ (with the convention that the first bit is the most significant one).

Since the coefficients are supposed to be identically distributed, in the following we will simplify the notation by omitting the superscript (e.g., by writing $X$, $Y$, $x_k$ or $e$ instead of $X^n$, $Y^n$, $x^n_k$ and $e^n$) unless it is necessary to express explicitly the coefficient index (e.g., as in (4) and (5) below, where the sample average of a quantity associated with the subband coefficients is computed).

The Wyner–Ziv framework suggests estimating the bitrate required to transmit $X_Q$ as the conditional entropy $H(X_Q \mid Y)$. We have

$$H(X_Q \mid Y) = \sum_{k=1}^{Q} H(x_k \mid x_{k-1}, \ldots, x_1, Y).$$

Since the WZ encoder uses a channel code operating on a bitplane by bitplane level, the required rate for the $k$-th bitplane, starting from the most significant bit $x_1$, can therefore be chosen as (see [30] for a similar approach in a different application)

$$H(x_k \mid x_{k-1}, \ldots, x_1, Y). \tag{3}$$

The problem of estimating the rate required to transmit $X_Q$ reduces to the problem of estimating entropy (3). Note that the side information is in general not available at the encoder and so it has to be approximately estimated for bitrate allocation, as will be explained in the experimental section.²

¹ A refined model using a mixture of Laplacians has been proposed in [31].

² In one of the considered scenarios, all the frames (WZ frames and key frames) are available at the encoder. However, even if the WZ encoder can access the key frames, we do not want to locally calculate the side information, since this is a complex task, involving motion estimation, that can be done at the decoder, but not at the encoder.

Let us analyze in detail the procedure to evaluate the entropy (3). In order to compute entropy (3) via model (1), the encoder needs to know the value of the Laplacian parameter $\alpha$ in (2) and an approximation $\hat{Y}$ of the side information (the details about the computation of $\hat{Y}$ are in Sect. 4). The parameter $\alpha = 1/E[|e|]$ can be estimated at the encoder, with low complexity, as the reciprocal of the sample average of the absolute value of the difference error between $X$ and the approximate side information $\hat{Y}$.

In the following, if $I$ is a set of real numbers and $b \in \mathbb{R}$, we use the notation $I + b \triangleq \{a + b \mid a \in I\}$. Moreover, we will use the notation

$$I^n_k \triangleq \{u : Q_k[u] = Q_k[X^n]\} \tag{4}$$

that is, $I^n_k$ is the set of numbers whose $k$-bit quantized version is equal to the $k$-bit quantized version of the $n$-th subband coefficient $X^n$. In other words, $I^n_k$ is the quantization interval identified by the first $k$ bits.

In order to compute (3), we estimate the expectation of $-\log_2 p(x_k \mid x_{k-1}, \ldots, x_1, Y)$ by taking its sample average, that is,

$$H(x_k \mid x_{k-1}, \ldots, x_1, Y) = E[-\log_2 p(x_k \mid x_{k-1}, \ldots, x_1, Y)] \simeq \frac{1}{S} \sum_{n=1}^{S} -\log_2 p(x^n_k \mid x^n_{k-1}, \ldots, x^n_1, Y^n) \tag{5}$$

where $S$ is the number of coefficients in subband $s$. In order to compute (5), we need to know $p(x^n_k \mid x^n_{k-1}, \ldots, x^n_1, Y^n)$, which can be obtained by exploiting model (1) as follows:

$$\begin{aligned}
p(x^n_k \mid x^n_{k-1}, \ldots, x^n_1, Y^n) &= \frac{p(x^n_k, x^n_{k-1}, \ldots, x^n_1, Y^n)}{p(x^n_{k-1}, \ldots, x^n_1, Y^n)} \\
&= \frac{P[X \in I^n_k, Y = Y^n]}{P[X \in I^n_{k-1}, Y = Y^n]} \\
&= \frac{P[Y + e \in I^n_k, Y = Y^n]}{P[Y + e \in I^n_{k-1}, Y = Y^n]} \\
&= \frac{P[e \in I^n_k - Y^n]}{P[e \in I^n_{k-1} - Y^n]},
\end{aligned} \tag{6}$$

where we exploited the independence between $e$ and $Y$. The last term of (6) can be easily computed at the encoder by exploiting the Laplacian model (2), the knowledge of $X^n$, and the knowledge of the approximated side information $\hat{Y}^n$.
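The estimate (5), with the probabilities given by the last line of (6), can be sketched as follows. Several choices here are assumptions made only for illustration: the intervals $I_k$ are taken to be nested dyadic cells of a range $[-v, v)$ with the outermost cells extended to infinity (the actual cell layout and labeling of Fig. 4 are not recoverable from the text), and all function names are hypothetical.

```python
import math

TINY = 1e-300  # guards log2 and the division against floating-point underflow


def interval_prob(lo, hi, alpha):
    """P[lo <= e < hi] for the Laplacian density (alpha/2)*exp(-alpha*|a|)."""
    if lo >= 0:
        return 0.5 * (math.exp(-alpha * lo) - math.exp(-alpha * hi))
    if hi <= 0:
        return 0.5 * (math.exp(alpha * hi) - math.exp(alpha * lo))
    return 1.0 - 0.5 * math.exp(alpha * lo) - 0.5 * math.exp(-alpha * hi)


def estimate_alpha(x, y_approx):
    """alpha = 1 / mean(|x - y_approx|), as suggested in the text."""
    mean_abs = sum(abs(a - b) for a, b in zip(x, y_approx)) / len(x)
    return 1.0 / mean_abs


def nested_interval(x, k, v):
    """Assumed I_k: dyadic cell (width 2v / 2**k) of [-v, v) that contains x."""
    cells = 2 ** k
    width = 2.0 * v / cells
    idx = min(max(int((x + v) // width), 0), cells - 1)
    lo = -math.inf if idx == 0 else -v + idx * width
    hi = math.inf if idx == cells - 1 else -v + (idx + 1) * width
    return lo, hi


def bitplane_rate(x, y_approx, k, v, alpha):
    """Sample average (5) of -log2 of the ratio in (6), in bit per coefficient."""
    total = 0.0
    for xn, yn in zip(x, y_approx):
        lo_k, hi_k = nested_interval(xn, k, v)
        lo_p, hi_p = nested_interval(xn, k - 1, v)
        num = max(interval_prob(lo_k - yn, hi_k - yn, alpha), TINY)
        den = max(interval_prob(lo_p - yn, hi_p - yn, alpha), TINY)
        total += -math.log2(num / den)
    return total / len(x)


x = [0.3, -1.2, 2.5, 0.0]            # toy WZ coefficients
y = [0.25, -1.0, 2.7, 0.1]           # toy approximate side information
alpha = estimate_alpha(x, y)
rates = [bitplane_rate(x, y, k, 4.0, alpha) for k in (1, 2, 3)]
```

Because each cell $I_k$ is contained in $I_{k-1}$, the ratio in (6) never exceeds one and every per-bitplane estimate is non-negative; summing the estimates over $k$ gives the estimate of $H(X_Q \mid Y)$.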

2.2.2 Binary channel model

In another approach to the rate estimation problem, one can compute the rate of the WZ code from the entropy $H(p)$ of the bitplane crossover probability $p = P[x_k \neq y_k]$ [4,23,27]. Note that, if one assumes the binary symmetric channel model $x_k = y_k + q_k$, where $q_k$ is independent of $y_k$, $P[q_k = 1] = p$, and the sum is modulo 2, we have $H(p) = H(x_k \mid y_k)$. This is consistent with (3), where dependence on WZ and side information bitplanes, other than the current bitplane, is neglected. Entropy $H(x_k \mid y_k)$ or probability $p$ can be computed from $x_k$, which is known at the encoder, and $y_k$, calculated from an approximation $\hat{Y}$ of the side information.
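Both quantities mentioned above can be computed from a pair of bitplanes with a few lines of code. The sketch below uses empirical frequencies; the function names and the toy bitplanes are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def crossover_probability(x_bits, y_bits):
    """Empirical estimate of p = P[x_k != y_k]."""
    return sum(a != b for a, b in zip(x_bits, y_bits)) / len(x_bits)


def conditional_entropy(x_bits, y_bits):
    """Empirical H(x_k | y_k) in bits from paired samples."""
    n = len(x_bits)
    joint = Counter(zip(x_bits, y_bits))
    marg_y = Counter(y_bits)
    h = 0.0
    for (_, b), count in joint.items():
        h -= (count / n) * math.log2(count / marg_y[b])
    return h


x_bits = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_bits = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p = crossover_probability(x_bits, y_bits)
rate_hp = binary_entropy(p)
rate_cond = conditional_entropy(x_bits, y_bits)
```

Since $H(x_k \mid y_k) = H(x_k \oplus y_k \mid y_k) \le H(x_k \oplus y_k) = H(p)$, the estimate based on $H(p)$ is never smaller than the one based on $H(x_k \mid y_k)$, with equality when the flip $x_k \oplus y_k$ is independent of $y_k$.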

2.2.3 Comparison

We performed extensive experiments to see if the considered models are adequate to predict the bitrate required by practical turbo or LDPC codes to perform Wyner–Ziv decoding at the receiver. In particular, we compared the bitrate estimate provided by the models with the actual rate requested using a feedback channel, as in the scheme described in the previous subsection.

We report here, first, the results of experiments where we generate independent Laplacian random variables $Y$ and $e$, with variances $\sigma_Y^2 = 1$ and $\sigma_e^2 = 0.04$, respectively. We then compute $X = Y + e$. This setting should be representative of a situation where $Y$ models the wavelet coefficient of the side information, and $e$ is the error between $X$ and $Y$. We use a 5-bit midtread quantizer, where quantization intervals cover the range of $X$ and are arranged as described in Sect. 2 (see Fig. 4). In particular, from the probability density function of $X$, which is the convolution of two Laplacians, we compute the quantizer range $[-V, V]$ so that $P[|X| > V] = 10^{-4}$. Values of $X$ outside $[-V, V]$ are clipped. Table 1 reports the values of $H(p)$ and $H(x_k \mid y_k)$, where $p = P[x_k \neq y_k]$, calculated from averages involving $10^5$ realizations. We also report in the table the value $H(x_k \mid x_{k-1}, \ldots, x_1, Y)$. As can be seen, we have $H(p) \simeq H(x_k \mid y_k)$. Note that we would have exactly $H(p) = H(x_k \mid y_k)$ in the case of a symmetric binary channel $x_k = y_k + q_k$, with $p = P[q_k = 1] = P[x_k \neq y_k]$ and $q_k$ independent of $y_k$. As evidenced by the calculated probabilities in Table 2, we see that the channel relating $y_k$ and $x_k$ is, however, not symmetric.

Table 1 Comparison of entropies for Laplacian random variables

|       | $H(p)$ | $H(x_k \mid y_k)$ | $H(x_k \mid x_{k-1}, \ldots, x_1, Y)$ |
|-------|--------|-------------------|----------------------------------------|
| bit 1 | 0.3776 | 0.3767            | 0.2635                                 |
| bit 2 | 0.0197 | 0.0160            | 0.0072                                 |
| bit 3 | 0.1544 | 0.1422            | 0.0787                                 |
| bit 4 | 0.4533 | 0.4372            | 0.2748                                 |
| bit 5 | 0.7940 | 0.7836            | 0.5515                                 |

Table 2 Probabilities for Laplacian random variables

|       | $P[y_k = 0]$ | $P[x_k = 0 \mid y_k = 1]$ | $P[x_k = 1 \mid y_k = 0]$ | $P[x_k \neq y_k]$ |
|-------|--------------|---------------------------|---------------------------|-------------------|
| bit 1 | 0.6264       | 0.0834                    | 0.0651                    | 0.0719            |
| bit 2 | 0.9912       | 0.0950                    | 0.0010                    | 0.0019            |
| bit 3 | 0.9116       | 0.1043                    | 0.0146                    | 0.0226            |
| bit 4 | 0.7551       | 0.1736                    | 0.0681                    | 0.0939            |
| bit 5 | 0.6263       | 0.3072                    | 0.1992                    | 0.2396            |

The results calculated on real data are in good agreement with those reported above. The conclusion is that, even if the binary symmetric channel model may not be accurate (as pointed out, for instance, in [31]) we still have $H(p) \simeq H(x_k \mid y_k)$ for the proposed quantizer and setting. Moreover, both $H(p)$ and entropy (3) provide a good estimate of the required bitrate for lower-resolution subbands (in particular, subbands 0–6 in Fig. 3a), with $H(p)$ assuming a more conservative larger value. For high-resolution subbands (subbands 7–9 in Fig. 3a), the models tend to underestimate the required bitrate, thus leading to incorrect decoding. Instead of increasing the bitrate estimate for these subbands, we propose a hybrid procedure where the quantized high-resolution subbands are entropy-coded using low-complexity intra-coding procedures [32]. For the lower resolution subbands, we use $H(p)$ as the estimate. As an example, we plot in Fig. 3b the required bits for each bitplane of all wavelet subbands for one frame of the QCIF sequence Teeny, quantized with a quantization step $\Delta = 32$. The vertical lines and the index from 0 to 9 separate the bitplanes of different subbands. We plot $H(p)$, entropy (3) and the bitrate actually requested via the feedback channel.

3 DVC via modulo reduction

In this section we focus on a novel DVC scheme that is an improved version of the one presented in [27]. In particular, we introduce here an Evaluation block (see Fig. 5) that uses a simplified procedure to choose the modulo reduction parameter.

The scheme does not use turbo-codes and does not require feedback from the receiver. As seen in Fig. 5, it comprises three steps: (1) reduction modulo $M$ of the unquantized original wavelet coefficient $X$ to obtain the reduced variable $\bar{X} = \phi_M(X) \triangleq X \bmod M$ (see Fig. 6); (2) lossy coding of $\bar{X}$. The reduced coefficients can be compressed by means of an efficient wavelet coder. In our implementation we use the low-complexity coder presented in [32], but other choices are possible; (3) at the receiver, maximum likelihood (ML) decoding of $X$ from the quantized $\bar{X}$ and the side information $Y$, as explained in Sect. 3.1. As in Sect. 2, the side information $Y$ is generated by temporal interpolation based on Motion-Compensated (MC) interpolation with symmetric motion vectors [28]. We discuss below how to choose $M$ to guarantee the recovery of $X$, after detailing the reconstruction procedure.

Fig. 5 The proposed scheme with the additional Evaluation block (block labels, transmission side: Wyner–Ziv frames, Wavelet Transform, Evaluation Block, Modulo Reduction, Uniform Quantizer, Coder, Lower-Tree Wavelet (LTW) Encoder; Channel; receiver side: Side information, Wavelet Transform, Decoder, Inverse Quantizer, Lower-Tree Wavelet (LTW) Decoder, Reconstruction, Inverse Wavelet Transform, Decoded Wyner–Ziv frames)

Fig. 6 Construction of the reduced variable $\bar{X}$, and examples of the reconstruction rule at the receiver

Remark 1 It is worth giving the rationale behind this scheme by comparing it with a syndrome-based Wyner–Ziv scheme. In a WZ scheme, instead of transmitting each bitplane of $X$, we transmit a syndrome which allows the receiver to deduce that the encoded binary word belongs to a coset; similarly, in the proposed scheme, from the knowledge of $\bar{X}$ one can deduce that $X$ belongs to the coset $\phi_M^{-1}(\bar{X}) = \bar{X} + M\mathbb{Z} \triangleq \{\bar{X} + nM;\ n \in \mathbb{Z}\}$ (see Fig. 6; the effect of quantization will be discussed in detail in Sect. 3.1). The reduced value $\bar{X}$ can be interpreted as an analog syndrome of $X$. At the receiver, ML reconstruction estimates $X$ by choosing the element of $\phi_M^{-1}(\bar{X})$ that is closest to the side information $Y$. Disregarding quantization, it is clear that no error occurs if $|X - Y| < M/2$.

In the usual syndrome-based Wyner–Ziv paradigm, the number of bits of the syndrome must be large enough to allow for the correction of all the "flipped" bits in the bitplanes of $X$ and of the side information. If the syndrome length is not sufficient, $X$ is recovered with an error; similarly, in the proposed scheme, the value of $M$ is chosen large enough to guarantee the reconstruction, and if the value of $M$ is underestimated, errors will occur. The major difference between this scheme and a classical WZ scheme is that having an analog syndrome allows us to move the quantizer after the syndrome computation and use any lossy scheme to encode the reduced values.

3.1 Decoding

In this section, we explain in detail how ML decoding is done with the proposed scheme. According to the procedure described above, the receiver, after lossy decoding, reconstructs a lossy version of the reduced value $\bar{X}$. The goal of the decoder is to recover $X$ from the lossy version of $\bar{X}$ and the side information $Y$.

From the knowledge of the lossy version of $\bar{X}$, assuming uniform quantization with step $\Delta$, the receiver can deduce that the reduced value $\bar{X}$ belongs to a quantization interval $I \subset \mathbb{R}$ that can be expressed as $I = \bar{X}_Q + [-\Delta/2, \Delta/2]$, for some suitable $\bar{X}_Q = \bar{X} + \eta$ and quantization noise $\eta$. Since $\bar{X} = \phi_M(X) \in I$, we can deduce that $X$ belongs to the inverse image of $I$, that is,

$$X \in \phi_M^{-1}(I) = \bigcup_{n \in \mathbb{Z}} (I + Mn) = \bigcup_{n \in \mathbb{Z}} \left( (\bar{X}_Q + Mn) + [-\Delta/2, \Delta/2] \right). \tag{7}$$

Figure 6 shows an example where the set $\bar{X}_Q + [-\Delta/2, \Delta/2]$ is shown with a thick line on the vertical axis and the corresponding inverse image (7) is shown with thick lines on the horizontal axis.

The ML reconstruction $\hat{X}$ of $X$ is the element of (7) closest to $Y$. Element $\hat{X}$ can be efficiently found as follows. First we find the $n \in \mathbb{Z}$ that minimizes the distance between $\bar{X}_Q + Mn$ and $Y$ by computing

$$\hat{n} \triangleq \arg\min_{n \in \mathbb{Z}} |Y - (\bar{X}_Q + Mn)| = \lfloor (Y - \bar{X}_Q)/M \rceil \tag{8}$$

where $\lfloor x \rceil$ denotes the integer value closest to $x$. Let $\varepsilon = Y - (\bar{X}_Q + M\hat{n})$ be the corresponding error. Note that $\bar{X}_Q + M\hat{n}$ can be considered an approximate version of the ML reconstruction of $X$ since it belongs to set (7), and it is the element of $\bar{X}_Q + M\mathbb{Z}$ that is closest to $Y$. In order to obtain the actual ML reconstruction $\hat{X}$ of $X$ it suffices to correct $\bar{X}_Q + M\hat{n}$ by adding to it the error $\varepsilon$ suitably "saturated" in order to remain in set (7). More precisely,

$$\hat{X} = \bar{X}_Q + M\hat{n} + g_\Delta(\varepsilon) = \bar{X} + M\hat{n} + g_\Delta(\varepsilon) + \eta \tag{9}$$

where $g_\Delta(\varepsilon)$ is equal to $\varepsilon$ if $|\varepsilon| < \Delta/2$ and saturates to $\pm\Delta/2$ otherwise (that is, $g_\Delta(\varepsilon) = \mathrm{sgn}(\varepsilon)\Delta/2$ if $|\varepsilon| > \Delta/2$, where $\mathrm{sgn}$ is the sign function). By exploiting the fact that $|\varepsilon| < M/2$, it is easy to check that whenever $\Delta < M$, $\hat{X}$ in (9) is the element of (7) closest to $Y$. In Fig. 6, it is possible to see the reconstruction procedure for the two cases $Y_1 \in \phi_M^{-1}(I)$ (that corresponds to $|\varepsilon| < \Delta/2$) and $Y_2 \notin \phi_M^{-1}(I)$ (that corresponds to $|\varepsilon| > \Delta/2$). Finally, note that if $\Delta$ is small, the reconstruction procedure (9) can be simplified without much error by removing the term $g_\Delta(\varepsilon)$.
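The encoder-side reduction and the decoder-side rule (8)–(9) can be sketched together as follows. Two points are assumptions made for illustration: the reduction is taken to be centred, so that the reduced value lies in $[-M/2, M/2)$ (the range shown in Fig. 6 is not recoverable from the text, and the decoding rule works for either convention), and the lossy coder is replaced by a plain uniform quantizer. Function names are hypothetical.

```python
import math


def nearest_int(x):
    """Integer closest to x (ties are rounded up)."""
    return math.floor(x + 0.5)


def reduce_mod(x, m):
    """Assumed centred reduction phi_M: the result lies in [-m/2, m/2)."""
    return x - m * nearest_int(x / m)


def encode(x, m, step):
    """Reduce modulo m, then quantize uniformly with the given step."""
    return step * nearest_int(reduce_mod(x, m) / step)


def ml_decode(xq, y, m, step):
    """ML reconstruction from the quantized reduced value and side info y."""
    n_hat = nearest_int((y - xq) / m)              # Eq. (8)
    eps = y - (xq + m * n_hat)
    g = max(-step / 2, min(step / 2, eps))         # saturation g_Delta(eps)
    return xq + m * n_hat + g                      # Eq. (9)


m, step = 64.0, 4.0
x = 150.3
xq = encode(x, m, step)
x_hat_good = ml_decode(xq, 140.0, m, step)   # |x - y| = 10.3 < m/2
x_hat_bad = ml_decode(xq, 100.0, m, step)    # |x - y| = 50.3 > m/2: overload
```

With side information close to $X$ the reconstruction stays within one quantization step of the original value, whereas side information farther than $M/2$ from $X$ selects the wrong coset element and produces an error of the order of $M$.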

3.2 Reconstruction error and modulo reduction parameter evaluation

In this section we compute the power of the distortion that results from the above ML reconstruction procedure. Let $N \in \mathbb{Z}$ be the integer such that $X = \bar{X} + NM$ and observe that the reconstruction error $X - \hat{X}$ can be written as

$$X - \hat{X} = (\bar{X} + MN) - (\bar{X} + Mn + \eta + g_\Delta(\varepsilon)) = M(N - n) - (\eta + g_\Delta(\varepsilon)). \tag{10}$$

If we write, as before, $e = Y - X$, we can observe that

$$n = \lfloor (Y - \bar{X}_Q)/M \rceil = \lfloor (Y - X + NM - \eta)/M \rceil = \lfloor (e - \eta + NM)/M \rceil = \lfloor (e - \eta)/M \rceil + N. \tag{11}$$

By using (11) in (10) one obtains

$$X - \hat{X} = \underbrace{M \lfloor (\eta - e)/M \rceil}_{\text{overload}} - \underbrace{(\eta + g_\Delta(\varepsilon))}_{\text{coding}}. \tag{12}$$

According to (12), the reconstruction error is the sum of two components: the coding error $\eta + g_\Delta(\varepsilon)$ due to the lossy coding of $X$ and the term $M\lfloor(\eta - e)/M\rceil$ that appears when $\eta - e$ does not satisfy the hypothesis $|\eta - e| < M/2$. The latter contribution can be seen as an overload error and can assume large values.
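The decomposition in (12) can be checked numerically on a few hand-picked values. The sketch below is illustrative only: it assumes a centered modulo reduction and a uniform quantizer with step $\Delta$, and the helper names and sample values are hypothetical.

```python
import math

def nearest_int(x):
    """Integer closest to x (ties rounded up)."""
    return math.floor(x + 0.5)

def error_terms(x, y, M, delta):
    """Return (X - X_hat, overload term, coding term) following Eq. (12)."""
    N = nearest_int(x / M)                        # X = X_bar + N*M (assumed centered modulo)
    x_bar = x - M * N
    xq_bar = delta * nearest_int(x_bar / delta)   # assumed uniform quantizer
    eta = xq_bar - x_bar                          # quantization noise
    e = y - x                                     # side-information error
    n = nearest_int((y - xq_bar) / M)             # Eq. (8)
    eps = y - (xq_bar + M * n)
    g = max(-delta / 2, min(delta / 2, eps))      # saturation g_Delta(eps)
    x_hat = xq_bar + M * n + g                    # Eq. (9)
    overload = M * nearest_int((eta - e) / M)
    coding = eta + g
    return x - x_hat, overload, coding

M, delta = 16.0, 1.0                              # illustrative values only
samples = [(37.3, 36.0), (37.3, 50.0), (-12.7, -10.2), (5.3, -20.0)]
results = [error_terms(x, y, M, delta) for x, y in samples]
```

For each pair, the first returned value equals the overload term minus the coding term; the overload term is zero whenever $|\eta - e| < M/2$ and a nonzero multiple of $M$ otherwise.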

In order to compute the power $D$ of the reconstruction error (12) we are going to make some approximations: (1) we will assume that the quantization noise and the overload error are uncorrelated, (2) we will assume $g_\Delta(\varepsilon) = 0$ (this corresponds to using in (9) the approximate ML reconstruction $\hat{X} = \bar{X}_Q + Mn$), and (3) we will suppose $\eta$ negligible with respect to $e$. Note that these approximations are acceptable when $\Delta$ is small. We obtain

$$D = E[(X - \hat{X})^2] = E[\eta^2] + M^2 \cdot E[\lfloor e/M \rceil^2]. \tag{13}$$

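For small $\Delta$, we can approximate the first term as $\Delta^2/12$. The other term, instead, can be calculated, under the hypothesis that $e$ is Laplacian with variance $\sigma^2 = 2/\alpha^2$, as

$$E[\lfloor e/M \rceil^2] = \sum_{k=-\infty}^{+\infty} k^2 \int_{kM - M/2}^{kM + M/2} \frac{\alpha}{2} \exp(-\alpha |a|)\, da = \frac{2\beta \exp(-\alpha M)\,(\exp(-\alpha M) + 1)}{(1 - \exp(-\alpha M))^3}, \tag{14}$$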

where $\beta = \frac{1}{2}\exp(\alpha M/2) - \frac{1}{2}\exp(-\alpha M/2)$. Note that (14) depends on $\alpha$ and $M$ only through their product $\alpha M$. In summary, the average quadratic distortion, in the common hypothesis of small $\Delta$, can be approximated as

$$D \simeq \Delta^2/12 + M^2 \cdot \frac{2\beta \exp(-\alpha M)\,(\exp(-\alpha M) + 1)}{(1 - \exp(-\alpha M))^3}. \tag{15}$$

The overload term is introduced by the modulo reduction function and corresponds to the large error one makes when $|X - Y|$ is large. It is apparent from (15) that the overload error contribution can be very large, unless the probability of incorrect decoding is negligible.
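As a sanity check, the closed form in (14) can be compared with a truncated version of the sum it comes from. The following Python sketch is illustrative: the function names and the numerical values of $\alpha$, $M$ and $\Delta$ are assumptions chosen for the example, not values from the experiments.

```python
import math

def overload_moment_closed_form(alpha, M):
    """Right-hand side of Eq. (14) for Laplacian e with parameter alpha."""
    q = math.exp(-alpha * M)
    beta = 0.5 * math.exp(alpha * M / 2) - 0.5 * math.exp(-alpha * M / 2)
    return 2 * beta * q * (q + 1) / (1 - q) ** 3

def overload_moment_series(alpha, M, k_max=200):
    """Truncated sum of Eq. (14); each interval [kM - M/2, kM + M/2]
    is integrated in closed form using the Laplacian density."""
    total = 0.0
    for k in range(1, k_max + 1):
        lo, hi = k * M - M / 2, k * M + M / 2
        prob = 0.5 * (math.exp(-alpha * lo) - math.exp(-alpha * hi))
        total += 2 * k * k * prob      # factor 2: intervals at +k and -k
    return total

def distortion(alpha, M, delta):
    """Approximate distortion of Eq. (15)."""
    return delta ** 2 / 12 + M ** 2 * overload_moment_closed_form(alpha, M)

alpha, M, delta = 0.5, 16.0, 1.0       # illustrative values only
closed = overload_moment_closed_form(alpha, M)
series = overload_moment_series(alpha, M)
D = distortion(alpha, M, delta)
```

The two evaluations agree to numerical precision, and evaluating the closed form at two different pairs with the same product $\alpha M$ gives the same value, as noted after (14).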

In the scheme described above, an Evaluation block is needed at the transmitter to find a value of the modulo parameter $M$ used by the Modulo Reduction block.


In order to make the overload error contribution negligible, we observe that, by taking quantization into account, we have exact reconstruction whenever $|X - Y| < (M - \Delta)/2$. In the Evaluation block, we therefore set $M = (1 + \delta)(2 \max |X - \tilde{Y}| + \Delta)$, where $\tilde{Y}$ is an approximation of the side information at the encoder (computed as described in Sect. 4) and $\delta$ is a small positive value that plays the role of “safety parameter.” The value of $\delta$ is not critical. We used $\delta = 0.1$ in our experiments.
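The rule used by the Evaluation block is simple enough to sketch directly. The Python fragment below is a minimal illustration: the function name, the list-based representation of the coefficients, and the sample values are assumptions.

```python
def modulo_parameter(x, y_approx, delta, safety=0.1):
    """M = (1 + safety) * (2 * max|X - Y_approx| + delta).

    x        : source coefficients available at the encoder
    y_approx : encoder-side approximation of the side information
    delta    : quantization step
    safety   : the "safety parameter" (0.1 in the reported experiments)
    """
    max_diff = max(abs(a - b) for a, b in zip(x, y_approx))
    return (1 + safety) * (2 * max_diff + delta)

# Hypothetical coefficients, for illustration only.
x = [10.0, -3.0, 7.0]
y_approx = [8.0, -1.0, 12.0]
delta = 1.0
M = modulo_parameter(x, y_approx, delta)
max_diff = max(abs(a - b) for a, b in zip(x, y_approx))
```

By construction, the resulting $M$ satisfies $\max|X - \tilde{Y}| < (M - \Delta)/2$, so no overload occurs as long as the actual side information at the decoder does not deviate from the source by more than this margin.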

4 Experimental results

In the following, we will evaluate the performance of the distributed coding schemes introduced in the previous sections, and compare them with more conventional coding procedures.

The key frames are encoded as I-frames using a standard H.264/AVC coder. The turbo code is a Rate Compatible Punctured Turbo (RCPT) code with a puncturing period equal to 33 [11]. The experiments with LDPC codes use the scheme of [22]. The wavelet transform is computed by using the well-known 9/7 biorthogonal Daubechies filters, with a three-level pyramid.

In summary, we compare the performance (in scenarios 1 and 2, see Fig. 2) of the following schemes:

• Wavelet Domain WZ scheme with feedback from the receiver (this scheme will be referred to as WD WZ in the figures).

• Hybrid Wavelet-Domain WZ scheme with rate estimation (WD WZ RE). For this scheme we use H(p) to estimate the required bitrate for the lower-resolution subbands. This entropy is calculated assuming the Laplacian distribution for the difference $e_j = X_j - Y_j$ ($X_j$ is the wavelet coefficient of subband $j$ in the WZ frame and $Y_j$ is the corresponding wavelet coefficient in the side information frame), with $\alpha$ estimated at the encoder from the difference between $X_j$ and an approximation $\tilde{Y}_j$ of the side information. The high-resolution subbands are instead intra coded.

• Modulo reduction scheme. We consider two variations of this scheme: in the first one (MR in the figures), the parameter $M$ is computed as explained in Sect. 3; in the second one, we set $M$ large enough so that no actual reduction is performed. This is equivalent to intra coding of the WZ frames with ML joint decoding using the side information (the scheme will be referred to as MLJD in the figures).

• Inter coding (H.264 Inter in the figures). The WZ frames are encoded as B frames (predicted from the previous and next frame with motion compensation) using the H.264 coder.

• Intra coding, no joint decoding (IC in the figures). This is the simplest scheme, which does not make use of the distributed coding paradigm.

In scenario 2, we also consider the following scheme:

• Intra coding, with the scheme described in [32], of the difference $X - X_{AV}$, where $X_{AV}$ is the average of the key frames closest to the current frame (IC $X - X_{AV}$). This procedure implements a rather conventional coder that operates on frame differences, but does not perform motion compensation.

As explained in Sect. 2.2 and Sect. 3, rate and parameter estimation at the transmitter is based on the construction of an approximation $\tilde{Y}$ of the side information. In particular, in the first scenario, the key frames are not available at the encoder. Therefore, we compute the average of the WZ frames closest to the current frame, and approximate the side information as the wavelet coefficients $\tilde{Y}$ of this average. In the second scenario, $\tilde{Y}$ is the wavelet coefficient of the average of the key frames closest to the current frame. The two scenarios differ only in the side information $\tilde{Y}$ which is constructed at the transmitter for rate estimation or modulo reduction. The performance loss in the case of scenario 1 is negligible for all sequences (we report one example below).
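The encoder-side estimation described above can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the coefficients are given as flat lists, that the difference is zero-mean, and that $\alpha$ is obtained by moment matching through the Laplacian variance relation $\sigma^2 = 2/\alpha^2$ used in Sect. 3.2. The function names and sample values are hypothetical.

```python
import math

def average_frames(prev_coeffs, next_coeffs):
    """Approximate side information: average of the two closest frames
    (key frames in scenario 2, WZ frames in scenario 1)."""
    return [(p + n) / 2 for p, n in zip(prev_coeffs, next_coeffs)]

def estimate_alpha(x, y_approx):
    """Moment-matching estimate of the Laplacian parameter alpha,
    assuming a zero-mean difference with variance 2 / alpha**2."""
    diffs = [a - b for a, b in zip(x, y_approx)]
    second_moment = sum(d * d for d in diffs) / len(diffs)
    if second_moment == 0:
        raise ValueError("difference is identically zero; alpha is unbounded")
    return math.sqrt(2.0 / second_moment)

# Hypothetical subband coefficients, for illustration only.
prev_coeffs = [2.0, 4.0, 6.0, 8.0]
next_coeffs = [4.0, 4.0, 10.0, 8.0]
x = [4.0, 3.0, 8.0, 10.0]
y_approx = average_frames(prev_coeffs, next_coeffs)
alpha = estimate_alpha(x, y_approx)
```

The same approximation `y_approx` can serve both purposes mentioned in the text: estimating $\alpha$ for rate estimation and bounding $|X - \tilde{Y}|$ when choosing the modulo parameter.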

We consider 299 frames of the QCIF Foreman and Carphone sequences, and 73 frames of the QCIF Teeny sequence, coded at 30 frames/s. Only the performance relative to the luminance component of the WZ frames (i.e., the even frames) is considered. The key frames (i.e., odd frames) are compressed at the encoder with the H.264/AVC standard coder. In the first set of simulations, we set a quantization parameter QP in order to have an average PSNR, for the key frames, of about 33 dB.
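For reference, the quality figure used throughout this section can be computed as in the sketch below. It uses the standard PSNR definition for 8-bit samples (peak value 255). Averaging the per-frame PSNR over the even-numbered frames, with frames numbered from 1, is an assumption about the evaluation protocol, and the function names and sample values are hypothetical.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two luminance frames given as flat lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

def average_wz_psnr(ref_frames, dec_frames):
    """Average PSNR over the WZ frames, assumed to be the even frames
    when frames are numbered starting from 1."""
    values = [psnr(r, d)
              for i, (r, d) in enumerate(zip(ref_frames, dec_frames))
              if (i + 1) % 2 == 0]
    return sum(values) / len(values)

# Tiny hypothetical "frames" of four luminance samples each.
ref_frames = [[100, 120, 130, 140], [100, 120, 130, 140]]
dec_frames = [[100, 120, 130, 140], [101, 119, 130, 142]]
value = average_wz_psnr(ref_frames, dec_frames)
```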

Figures 7a and 8 show the rate–PSNR performance for the Foreman and Carphone sequences (scenario 2). Figure 7b shows the performance for Foreman in scenario 1, where the key frames are not available at the encoder. For the hybrid WZ scheme using channel codes and rate estimation at the encoder, the CRC makes it possible to detect whether a bitplane has been incorrectly reconstructed. In this case, decoding is based on the correctly received bitplanes only. As we can see, for scenario 2, intra coding of the difference $X - X_{AV}$ with joint decoding performs much better than the other schemes. As mentioned, the intra coder can be implemented in this case with low complexity [32], with a clear performance advantage with respect to the DVC schemes considered in this paper and in related papers in the literature. However, note that this scheme cannot be used in scenario 1. Among the other schemes, the WZ Wavelet Domain scheme with feedback from the receiver has the best performance at some bit-rates, while we notice some performance loss when the rate is estimated at the encoder. The modulo reduction scheme has comparable or better performance, with a slight advantage (around 0.3 dB) over the MLJD scheme, for which the modulo reduction module is not activated. Figure 8b shows an enlargement of the MR and MLJD curves for the Carphone sequence.

Fig. 7 a Rate–PSNR performance for the Foreman sequence (scenario 2); the key frames are compressed using QP = 35. b Rate–PSNR performance for the Foreman sequence (scenario 1); the key frames are not available at the encoder and are compressed using QP = 35. [Plots of PSNR (dB) versus rate (kbit/s). Panel a, titled “Foreman sequence”, shows the curves WD WZ, WD WZ RE, MR, MLJD, IC, IC $X - X_{AV}$ and H.264 Inter; panel b, titled “Foreman, no key frames at the encoder”, shows WD WZ, WD WZ RE, MR, MLJD and IC.]

Fig. 8 a Rate–PSNR performance for Carphone (scenario 2); the key frames are compressed using QP = 35. b Rate–PSNR performance comparison between the MR and MLJD schemes (enlargement). [Plots of PSNR (dB) versus rate (kbit/s), both titled “Carphone sequence”; panel b shows the MR and MLJD curves.]

Similar results are obtained with the Teeny sequence (see Fig. 9), a very high motion sequence. In this case, the key frames are coded using a lower QP = 5 value. Due to the lower quality of the side information that can be reconstructed at the receiver because of motion, the schemes based on channel codes perform poorly, with a clear advantage for MLJD and the modulo reduction procedures which, in this case, have very similar performance. Note that pure intra coding is also competitive at moderate bit-rates.

To see the influence of the key frames at the receivers, from which the side information is constructed, Fig. 10 shows the performance of the various schemes for the Foreman sequence when the key frames, with respect to Fig. 7, are coded with a better quality (QP = 5).

As expected, the performance of the presented DVC schemes is well below that of H.264 interframe coding for all the considered sequences, confirming a gap between a high complexity, motion-compensated video coder, and DVC schemes.

We also compare the proposed Modulo Reduction (MR) scheme with the DISCOVER codec [16]. Figure 11 shows the bitrate and PSNR relative to the luminance component of the WZ frames for the Foreman sequence (the bit rate range in the figure considers the working points of the publicly available binaries of DISCOVER). As in [16], for both schemes we choose the rate distortion point so that the average quality (PSNR) of the WZ frames is similar to the quality of the key frames. Note that DISCOVER uses a feedback channel to request more parity bits until the decoder reconstruction is successful. As we can see, and as confirmed by experiments with other sequences, DISCOVER exhibits a PSNR gain which is consistent with the performance of the scheme with feedback considered here (WD WZ in Figs. 7 and 10). The additional gain of DISCOVER is due to the fact that it uses an improved algorithm for the construction of the side information at the decoder, a problem that we did not consider in this paper. In this respect, note that any improvement in the decoder can be easily incorporated in the schemes described in this paper.

Finally, Fig. 12 compares the performance of the WZ scheme with feedback, when turbo codes (this plot is the same as in Fig. 7a) and LDPC codes are used. As we can see, the two solutions have comparable performance, with a slight advantage in favor of LDPC codes.

Fig. 9 Rate–PSNR performance for the Teeny sequence (scenario 2); the key frames are compressed using QP = 5. [Plot of PSNR (dB) versus rate (kbit/s), titled “Teeny sequence”.]

Fig. 10 Rate–PSNR performance for the Foreman sequence (scenario 2); the key frames are compressed using QP = 5. [Plot of PSNR (dB) versus rate (kbit/s), titled “Foreman sequence (QP H.264 = 5)”.]

Fig. 11 Performance comparison between the Modulo Reduction (MR) scheme and DISCOVER (Foreman). [Plot of PSNR (dB) versus rate (kbit/s), titled “Foreman sequence”, with the curves DISCOVER and MR.]

Fig. 12 Rate–PSNR performance for the WZ scheme based on turbo and LDPC codes. [Plot of PSNR (dB) versus rate (kbit/s), titled “Foreman sequence”, with the curves WD WZ Turbo Codes and WD WZ LDPC Codes.]

5 Conclusions

In this paper, we proposed and compared several schemes based on the recently proposed paradigm of distributed video coding. DVC is an interesting option for emerging applications where geographically separated sources capture correlated video. Moreover, since the computational burden is transferred to the decoder, DVC can be applied in scenarios where the coders need to be as simple as possible. Another practical advantage is that the DVC paradigm could be more robust against transmission errors, since the decoder exploits statistical rather than deterministic correlation. We proposed a Wavelet Domain scheme with a rate estimation procedure to avoid feedback from the receiver. Moreover, we described a scheme, based on modulo reduction, which does not require any feedback and does not make use of channel codes. Besides presenting several advantages, the proposed schemes have comparable or better performance than state-of-the-art DVC coders. Experiments show, however, that more conventional techniques (e.g., intra coding with no joint decoding and intra coding of the difference between the current frame and the one obtained by averaging the closest key frames), which do not use or only partially use the distributed coding paradigm, can have comparable or better performance than the considered DVC schemes, at least for some sequences and bit-rates. In addition, H.264 interframe coding has significantly better performance than the considered DVC schemes. However, DVC can have a role in some applications, especially when good-quality side information can be constructed at the decoder.

References

1. Slepian, D., Wolf, J.K.: Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory IT-19(4), 471–480 (1973)

2. Wyner, A., Ziv, J.: The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory IT-22(1), 1–10 (1976)

3. Girod, B., Aaron, A., Rane, S., Rebollo-Monedero, D.: Distributed video coding. In: Proceedings of IEEE (Special Issue on Advances in Video Coding and Delivery), vol. 93(1), pp. 71–83 (2005)

4. Guillemot, C., Pereira, F., Torres, L., Ebrahimi, T., Leonardi, R., Ostermann, J.: Distributed monoview and multiview video coding. Signal Process. Mag. IEEE 24(5), 67–76 (2007)

5. Garcia-Frias, J.: Compression of correlated binary sources using turbo codes. IEEE Commun. Lett. 5(10), 417–419 (2001)

6. Bajcsy, J., Mitran, P.: Coding for the Wyner-Ziv problem with turbo-like codes. In: Proceedings of ISIT ’02, p. 91. Lausanne, Switzerland, July 2002

7. Aaron, A., Girod, B.: Compression with side information using turbo codes. In: Proceedings of DCC ’02, pp. 252–261. Snowbird, UT, April 2002

8. Pradhan, S.S., Ramchandran, K.: Distributed source coding using syndromes (DISCUS): design and construction. In: Proceedings of DCC ’99, pp. 158–167. Snowbird, UT, March 1999

9. Puri, R., Ramchandran, K.: PRISM: a new robust video coding architecture based on distributed compression principles. In: Proceedings of Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, October 2002

10. Aaron, A., Zhang, R., Girod, B.: Wyner-Ziv coding of motion video. In: Proceedings of Asilomar Conference on Signals and Systems, Pacific Grove, California, November 2002

11. Aaron, A., Rane, S., Setton, E., Girod, B.: Transform-domain Wyner-Ziv codec for video. In: Visual Communications and Image Processing, VCIP-2004, San Jose, CA, January 2004

12. Tagliasacchi, M., Tubaro, S.: A MCTF video coding scheme based on distributed source coding principles. Visual Communication and Image Processing, September 2005

13. Ascenso, J., Brites, C., Pereira, F.: Motion compensated refinement for low complexity pixel based distributed video coding. In: IEEE International Conference on Advanced Video and Signal Based Surveillance, Como, Italy, July 2005

14. Ascenso, J., Brites, C., Pereira, F.: Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. In: 5th EURASIP Conference on Speech and Image Processing, July 2005

15. Ascenso, J., Brites, C., Pereira, F.: Content adaptive Wyner-Ziv video coding driven by motion activity. In: IEEE International Conference on Image Processing, Atlanta, USA, October 2006

16. Artigas, X., Ascenso, J., Dalai, M., Klomp, S., Kubasov, D., Ouaret, M.: The DISCOVER codec: architecture, techniques and evaluation. In: Picture Coding Symposium (PCS), Lisboa, Portugal, November 2007

17. Adikari, A., Fernando, W., Arachchi, H.K., Weerakkody, W.: Wyner-Ziv coding with temporal and spatial correlations for motion video. In: IEEE Electrical and Computer Engineering, Canadian Conference, Ottawa, May 2006

18. Tagliasacchi, M., Trapanese, A., Tubaro, S., Ascenso, J., Brites, C., Pereira, F.: Exploiting spatial redundancy in pixel domain Wyner-Ziv video coding. In: IEEE International Conference on Image Processing, Atlanta, GA, October 2006

19. Wang, A., Zhao, Y., Wei, L.: Wavelet-Domain distributed video coding with motion-compensated refinement. In: Proceedings of ICIP 2006, pp. 241–244. Atlanta USA, October 2006

20. Guo, X., Lu, Y., Wu, F., Gao, W.: Distributed video coding using wavelet. In: Proceedings of ISCAS 2006, pp. 5427–5430. Island of Kos, Greece, May 2006

21. Liu, W., Dong, L., Zeng, W.: Estimating side-information for Wyner-Ziv video coding using resolution-progressive decoding and extensive motion exploration. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 721–724. April 2009

22. Varodayan, D., Aaron, A., Girod, B.: Rate-adaptive codes for distributed source coding. Eurasip Signal Process. J. 86(11), 3123–3130 (2006)

23. Kubasov, D., Lajnef, K., Guillemot, C.: A hybrid encoder/decoder rate control for Wyner-Ziv video coding with a feedback channel. In: Proceedings of MMSP, IEEE International Workshop on Multimedia Signal Processing, Chania, Crete, Greece, pp. 251–254. October 2007

24. Bernardini, R., Rinaldo, R., Zontone, P., Vitali, A.: Performance evaluation of distributed video coding schemes. In: ICASSP International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, April 2008

25. Magli, E., Barni, M., Abrardo, A., Grangetto, M.: Distributed source coding techniques for lossless compression of hyperspectral images. EURASIP J. Adv. Signal Process. Article ID 45493, vol. 2007, 13 pp (2007)

26. Tagliasacchi, M., Trapanese, A., Tubaro, S., Ascenso, J., Brites, C., Pereira, F.: Intra mode decision based on spatio-temporal cues in pixel domain Wyner-Ziv video coding. In: IEEE International Conference on Acoustic, Speech and Signal Processing, Toulouse, May 2006

27. Bernardini, R., Rinaldo, R., Zontone, P., Vitali, A.: Wavelet domain distributed coding for video. In: IEEE International Conference on Image Processing, Atlanta, GA, October 2006

28. Alfonso, D., Bagni, D., Moglia, D.: Bi-directionally motion-compensated frame-rate up-conversion for H.264/AVC decoder. In: ELMAR Symposium, Zadar, Croatia, June 2005

29. Tagliasacchi, M., Pedro, J., Pereira, F., Tubaro, S.: An efficient stopping method at the turbo decoder in distributed source coder. In: EUSIPCO 2007, Poznan, Poland, September 2007

30. Bernardini, R., Naccari, M., Rinaldo, R., Tagliasacchi, M., Tubaro, S., Zontone, P.: Rate allocation for robust video streaming based on distributed video coding. Signal Process. Image Commun. 23, 391–403 (2008)

31. Wang, H., Cheung, N.-M., Ortega, A.: A framework for adaptive scalable video coding using Wyner-Ziv techniques. Eurasip J. Appl. Signal Process. Article ID 60971, vol. 2006, 18 pp (2006)

32. Oliver, J., Malumbres, M.P.: Low-complexity multiresolution image compression using wavelet lower trees. IEEE Trans. Circuits Syst. Video Technol. 16, 1437–1444 (2006)
