
1934 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012

Side-Information-Dependent Correlation Channel Estimation in Hash-Based Distributed Video Coding

Nikos Deligiannis, Member, IEEE, Joeri Barbarien, Marc Jacobs, Adrian Munteanu, Member, IEEE, Athanassios Skodras, Senior Member, IEEE, and Peter Schelkens, Member, IEEE

Abstract—In the context of low-cost video encoding, distributed video coding (DVC) has recently emerged as a potential candidate for uplink-oriented applications. This paper builds on a concept of correlation channel (CC) modeling, which expresses the correlation noise as being statistically dependent on the side information (SI). Compared with classical side-information-independent (SII) noise modeling adopted in current DVC solutions, it is theoretically proven that side-information-dependent (SID) modeling improves the Wyner–Ziv coding performance. Anchored in this finding, this paper proposes a novel algorithm for online estimation of the SID CC parameters based on already decoded information. The proposed algorithm enables bit-plane-by-bit-plane successive refinement of the channel estimation leading to progressively improved accuracy. Additionally, the proposed algorithm is included in a novel DVC architecture that employs a competitive hash-based motion estimation technique to generate high-quality SI at the decoder. Experimental results corroborate our theoretical gains and validate the accuracy of the channel estimation algorithm. The performance assessment of the proposed architecture shows remarkable and consistent coding gains over a germane group of state-of-the-art distributed and standard video codecs, even under strenuous conditions, i.e., large groups of pictures and highly irregular motion content.

Index Terms—Correlation channel (CC), distributed video coding (DVC), hash information, online successively refined channel estimation, overlapped block motion estimation (OBME).

I. INTRODUCTION

UPLINK-ORIENTED power-constrained applications, e.g., wireless multimedia sensors, require the design of novel video coding architectures, providing low-cost encoding, robustness against transmission errors, and high compression

Manuscript received June 01, 2011; revised October 05, 2011 and November 14, 2011; accepted December 01, 2011. Date of publication December 22, 2011; date of current version March 21, 2012. This work was supported by the FWO Flanders under Project G.0391.07 and Project G.0146.10. The work of P. Schelkens was supported under a FWO postdoctoral fellowship grant. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Pascal Frossard.

N. Deligiannis, M. Jacobs, A. Munteanu, and P. Schelkens are with the Department of Electronics and Informatics, Vrije Universiteit Brussel, B-1050 Brussels, Belgium, and also with the Interdisciplinary Institute for Broadband Technology, B-9050 Ghent, Belgium (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

J. Barbarien was with the Department of Electronics and Informatics, Vrije Universiteit Brussel, B-1050 Brussels, Belgium, and also with the Interdisciplinary Institute for Broadband Technology, B-9050 Ghent, Belgium. He is now with Technicolor (e-mail: [email protected]).

A. Skodras is with the Department of Computer Science, Hellenic Open University, GR-26222 Patras, Greece (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2011.2181400

efficiency. Potential solutions to satisfy these requirements are rooted in distributed source coding (DSC) principles, laid out in [1] and [2]. Triggered by DSC theory [1], [2], distributed video coding (DVC) became a new coding paradigm supporting the aforementioned demands in video coding. By exploiting the inherent correlation present in the video sequence at the decoder, DVC systems [3]–[6] allow for a complexity shift from the encoder to the decoder, in contrast to conventional video coding [7]. Since good DSC codes are based on channel coding [4], [8], [9], distributed joint-source channel coding of video provides robustness against transmission errors [4], [10]. In addition, layered Wyner–Ziv (WZ) coding [11] enables scalable video coding [12], [13]. Furthermore, the DSC theory facilitates efficient multiview video coding without requiring intercamera communication [14], [15]. Lately, DSC principles found application in flexible video decoding [16] and flexible distribution of complexity [17].

In the context of DVC, Girod et al. introduced a feedback channel-based WZ architecture [4]. At the outset, motion-compensated interpolation (MCI) or extrapolation was employed to generate side information (SI) at the decoder [18]. An improved MCI technique was later integrated in the DISCOVER codec [5], providing state-of-the-art performance. However, MCI fails to capture fast and irregular motion mainly due to blind motion estimation [19], [20]. To overcome this problem, effective SI creation approaches are based on successive refinement or on hash-based motion estimation. In successively refined motion estimation, the decoder upgrades the quality of the SI when additional information is decoded [21]. Other schemes propose the transmission of auxiliary (hash) information to the decoder in order to augment SI creation [22]–[25]. In this context, we have proposed overlapped block motion estimation and probabilistic compensation (OBMEPC) [26], which enables accurate capturing of motion using a coarse version of the original frame.

Although, in DSC theory, the codec is assumed to have perfect knowledge of the correlation statistics, in a practical DVC system, the correlation channel (CC) noise can never be directly measured. Hence, accurate CC modeling is needed. Existing approaches construct an additive CC model in which the noise is assumed to be independent of the channel input signal. In early works, a zero-mean Laplacian noise model is employed, of which the scaling parameter is assumed temporally and spatially stationary [4]. Then, Westerlaken et al. [27] argued that, by differentiating the noise scaling parameter for occluded and nonoccluded areas, the overall coding performance is improved. In [28], however, they showed that segmentation inaccuracies notably reduce the benefit of the nonstationary model. In the state-of-the-art MCI-based DVC architecture, Brites and Pereira [29] proposed a spatially stationary Laplacian model and performed online estimation of its scaling parameter per WZ frame/discrete cosine transform (DCT) band at the decoder side. Since the quality of the SI fluctuates spatially, Brites and Pereira [29] also proposed an estimation based on block and pixel/DCT coefficients. However, along the lines of [28], adaptation in smaller spatial regions does not necessarily lead to improved performance since online estimation becomes imprecise due to limited statistical support information. To improve the estimation, progressive refinement of the noise variance upon decoding of each DCT band is proposed in [30] and [31]. Alternatively, Esmaili and Cosman [32] performed block-based classification of the Laplacian scaling parameter using profiles obtained by offline training. Recently, a CC estimation (CCE) technique, which exploits the inter-bit-plane correlation using particle filtering, has been proposed by Stankovic et al. [33] in a pixel-domain architecture without a feedback channel.

In contrast to existing models, we have introduced an additive CC model in which the noise depends on the channel input signal. Since the SI signal is the input of the considered channel, the term side-information-dependent (SID) correlation noise model is used [34]–[36]. In particular, Deligiannis et al. [34], [35] introduced the concept of SID modeling in the pixel domain and validated it experimentally using a fitting error metric (i.e., conditional relative entropy), which was minimized offline. Optimal SI, by means of a motion oracle, was assumed in [34], whereas in [35], SI was generated using OBMEPC [26]. The improvement in fitting accuracy, brought by SID modeling, was first shown to imply rate savings in [36].

This paper builds on the concepts introduced in our prior works [26], [34]–[36] and brings four major contributions. First, this paper thoroughly quantifies the theoretical coding gain that the SID model brings over classical side-information-independent (SII) modeling. Second, in contrast to [34]–[36], this paper presents SID modeling in the transform domain and proposes a novel algorithm for online SID estimation. It is important to note that, contrary to other methods, i.e., [29]–[32], the proposed algorithm enables bit-plane-by-bit-plane refinement of CCE, providing improved accuracy. Third, the proposed online SID algorithm is included in a novel efficient hash-based DVC architecture. Compared with our previous work [26] and other solutions in the literature [22]–[25], the presented architecture comprises a novel approach to form and to compress a hash per WZ frame, which imposes a negligible complexity and memory overhead at the encoder. In addition, in contrast to the system of [26], the presented architecture codes each WZ frame using a transform-domain WZ (TDWZ) coding scheme. Fourth, at the decoder, high-quality SI is produced by a new OBME and compensation with subsampled matching (OBMEC/SSM) technique. OBMEC/SSM notably advances over our OBMEPC in [26], by employing a different matching strategy and predictors that improve the prediction quality and reduce the computational complexity.

Experimental results corroborate the theoretical rate–distortion (RD) gains brought by SID modeling. The experiments also show that the algorithm for online SID estimation yields improved performance compared with state-of-the-art techniques [29], [31]. Moreover, the assessment of the proposed hash coding method and the new OBMEC/SSM technique shows notable gains compared with the approaches of [26]. Additionally, the proposed system is shown to bring significant compression improvements over a broad suite of state-of-the-art codecs targeting low-cost video encoding, including DISCOVER [5], H.264/AVC Intra and No Motion [7], the hash-based codec of [23], and our previous codec [26].

The rest of this paper is structured as follows. The rationale of the SID model and its theoretical gain against SII modeling are given in Section II. The proposed hash-based DVC system is discussed in Section III, whereas the hash codec and the OBMEC/SSM method are detailed in Sections IV and V, respectively. In Section VI, our novel algorithm to realize online successively refined SID correlation estimation is presented. The experimental validation of the proposed techniques is given in Section VII, whereas Section VIII concludes this paper.

II. SID CC MODELING

A. CC Modeling in DSC

In DSC, the correlation is often expressed by an additive noise channel model, $X = Y + N$, where the SI $Y$ is the input, and the source $X$ is the output of the channel (see [37]). Hence, the relation between the conditional probability density function (pdf) of the channel output and the conditional pdf of the noise, given the input, is expressed as [38]

$$f_{X|Y}(x \mid y) = f_{N|Y}(x - y \mid y). \tag{1}$$

Assuming that the correlation noise is independent of the channel input signal [4], [11], i.e., $N$ is independent of $Y$ in the considered channel model [37], the CC model is simplified to

$$f_{X|Y}(x \mid y) = f_{N}(x - y). \tag{2}$$

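We observe that most DVC modeling approaches, e.g., [4], [27]–[32], assume the correlation noise to be independent zero-mean Laplacian with standard deviation $\sigma$, i.e., $N \sim \mathrm{Laplace}(0, \sigma)$, as follows:

$$f_{N}(n) = \frac{1}{\sqrt{2}\,\sigma} \exp\!\left(-\frac{\sqrt{2}}{\sigma}\,|n|\right). \tag{3}$$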

Therefore, the pdf of the source $X$, given the SI $Y = y$, is expressed by a Laplacian distribution centered on $y$ having standard deviation $\sigma$, i.e., $X \mid Y = y \sim \mathrm{Laplace}(y, \sigma)$.

The independent noise component of (3) has been considered stationary at different levels. In pixel-domain systems, the noise parameter $\sigma$ is estimated at sequence level [4], [29], frame level [29], block level [29], or pixel level [29]. Analogously, in transform-domain systems, the noise parameter $\sigma$ is estimated per band of the sequence [22], [29], per band of each WZ frame (band level) [29], [39], or per DCT coefficient (coefficient level) [29]–[31]. Since, in the referred cases, the noise is independent of the channel input, i.e., the SI (see Fig. 1(a) for the schema of the channel), we call such modeling approaches SII noise modeling [34]–[36].


Fig. 1. Schema of (a) SII (symmetric) noise model [11] and (b) SID (asymmetric) noise model. The dashed line expresses the noise dependence on the SI.

Contrary to the SII approach, we have proposed a different CC modeling concept [34], [35], in which the distribution of the noise depends on the channel input signal, i.e., the SI. Specifically, our SID [34], [35] channel model [see Fig. 1(b)] assumes the noise to be zero-mean Laplacian with the standard deviation $\sigma(y)$, which varies depending on the realization $y$ of the SI (input of the channel), i.e., $N \mid Y = y \sim \mathrm{Laplace}(0, \sigma(y))$, as follows:

$$f_{N|Y}(n \mid y) = \frac{1}{\sqrt{2}\,\sigma(y)} \exp\!\left(-\frac{\sqrt{2}}{\sigma(y)}\,|n|\right). \tag{4}$$

Since the correlation noise is additive, (4) implies that, for every realization $y$ of the SI alphabet $\mathcal{Y}$, the pdf of the channel output is given by a Laplacian distribution centered on $y$, having a standard deviation $\sigma(y)$, which varies with $y$, i.e., $X \mid Y = y \sim \mathrm{Laplace}(y, \sigma(y))$.

The validation of our SID channel model in the pixel domain has been given in [34], [35]. The SID model accurately captures the empirical conditional probability mass function (PMF), i.e., $p_{X|Y}(x \mid y)$, calculated based on the sample values of a WZ frame and its SI at a given spatial stationarity level. In effect, it has been shown in [34], [35] that, compared with the SII model, the SID model brings a vast reduction of the fitting mismatch for a large set of video sequences. The reported improvements are consistent irrespective of the quality of the SI. In particular, an optimal motion oracle and OBMEPC [26] have been employed in [34] and [35], respectively. Furthermore, the SID model is more accurate than the SII model for various levels of assumed noise stationarity, including frame level [34] and block level [35].

Let the SI values stem from a discrete alphabet $\mathcal{Y}$ with $|\mathcal{Y}|$ elements. Then, the projections of the SII and SID CC pdfs onto the $(x, y)$-plane are given in Fig. 2(a) and (b), respectively. For an assumed noise stationarity level, in the SII (or input independent) model, the noise variance is constant and independent of the SI [see Fig. 2(a)]. In contrast, in the SID (or input dependent) model, the noise variance depends on the realization of the SI [see Fig. 2(b)]. Following the terminology of channel symmetry in [40], the SII model can be interpreted as a $|\mathcal{Y}|$-ary-input–continuous-output symmetric Laplacian channel. Conversely, the SID model is equivalent to a $|\mathcal{Y}|$-ary-input–continuous-output asymmetric Laplacian channel. Notice also that, in case both $X$ and $Y$ have a binary alphabet, the SII and SID channel models come down to the binary symmetric channel and the binary asymmetric channel models, correspondingly.

In this paper, we analytically prove that, at the same noise stationarity level, SID channel modeling yields compression gains over classical SII modeling. Driven by this finding, we propose a novel transform-domain online SID estimation method, which considers band-level SID noise stationarity, i.e., it is applied per coded DCT band of each WZ frame. The proposed algorithm enables bit-plane-by-bit-plane successively refined SID correlation estimation, yielding significant gains over the offline SII band-level method of [29], and the online SII coefficient-level TRACE technique [31].

Fig. 2. Projection of the (a) SII and (b) SID CC distribution models for an assumed noise stationarity level.

B. Theoretical Analysis of SID CC Modeling

We consider a WZ scheme consisting of uniform scalar quantization followed by ideal Slepian–Wolf (SW) coding [41]. This scheme has been shown to deliver WZ coding performance equivalent to entropy-coded scalar quantization in nondistributed coding [41]. To simplify the calculations, known asymptotic results [41], [42] are employed when necessary. The following holds.

Lemma 1: The L-2 distortion for a Laplacian source quantized using a uniform scalar quantizer centered on its mean is given by

(5)

where $\Delta$ is the cell size of the quantizer and $\sigma$ is the standard deviation.

Proof: The proof is sketched in the Appendix.

Remark 1: According to high rate results for distributed quantization [41], the optimal SW-coded scalar quantizer for smooth pdfs is the uniform quantizer. Based on Lemma 1, the following is derived.

Lemma 2: Let $D_{\mathrm{SID}}$, $D_{\mathrm{SII}}$ be the L-2 distortion of the SID and SII Laplacian models, respectively. A necessary condition so that the SID distortion be equal to the SII distortion for any cell size $\Delta$, i.e., $D_{\mathrm{SID}} = D_{\mathrm{SII}}$, is given by

(6)

where $\sigma(y)$ and $\sigma$ are the standard deviations of the SID and SII models, respectively, and $f_Y(y)$ is the pdf of the SI $Y$.

Proof: The proof is given in the Appendix.

Remark 2: We note that the employed uniform scalar quantizer is centered on the mean of each Laplacian distribution in order to achieve the upper bound in the WZ source coding gain [41]. Based on these two lemmas, the following holds.


Fig. 3. Block diagram of the proposed hash-based DVC scheme. The novel modules of the codec are highlighted in gray.

Theorem 1: Under high rate assumptions and considering the SID and SII models’ distortions equal

(7)

where $R_{\mathrm{SID}}(D)$ and $R_{\mathrm{SII}}(D)$ are the rates for a distortion level $D$, as given by the SID and SII Laplacian models, respectively, and $E[\cdot]$ is the expectation operator.

Proof: The proof is given in the Appendix.

In words, Theorem 1 specifies that, for a given L-2 distortion $D$, an SID channel exhibits higher or equal CC capacity compared with an SII channel. Hence, for a given L-2 distortion $D$, SW random binning for an SID channel is more efficient compared with that for an SII channel. This intrinsic coding gain is verified in Section VII-A.
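The direction of this gain can be illustrated numerically. The sketch below is not the proof from the Appendix. It rests on two assumptions of ours: at high rate the SW rate tracks the conditional differential entropy of the Laplacian model, and the single SII standard deviation satisfies $\sigma^2 = E[\sigma^2(Y)]$. The function names and the example values are hypothetical.

```python
import numpy as np

def laplacian_entropy_bits(sigma):
    """Differential entropy (in bits) of a Laplacian with standard deviation sigma."""
    return np.log2(np.sqrt(2.0) * np.e * sigma)

def sid_vs_sii_entropy_gap(sigma_y, p_y):
    """Entropy gap h_SII - h_SID in bits per sample (illustrative only).

    sigma_y: SID standard deviation for each SI symbol.
    p_y:     probability of each SI symbol.
    """
    sigma_y = np.asarray(sigma_y, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    p_y = p_y / p_y.sum()
    # Assumption: the SII model uses one sigma whose variance is the average SID variance.
    sigma_sii = np.sqrt(np.sum(p_y * sigma_y ** 2))
    h_sid = np.sum(p_y * laplacian_entropy_bits(sigma_y))  # average over SI symbols
    h_sii = laplacian_entropy_bits(sigma_sii)
    return h_sii - h_sid

gap = sid_vs_sii_entropy_gap([1.0, 4.0, 9.0], [0.5, 0.3, 0.2])
```

By Jensen's inequality the gap is nonnegative under these assumptions, and it vanishes when $\sigma(y)$ is the same for every SI symbol, which is the SII case.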

III. PROPOSED HASH-BASED DVC ARCHITECTURE

The proposed hash-based DVC architecture is illustrated in Fig. 3. At the encoder, the input video sequence is organized into groups of pictures (GOPs) and is decomposed into key frames, i.e., the first frame in each GOP, and WZ frames. The key frames, denoted by , are encoded using H.264/AVC Intra frame coding, adhering to the main profile, as configured in [5]. For each WZ frame, a novel hash is sent to aid SI creation at the decoder, as detailed in Section IV. In addition to the coded hash, a WZ bit stream is formed for each WZ frame, based on the TDWZ architecture [4]. In contrast to other hash-driven DVC schemes, e.g., [23], the WZ encoder is chosen to encode the original WZ frame, i.e., rather than its difference with the hash, in order to preserve the error-resilient traits of WZ coding for the entire WZ frames’ waveform [4], [8]. We also note that the proposed TDWZ system is built on the principles of layered WZ coding [11]. These principles have found application in alternative scalable WZ video coding schemes, e.g., in [12] and [13]. Contrary to these works, in which the computationally expensive tasks of motion estimation, rate control, and mode decision are performed at the encoder, the proposed scheme aims at efficient yet low-cost DVC.

Specifically, at the encoder, the WZ frame’s pixel values are transformed using the 4 × 4 separable integer transform, as in H.264/AVC [7], which has properties similar to the DCT. Using a set of predefined quantization matrices (QMs), each DCT band is independently quantized with levels. A uniform and a double-deadzone scalar quantizer are employed for the DC and AC bands, respectively. After quantization, the quantization indices are converted into binary codewords and fed to the SW encoder. At this point, we notice that the employed SW coding needs to cope with the asymmetric nature of the SID CC (see Section II-A). In [43], McEliece argued that good communication codes with Turbo-like decoders could be effectively used on asymmetric channels. Furthermore, in [44], Wang et al. extended density evolution, which is a strong analytical tool of low-density parity-check (LDPC) codes, to asymmetric channels and proposed good code constructions. Thus, SW coding is realized using the rate-adaptive LDPC accumulate (LDPCA) codes of [9], of which the performance is not degraded even when the CC features strong asymmetries, e.g., Z-channel statistics [9]. The derived syndrome bits per codeword are stored in a buffer, and a feedback channel is used to allow optimal rate control [4].

At the decoder, the key frames are H.264/AVC Intra decoded and stored in a reference frame buffer. The hash information is decoded by inverting the tasks applied at the encoder. Next, the new OBMEC/SSM technique is used to generate a motion-compensated prediction of the WZ frame based on the received hash and reference frames, as detailed in Section V. Subsequently, the produced motion-compensated frame is DCT transformed, forming the SI for the WZ codec.

Thereafter, per coded DCT band of the WZ frame, the decoder performs the proposed novel online SID CCE algorithm, which is described in Section VI-B. Per band of the WZ frame, the decoder produces soft estimates to decode the WZ bit-planes based on the SID channel estimates and the value of each SI coefficient. After decoding each bit-plane of a band of the WZ frame, the bit-plane is stored in a buffer, and the algorithm is executed again, enabling bit-plane-by-bit-plane progressive refinement of the estimates. After decoding the final WZ bit-plane of the band of the WZ frame, the estimates are again updated, yielding improved estimation for the reconstruction process. After decoding all the bit-planes of all WZ coded bands of the WZ frame, MMSE reconstruction [8] and an inverse DCT are carried out, yielding the reconstructed WZ frame.

Fig. 4. Hash formation and spatial prediction processes. On the left, gray circles with solid lines denote the subsampled original pixel values. On the right, dashed circles signify the MSB of the subsampled pixel values.
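How a soft estimate for one bit-plane bit might be formed from a Laplacian channel estimate and the SI coefficient can be sketched as follows. This is a simplified illustration, not the codec's implementation: it assumes a plain uniform quantizer with natural binary indices (the text uses a double-deadzone quantizer for AC bands), and all function and parameter names are hypothetical.

```python
import math

def laplace_cdf(x, mu, sigma):
    """CDF of a Laplacian with mean mu and standard deviation sigma."""
    b = sigma / math.sqrt(2.0)  # Laplacian scale parameter
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

def bitplane_llr(y, sigma, decoded_bits, n_bits, step, x_min=0.0, eps=1e-12):
    """LLR log(P(bit=0)/P(bit=1)) for the next bit-plane bit (MSB first).

    y:            SI coefficient (actual value, used as the Laplacian mean)
    sigma:        channel standard deviation, e.g. the SID estimate for y's bin
    decoded_bits: already decoded bits of this coefficient, MSB first
    n_bits:       bits per quantization index
    step, x_min:  cell size and lower edge of the assumed uniform quantizer
    """
    remaining = n_bits - len(decoded_bits)  # undecoded bits, including the next one
    if remaining < 1:
        raise ValueError("all bit-planes are already decoded")
    prefix = 0
    for bit in decoded_bits:
        prefix = (prefix << 1) | bit
    lo_idx = prefix << remaining   # first index consistent with the decoded prefix
    half = 1 << (remaining - 1)    # number of indices for each value of the next bit
    lo = x_min + lo_idx * step
    mid = x_min + (lo_idx + half) * step
    hi = x_min + (lo_idx + 2 * half) * step
    p0 = laplace_cdf(mid, y, sigma) - laplace_cdf(lo, y, sigma)
    p1 = laplace_cdf(hi, y, sigma) - laplace_cdf(mid, y, sigma)
    return math.log((p0 + eps) / (p1 + eps))
```

With an SID model, `sigma` would be looked up per SI bin, whereas an SII model passes the same value for every coefficient of the band.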

IV. HASH INFORMATION CODING

The proposed hash information consists of the most significant bit-plane (MSB) of the dyadically subsampled luma component of the original WZ frame . To encode this information, each binary value at position in the hash is first spatially predicted. The employed prediction scheme is essentially a low-complexity binary equivalent of the well-known edge-adaptive JPEG-LS predictor [45]. More specifically, each binary value is predicted by a Boolean function , with , , and , denoting the left, top, and top-left neighboring binary values in the hash, respectively, as shown in Fig. 4. After the prediction stage, each prediction error is directly calculated using a single exclusive OR operation between the predictor and the predicted value . Finally, each binary symbol is coded using multiplication-free context-based binary arithmetic coding employing one of eight different probability models. The probability model is selected based on the neighboring local gradients , , and , in the original hash , with denoting the top-right neighbor of the predicted value (see Fig. 4).

Notice that the proposed hash formation and coding processes are designed in order to impose a limited complexity and memory usage overhead at the encoder. First, in contrast to our prior work [26], the hash is formed based on the subsampled pixel values, requiring only 1/4 of the samples to be further processed. Second, the spatial prediction process can be implemented using simple binary arithmetic, making it ideal for hardware implementation. Third, in contrast to alternative solutions, e.g., [22]–[25], the proposed technique does not perform any block-based decisions on the transmission of hash information at the encoder side. Hence, it is neither burdened by the computationally expensive block-based comparisons required for such mode decision nor does it require storing reference information from temporally adjacent frames. Lastly, contrary to [25] and [26], no temporal prediction is applied by the proposed hash encoder, thereby preventing error propagation between the hash data of consecutive WZ frames.
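The hash formation and XOR-based prediction can be sketched as below. The exact Boolean function is not recoverable from the text here, so the sketch assumes it is the JPEG-LS median edge detector (MED) restricted to binary inputs, which reduces to `a OR b` when the top-left bit is 0 and `a AND b` when it is 1. The function names, the zero padding at frame borders, and the 8-bit depth are our assumptions; the context-based arithmetic coder is omitted.

```python
import numpy as np

def form_hash(luma, bit_depth=8):
    """MSB of the dyadically subsampled luma plane (every second row and column)."""
    sub = np.asarray(luma)[::2, ::2]
    return ((sub >> (bit_depth - 1)) & 1).astype(np.uint8)

def predict_bit(a, b, c):
    """Binary MED-style predictor (assumed): a = left, b = top, c = top-left."""
    return (a & b) | ((a | b) & (c ^ 1))

def hash_residual(h):
    """Encoder side: XOR each hash bit with its prediction (border neighbours = 0)."""
    p = np.pad(h, ((1, 0), (1, 0)))  # zero row on top, zero column on the left
    a, b, c = p[1:, :-1], p[:-1, 1:], p[:-1, :-1]
    return h ^ predict_bit(a, b, c)

def hash_from_residual(e):
    """Decoder side: invert the prediction in raster-scan order."""
    rows, cols = e.shape
    p = np.zeros((rows + 1, cols + 1), dtype=np.uint8)
    for i in range(rows):
        for j in range(cols):
            pred = predict_bit(p[i + 1, j], p[i, j + 1], p[i, j])
            p[i + 1, j + 1] = e[i, j] ^ pred
    return p[1:, 1:]
```

The encoder needs only bitwise operations on a quarter of the luma samples, which is in line with the low-complexity argument made above; the sequential loop is needed only at the decoder.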

V. HASH-BASED MOTION ESTIMATION AT THE DECODER

In this section, we describe OBMEC/SSM, a technique that generates accurate SI at the decoder based on the proposed hash. Compared with our previous OBMEPC [26] technique, OBMEC/SSM not only operates on a different hash but also incorporates novel tools to further improve the prediction quality and to reduce the computational complexity.

A. Bit-Plane OBME With Subsampled Reference Frames

OBME is performed in a hierarchical bidirectional prediction structure similar to MCI-based DVC systems [4], [5]. Using two previously decoded WZ and/or key frames as past and future reference frames, the decoder performs OBME based on the proposed hash. Specifically, let denote the reference frames and let denote the MSB of the luma component of . In addition, let denote the decoded hash frame.

Due to the lower resolution of the hash frame , the values in each binary frame are reorganized into four subsampled reference frames, i.e., , , by separating the values at even and odd positions in as (see Fig. 5). In this way, the newly formed binary reference frames have the same resolution as the hash frame , hence facilitating the execution of OBME. Next, based on and , downscaled motion vectors between the WZ frame and the reference frames are found by OBME. Note that OBME derives more than one motion vector per pixel, thereby decreasing the energy of the prediction error compared with other methods, e.g., [5]. In addition, blocking artifacts are drastically reduced, thus increasing the subjective quality of the decoded frame.

The OBME process proceeds as follows. Using an overlap step size , the hash frame is divided into overlapping blocks of size samples, with top-left coordinates . For each overlapping block , the best matching block within a specified search range is found in one of the subsampled reference frames . The employed matching criterion maximizes the number of binary values in the hash block that are identical to the colocated binary values in block , where is the associated motion vector. Therefore, for each overlapping block in the hash frame, OBME has identified the motion vector and indices , , and , which define the best reference block . In addition, OBME retains the matching strength for the block , which will be used in the SI generation process. The matching strength is defined as the number of binary values in that are identical to the colocated binary values in divided by , i.e., the total number of samples in a block. We remark that due to the nature of the new hash, the matching process is carried out with binary comparisons, thereby vastly diminishing the associated complexity.
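The matching procedure described above can be sketched as follows. This is a minimal illustration under our own assumptions: an exhaustive search, candidates that must lie fully inside the reference frame, and hypothetical block size, overlap step, and search range values. The names `polyphase_split`, `match_block`, and `obme` are ours.

```python
import numpy as np

def polyphase_split(msb):
    """Split an MSB plane into four subsampled frames keyed by (row, column) parity."""
    return {(p, q): msb[p::2, q::2] for p in (0, 1) for q in (0, 1)}

def match_block(hash_frame, ref_frames, top, left, size, search):
    """Exhaustive binary block matching for one hash block.

    ref_frames: list of (label, 2-D binary array) candidates.
    Returns (strength, label, (dy, dx)), where strength is the fraction of
    identical bits between the hash block and the best candidate block.
    """
    block = hash_frame[top:top + size, left:left + size]
    best = (-1.0, None, (0, 0))
    for label, ref in ref_frames:
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                r, c = top + dy, left + dx
                if r < 0 or c < 0 or r + size > ref.shape[0] or c + size > ref.shape[1]:
                    continue  # candidate falls outside the reference frame
                cand = ref[r:r + size, c:c + size]
                strength = float(np.mean(cand == block))
                if strength > best[0]:
                    best = (strength, label, (dy, dx))
    return best

def obme(hash_frame, ref_frames, size=8, step=4, search=4):
    """Run the matching for every overlapping block (step < size gives overlap)."""
    ref_frames = list(ref_frames)
    rows, cols = hash_frame.shape
    return {(t, l): match_block(hash_frame, ref_frames, t, l, size, search)
            for t in range(0, rows - size + 1, step)
            for l in range(0, cols - size + 1, step)}
```

In a bidirectional setting, `ref_frames` would hold the four subsampled frames of both the past and the future reference, with the label recording which frame and which parity pair produced the best match.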

B. SI Generation

To generate SI, the motion vectors derived by OBME are first upsampled. By construction, there is a direct correspondence between every block (with top-left coordinates and size pixels) in the hash frame and the block (with top-left coordinates and size pixels) covering the SI frame . As a result, based on and , , and found by OBME for the hash block , an equivalent motion vector, i.e., , to the original reference frame , is derived for the SI block . In this way, for each overlapping block in the SI frame, a temporal predictor block, which is denoted by , is found in the reference frame .

After scaling the motion vectors, SI is generated by multihypothesis pixel-based compensation. Specifically, each pixel in the SI frame belongs to a number of overlapping blocks . This means that each pixel in the SI frame is linked to a number of predictors , being the colocated pixel values in the blocks . Each SI pixel value is then derived by properly combining its predictors. In detail, for the even pixel positions in the SI frame, the MSB of the original frame was transmitted in the hash . This binary value is used to determine the weight of the predictor during compensation. Specifically, if the binary value in the hash agrees with the MSB of a predictor , then the predictor is said to be verified, and its weight is equal to the associated matching strength . Otherwise, the predictor is categorized as unverified, and its weight is empirically set to the lowest value, i.e., . For the other pixels in the SI frame, for which hash information is unavailable, simple averaging of their corresponding predictors is applied to derive the SI values. We observe that, in [26], after motion estimation, there could be pixels without any predictor. These pixels required an extra reconstruction phase. However, in OBMEC/SSM, all pixels are assigned a set of predictors, as explained above, thereby yielding superior pixel value reconstruction.

Lastly, we note that OBMEC/SSM also uses the motion vectors generated by OBME to produce the chroma components of the WZ frame at the decoder, generating candidate predictors based on the chroma components of the reference frames. The weights derived for the even positions in the luma component are employed in the weighted averaging of the predictors.

Fig. 5. Creation of the subsampled reference frames in the OBME process.
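The per-pixel weighting rule can be sketched as below. The value of the lowest (unverified) weight is not recoverable from the text, so `unverified_weight` is a hypothetical placeholder, as are the function name and the 8-bit depth.

```python
def combine_predictors(predictors, strengths, hash_bit=None,
                       bit_depth=8, unverified_weight=0.01):
    """Combine the candidate predictors of one SI pixel.

    predictors:        integer pixel values taken from the overlapping predictor blocks
    strengths:         matching strength of the block each predictor comes from
    hash_bit:          MSB received in the hash for this pixel, or None when the
                       pixel position carries no hash information
    unverified_weight: placeholder for the empirically chosen lowest weight
    """
    if not predictors:
        raise ValueError("each SI pixel needs at least one predictor")
    if hash_bit is None:
        # No hash information: plain averaging of the predictors.
        return sum(predictors) / len(predictors)
    weights = [s if (p >> (bit_depth - 1)) == hash_bit else unverified_weight
               for p, s in zip(predictors, strengths)]
    total = sum(weights)
    if total == 0:
        return sum(predictors) / len(predictors)
    return sum(w * p for w, p in zip(weights, predictors)) / total
```

For example, with predictors 100 and 200 and a hash bit of 1, only 200 is verified, so the result is pulled strongly toward 200.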

VI. SID CCE

In this section, we present our novel algorithm for online SID CCE. First, we present an offline SID channel estimator that serves as a reference enabling the accuracy assessment of our online algorithm.

A. Offline SID Correlation Estimation

Offline SID correlation estimation is an ideal but unrealistic approach since it assumes that the original frame pixel values are present at the decoder. Under this assumption, the decoder can form the transformed noise frame, i.e., , , where and denote the DCT-transformed WZ and SI frame’s coefficient of the band and block . Per coded band of a WZ frame, offline SID estimates are independently determined.

We note that DCT coefficients are real numbered; therefore, practically, in order to have a discrete number of SID sigma values, one needs to group the SI frame coefficients of each band.¹ Then, for every coded band and transformed SI quantization index , the corresponding offline SID estimate is

(8)

where denotes the quantization of the SI frame coefficients of band and is the number of quantization levels (QLs) for band . Equation (8) implies that, per band of a WZ frame, is estimated as the standard deviation of the transformed noise frame coefficients of which the corresponding SI coefficient value falls into the bin indexed by .

For both the proposed offline and online SID algorithms, the best RD performance is obtained based on a balance between the SID noise stationarity level, the number of QLs of the SI frame coefficients per band, and the statistical support to accurately estimate the SID parameters. Concerning stationarity, SID estimation can be applied to any small number of spatially neighboring DCT coefficients , which belong to the same DCT band, i.e., . However, note that, although using small spatial stationarity levels enables better adjustment to fluctuating spatial statistics, precision can be challenged by inaccurate estimation of the parameters due to narrow statistical support. Additionally, the number of QLs used in the quantization of the SI coefficients affects the accuracy of channel estimation. A low number of QLs drives the system to the SII paradigm. On the other hand, a high number of QLs reduces the statistical support, hence deteriorating the RD performance. We have empirically observed that the highest RD performance for the proposed offline and online SID algorithms is obtained when the SID channel is considered stationary at band level, and the number of QLs per band is identical to the number of QLs employed to quantize the values of the transformed original frame at the encoder.

¹ Notice that grouping (quantization) of the SI coefficients is only performed for discretizing their alphabet in order to enable SID CCE. When deriving the soft-decoding log-likelihood ratios and during MMSE reconstruction, the actual values of the SI coefficients are used.
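The band-level offline estimate described around (8) can be sketched as follows. The sketch is illustrative and makes several assumptions of ours: the SI coefficients are grouped with uniform bins over a given range, the noise is treated as zero-mean so that the standard deviation is the root mean square, and empty bins fall back to the band-wide (SII) value. All names are hypothetical.

```python
import numpy as np

def offline_sid_sigma(wz_band, si_band, n_levels, si_min, si_max):
    """Per-bin noise standard deviation for one DCT band of one WZ frame.

    wz_band, si_band: DCT coefficients of the original WZ frame and of the SI
                      for the same band (one value per block).
    n_levels:         number of quantization levels used to group the SI.
    Returns (sigma, q): sigma[k] for every SI bin k and the bin index q of
    every SI coefficient.
    """
    x = np.asarray(wz_band, dtype=float).ravel()
    y = np.asarray(si_band, dtype=float).ravel()
    noise = x - y  # additive model X = Y + N
    edges = np.linspace(si_min, si_max, n_levels + 1)
    q = np.clip(np.digitize(y, edges[1:-1]), 0, n_levels - 1)
    sii_sigma = float(np.sqrt(np.mean(noise ** 2)))  # single band-level value
    sigma = np.full(n_levels, sii_sigma)
    for k in range(n_levels):
        sel = noise[q == k]
        if sel.size > 0:
            sigma[k] = np.sqrt(np.mean(sel ** 2))  # zero-mean noise assumed
    return sigma, q
```

Setting `n_levels=1` collapses the estimate to a single band-level value, which matches the remark above that a low number of QLs drives the system toward the SII paradigm.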

B. Online Successively Refined SID Correlation Estimation

The designed online SID algorithm notably advances over contemporary online correlation estimation methods. First, apart from a few exceptions, e.g., [31], most online methods [29], [30], [32] are solely created for MCI-based systems. On the contrary, the proposed technique is intended to not limit its applicability to a specific DVC architecture. Moreover, the proposed technique is the first to capture the dependence of the correlation noise on the SI, thereby exploiting the inherent coding gain of SID (asymmetric) channel modeling. What is more, unlike alternative methods [30], [31], which refine the SII model parameters across DCT bands of a WZ frame, the proposed technique improves the estimation accuracy by enabling a finer refinement of the SID model parameters, i.e., across the bit-planes of a band of a WZ frame.

As explained above, the proposed online algorithm guarantees adequate statistical support for accurate SID estimation by balancing the SID stationarity level and the number of QLs of the SI coefficients per band. For any coded band of a WZ frame, the successive refinement algorithm is initiated after SW decoding (SWD) of the first bit-plane of the band. To facilitate the applicability of the algorithm to any DVC scheme, similar to [39], the estimates that are required to decode the first bit-plane of the band are obtained using (8), in which the transformed noise is approximated by that of the reconstructed previous WZ frame and its corresponding SI.2 Since the correlation between the first WZ bit-plane and the SI is high, or else, since WZ quantization is very coarse, the estimation error caused by this approach does not notably influence the RD performance, as shown in Section VII-A.

In a nutshell, the principle of our online bit-plane-by-bit-plane successive refinement SID algorithm is as follows. Per coded band of a WZ frame, the algorithm combines the already SW-decoded bit-planes of the band to derive quantization indices of the WZ frame coefficients of the band. In this way, apart from the SI coefficients, the algorithm is aware of a coarse description of the WZ frame’s coefficients of the band. Then, based on the available WZ quantization indices and the discretized SI coefficients of the band, the algorithm empirically estimates the CC conditional PMF for any given SI quantization index. Based on the obtained PMF for any such index, the algorithm derives the corresponding conditional pdf. Notice that, according to the SID model, the conditional pdf for a given SI quantization index is Laplacian with a parameter that depends on that index. The derived SID estimates are used to decode the next bit-plane. Thereafter, the decoder has access to a finer description of the band’s WZ coefficients, and the algorithm is repeated. After SWD of all the bit-planes of the band, the algorithm is executed again so as to refine the estimates for reconstruction.

2Only for the very first WZ frame, one (SII) sigma per coded WZ band is estimated at the encoder similarly to [46] and sent to the decoder. This is done in order to keep the required rate to code the sigmas as low as possible.

For any coded band of a WZ frame, the proposed progressive refinement algorithm is detailed in the next steps.

Step 1) Consider the bit-planes of the WZ coefficients of the band that have already been decoded. Each decoded bit-plane is a binary tuple whose length equals the size of the WZ frame band. The decoder combines the available binary tuples to produce a coarse description of the WZ coefficients of the band, i.e., a tuple containing quantization indices of the WZ coefficients of the band.3 In addition, for the purpose of channel estimation, the decoder discretizes the SI coefficients of the band (see Section VI-A), thereby producing a tuple of the same length containing the SI quantization indices.

Step 2) Using the two available tuples, the correlation estimator approximates the joint PMF of the random variable of the quantization indices of the WZ coefficients and the random variable of the quantization indices (bins) of the SI coefficients. This approximation is performed using the histogram.

Step 3) From the estimated joint PMF, the algorithm calculates the empirical conditional PMF, i.e.,

(9)

For every pair of WZ and SI quantization indices, (9) gives the transition matrix of a discrete CC having the discretized SI coefficients as input and the quantized WZ coefficients as output.

Step 4) Each row of this transition matrix is a conditional PMF for a given SI quantization index. The algorithm then attempts to derive the conditional pdf for that index, which would yield this empirical PMF. According to the SID paradigm, each row is a PMF that would be derived by scalar quantization of a Laplacian distribution that has an index-dependent scaling parameter and is centered on the inversely quantized SI value derived from the SI quantization index. Therefore, to get the scaling parameter of each Laplacian distribution, one needs to find the root of the following function:

(10)

where the two bounds appearing in (10) are the lower and upper bounds of the quantization bin of the considered WZ index, and the notation for the scaling parameter is abbreviated for simplicity. The derivation and the proof of the uniqueness of the root of (10) for every SI quantization index are detailed in the Appendix.

3In particular, given the available bit-planes for a given WZ coefficient, the quantizer cell indexed by the coarse quantization index is formed by merging the original, encoder-side quantizer cells for which the binary representation is prefixed by the decoded bits.
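Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Laplacian is parameterized by a rate `alpha` (so that its standard deviation is sqrt(2)/alpha), the function to be zeroed is assumed to be the difference between the empirical probability and the Laplacian mass of the bin, and plain bisection replaces the faster root finder cited in the Appendix. All function names are hypothetical, and only the case where the distribution center lies inside the bin is handled.

```python
import math
from collections import Counter

def conditional_pmf(wz_idx, si_idx):
    """Empirical P(q | k) from paired WZ indices q and SI indices k."""
    joint = Counter(zip(si_idx, wz_idx))   # histogram of (k, q) pairs
    marginal = Counter(si_idx)             # histogram of k alone
    return {(k, q): c / marginal[k] for (k, q), c in joint.items()}

def laplacian_bin_mass(alpha, lo, hi, center):
    """Probability that a Laplacian (rate alpha, mean center) falls in [lo, hi]."""
    def cdf(x):
        if x < center:
            return 0.5 * math.exp(-alpha * (center - x))
        return 1.0 - 0.5 * math.exp(-alpha * (x - center))
    return cdf(hi) - cdf(lo)

def solve_alpha(p, lo, hi, center, a_min=1e-9, a_max=1e3, iters=200):
    """Bisection for the rate whose bin mass matches the empirical probability p.

    Only valid when lo <= center <= hi, where the bin mass grows
    monotonically with alpha, so the root (if it exists) is unique.
    """
    if not (lo <= center <= hi):
        raise ValueError("center must lie inside the bin")
    for _ in range(iters):
        mid = 0.5 * (a_min + a_max)
        if laplacian_bin_mass(mid, lo, hi, center) < p:
            a_min = mid   # too little mass in the bin: distribution too wide
        else:
            a_max = mid
    return 0.5 * (a_min + a_max)
```

For one SI index, the empirical probability of the WZ bin that contains the inversely quantized SI value would be passed to `solve_alpha` together with that bin's bounds.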

After SWD of a bit-plane of the band of the WZ frame, steps 1–4 are executed again. The process is executed recursively, hence delivering a more accurate estimate with every additional decoded bit-plane of a band of the frame. This is of paramount importance since, for every additional bit-plane, WZ quantization becomes finer; thus, the impact of inaccurate channel estimation on the coding efficiency increases.
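The way the coarse description becomes finer with every decoded bit-plane can be sketched as follows. The sketch assumes that bit-planes are decoded MSB first and that the encoder-side quantization indices use a natural binary labeling; both function names are hypothetical.

```python
def coarse_indices(bitplanes):
    """Combine decoded bit-planes (MSB first) into one coarse index per coefficient."""
    indices = [0] * len(bitplanes[0])
    for plane in bitplanes:
        # Append the newly decoded bit as the next less significant bit.
        indices = [(q << 1) | bit for q, bit in zip(indices, plane)]
    return indices

def merged_cell(q, decoded, total):
    """Range of encoder-side indices whose binary form starts with prefix q.

    decoded : number of bit-planes decoded so far
    total   : number of bit-planes used by the encoder-side quantizer
    """
    shift = total - decoded
    return q << shift, ((q + 1) << shift) - 1
```

For example, with two of four bit-planes decoded, the coarse index 2 (binary prefix 10) covers the encoder-side indices 8 to 11; once a third bit-plane is decoded, each merged cell halves in size.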

VII. EXPERIMENTAL RESULTS

This section evaluates the proposed SID estimation method and the compression performance of the proposed DVC system. Tests were carried out for all frames of the Foreman, Soccer, Carphone, and Silent sequences at QCIF resolution, at a frame rate of 15 Hz, and at GOP sizes of 2, 4, and 8. These sequences exhibit a variety of object and camera motion characteristics. In the OBMEC/SSM module, an overlap step size of , a search range of pixels, and a block size are employed. To assess the coding performance in terms of the Bjøntegaard (BD) metric [47], four RD points have been drawn corresponding to QMs 1, 5, 7, and 8 of [5]. The quantization parameters (QPs) of the H.264/AVC Intra encoder are selected in order to maintain a constant decoding quality. Although chroma (YUV) encoding is supported, results are presented only for the luma (Y) component to allow a meaningful comparison with prior art [5], [23].

A. SID Model Validation

To demonstrate the precision of SID versus SII modeling, we compare the ideal SW rate calculated using the offline band-level SID (see Section VI-A), the offline band-level SII [29], and the proposed online band-level SID correlation estimation algorithm (see Section VI-B). The numerical results, which are obtained with the presented hash-based DVC codec, are computed for the DC band and averaged over all WZ frames. Seven rate points are depicted, corresponding to QM 8 of [5]. The results in Fig. 6 show that, at the same stationarity level (i.e., band level), input-dependent (SID) channel modeling offers a significant reduction of the ideal SW rate compared with input-independent (SII) [29] estimation. Notice that, at high rates, the theoretical gain of SID over SII modeling, which is given by (7), agrees with the experimental measurements, thereby confirming our theory.

Regarding the accuracy of the proposed online SID algorithm, one observes that, due to its successively refined nature, it performs closely to offline SID estimation. Fig. 6 also shows that the SW rate for decoding the MSB using online SID estimation is typically higher than that of using offline SII [29], but this rate loss is minor, as explained in Section VI-B. Nonetheless, the higher the SW rate, the higher the gain brought by online SID versus offline SII [29] estimation.
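One plausible way to compare channel models by an ideal rate is to average the code length that a model assigns to the quantization bins actually observed. The sketch below illustrates this idea only; it is not necessarily the exact computation behind Fig. 6. The Laplacian rate `alpha`, the sample layout, and the function names are assumptions, and a model that ignores the SI index plays the role of SII.

```python
import math

def bin_mass(alpha, lo, hi, center):
    """Mass that a Laplacian (rate alpha, mean center) places on [lo, hi]."""
    def cdf(x):
        if x < center:
            return 0.5 * math.exp(-alpha * (center - x))
        return 1.0 - 0.5 * math.exp(-alpha * (x - center))
    return cdf(hi) - cdf(lo)

def ideal_rate(samples, alpha_of):
    """Average of -log2(model probability) over the observed bins, in bits/sample.

    samples  : iterable of (lo, hi, center, k), i.e., the bounds of the observed
               WZ bin, the SI value the model is centered on, and the SI index
    alpha_of : maps the SI index k to a Laplacian rate; a constant map
               corresponds to an input-independent (SII) model
    """
    total, count = 0.0, 0
    for lo, hi, center, k in samples:
        p = bin_mass(alpha_of(k), lo, hi, center)
        total += -math.log2(max(p, 1e-300))   # guard against log(0)
        count += 1
    return total / count
```

Evaluating `ideal_rate` once with a per-index map and once with a single shared rate gives a direct, if simplified, view of the gap between the two modeling choices.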

Fig. 6. Ideal SW rate per number of coded bit-planes of the DC band for (a) Foreman GOP2 and (b) Soccer GOP8. The theoretical gain at high rates, as derived by (7), is (a) 0.38 bits/sample and (b) 0.61 bits/sample.

Fig. 7 depicts the RD comparison of the proposed online band-level SID algorithm against the offline band-level SII channel estimation of [29], and the state-of-the-art coefficient-level SII TRACE method of [31]. Fig. 7 also includes the coding results obtained with offline band-level SID estimation, which serves as an ideal SID channel estimate. The results reveal that offline SID estimation yields the best coding performance among the assessed methods, thus verifying the capacity of SID versus SII modeling, as theoretically anticipated. We also notice that the RD performance achieved with online SID is close to that obtained with offline SID estimation, hence confirming that the proposed online SID algorithm accurately estimates the SID channel statistics.

It is also notable that online band-level SID estimation consistently outperforms the offline band-level SII estimation method [29], yet the latter is impractical. One observes that the coding gain brought by the proposed online SID method compared with the offline band-level SII method [29] increases with rate since progressive refinement allows the proposed online SID method to improve its estimation accuracy as more information is decoded. This trend is also shown in Fig. 6. In terms of the average BD rate metric, the proposed online band-level SID algorithm outperforms offline band-level SII estimation by 2.88%, 4.4%, 4.05%, and 3.74% in Foreman GOP2, Soccer GOP2, Carphone GOP4, and Silent GOP8, respectively.

We point out that the proposed online SID algorithm delivers higher compression efficiency compared with the state-of-the-art TRACE technique [31], which models the correlation noise as input-independent (SII) and nonstationary (coefficient level). Unlike the online coefficient-level method of [29], which is solely designed for MCI-based schemes, TRACE can be employed in the proposed system. In addition, TRACE enables progressively refined estimation across decoded DCT bands, outperforming the method of [29]. Nevertheless, compared with the proposed SID estimation method, TRACE builds on the SII paradigm and supports a coarser refinement, i.e., band-based versus bit-plane-based in the proposed SID method. The BD gains brought by the proposed online SID algorithm over TRACE are summarized in Table I. We observe that the RD gains increase with the length of the GOP since the number of coded WZ frames grows. Moreover, the gains increase, i.e., up to 5.17% BD rate reduction for Soccer GOP8, when irregular video content is coded. This is anticipated since, as the quality of the SI deteriorates, the impact of accurate correlation estimation on the RD performance increases.

Fig. 7. RD performance comparison of correlation estimation methods for (a) Foreman, GOP2; (b) Soccer, GOP2; (c) Carphone, GOP4; and (d) Silent, GOP8.

TABLE I
BD GAINS OF THE ONLINE SID ALGORITHM OVER TRACE [31]

B. Compression Performance

Figs. 8 and 9 depict the RD performance of the proposed hash-based DVC codec, equipped with online SID estimation, against a relevant set of state-of-the-art low-cost video encoding schemes. Initially, the proposed codec is compared against our previous spatial-domain unidirectional DVC (SDUDVC) scheme [26]. SDUDVC encodes the key frames and the MSBs of intermediate frames and performs OBMEPC at the decoder, using the key frames and the previous decoded frame as references. The results, depicted in Fig. 8, reveal that the proposed codec substantially advances over SDUDVC [26], yielding gains of up to 43.33% and 45.53% in BD rate reduction for Foreman and Soccer GOP8, respectively. These gains are attributed to the novel contributions of this paper. First, the employment of a DCT-domain WZ codec equipped with the proposed successive refinement online SID channel estimation method improves the RD performance and the codec’s ability to achieve a wide range of rates. Second, the OBMEC/SSM technique, which is presented in Section V, significantly improves over the OBMEPC approach of [26]. Experimental results (see Fig. 10) show that, although using only a quarter of the hash information needed by OBMEPC [26], OBMEC/SSM delivers on average 0.4 and 0.8 dB higher SI quality in Carphone and Soccer GOP2, respectively. Third, the proposed hash codec (see Section IV) significantly reduces the required hash coding rate over our previous approach, without using temporal prediction and being less complex. In particular, in Fig. 10, the hash rate for OBMEC/SSM using the proposed codec is 12.3 and 10.6 kb/s for Carphone and Soccer, respectively. Conversely, the hash rate for OBMEPC using the codec of [26] is significantly higher, i.e., 48.4 and 47.6 kb/s, for Carphone and Soccer, respectively.

Furthermore, the proposed codec is assessed against the state-of-the-art DISCOVER codec [5]. The RD performance of DISCOVER [5] and its execution times (see Section VII-C) have been obtained using the codec’s executable, as found on the DISCOVER website [5]. The results, illustrated in Figs. 8 and 9, show that the proposed codec generally outperforms DISCOVER. Only in Silent at low rates does DISCOVER surpass the proposed codec since, for low motion content, MCI can deliver a good prediction. OBMEC/SSM requires additional hash rate, which is considerable at low rates in Silent due to low spatial correlation, thereby explaining this performance difference. Overall, in sequences with well-behaved motion, the proposed codec yields compression improvements of up to 16.59% and 2.48% BD rate reduction in Carphone and Silent GOP8, respectively. However, when irregular motion content is coded, the gains brought by the proposed codec against DISCOVER increase up to 31.41% and 30.95% BD rate reduction in Foreman and Soccer GOP8, respectively. These significant gains highlight the capacity of OBMEC/SSM in capturing difficult motion even in large GOPs, conditions under which MCI falls short in providing accurate prediction [19], [20].

Fig. 8. Compression performance evaluation of the proposed hash-based DVC codec for Foreman QCIF, 15-Hz (left) and Soccer QCIF, 15-Hz (right) sequences; (a) and (b) GOP2, (c) and (d) GOP4, (e) and (f) GOP8.


Fig. 9. Compression performance evaluation of the proposed hash-based DVC codec for Carphone QCIF, 15-Hz (left) and Silent QCIF, 15-Hz (right) sequences; (a) and (b) GOP2, (c) and (d) GOP4, (e) and (f) GOP8.

To assess the performance of the proposed scheme against that of the latest hash-based DVC prior art, the coding results of [23] are illustrated in Fig. 8. The latter creates SI by combining MCI with hash-based motion estimation using low-quality intracoded WZ frame blocks. The proposed codec improves over the system of [23] by 24.93% and 10.65% BD rate reduction in Foreman and Soccer, GOP8, respectively. These gains indicate the superior capacity of OBMEC/SSM compared with the approach of [23].

In Figs. 8 and 9, we also include the performance of H.264/AVC Intra and H.264/AVC No Motion. Note that the assessed H.264/AVC codecs exhibit higher encoding complexity than the proposed DVC solution since they feature loop filtering, intra prediction, and mode decision. The results show that, similarly to DISCOVER, the proposed scheme outperforms H.264/AVC Intra for low motion sequences at all GOPs, e.g., gains of up to 31.89% BD rate reduction in Silent GOP8 are achieved. However, contrary to the other DVC codecs, the proposed scheme manages to partially outperform H.264/AVC Intra in Foreman and Carphone at all the evaluated GOP sizes, and H.264/AVC No Motion in Foreman GOP4 and GOP8. In Soccer, a sequence comprising very irregular motion content, the proposed codec significantly diminishes the performance gap of DVC with respect to H.264/AVC Intra and No Motion.

TABLE II
PERCENTAGE CONTRIBUTION IN THE TOTAL BIT RATE AND THE RSD OF THE DECODED SEQUENCE FOR THE PROPOSED SYSTEM

TABLE III
COMPARISON OF WZ DECODING EXECUTION TIMES (IN SECONDS PER WZ FRAME)

Fig. 10. SI quality for (a) Carphone, and (b) Soccer at GOP2. The dyadically subsampled MSB is used by OBMEC/SSM, while the entire MSB is used by OBMEPC. The key frames’ quality is identical for both methods QP .

Table II reports the percentages of the H.264/AVC Intra, the hash, and the LDPCA rate in the total rate of the proposed codec. Results are synopsized for the two extreme RD points of Foreman and Carphone, with GOP of 2 and 8. We observe that the intra and the WZ (i.e., hash and LDPCA) rate percentages agree with the portion of key and WZ frames in a GOP. Note that, for a given sequence and GOP size, the hash rate is unvarying; hence, its percentage decreases with increasing rate. The proposed codec efficiently compresses the hash, diminishing the required hash rate overhead. Recall that the hash is coded in order to enable the creation of accurate SI at the decoder and thus to increase the WZ coding performance. Table II also reports the relative standard deviation (RSD) of the decoded frames’ PSNR (both key and WZ), given by the ratio of the standard deviation to the mean of the PSNR values. The very low RSD values show that the proposed codec yields a quasi-constant decoding quality. Bearing in mind that the number of LDPCA-coded bit-planes per RD point is the same as in prior art [5], one concludes that the proposed novel techniques significantly increase the efficiency of WZ coding.
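The RSD of the per-frame PSNR can be computed as in the following minimal sketch. Expressing the ratio as a percentage and using the population standard deviation are assumptions made here for illustration; the function name is hypothetical.

```python
import statistics

def relative_std(psnr_values):
    """Relative standard deviation (in percent) of per-frame PSNR values."""
    mean = statistics.mean(psnr_values)
    # Population standard deviation over all decoded frames (key and WZ).
    return 100.0 * statistics.pstdev(psnr_values) / mean
```

A sequence whose frames all decode at the same PSNR gives an RSD of zero, which is the quasi-constant quality case referred to above.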

C. Complexity Evaluation

In order to evaluate the computational complexity of the proposed codec, we conducted execution time tests under controlled conditions (using a Pentium D CPU at 3.2 GHz, 2048 MB of RAM, and the Windows XP operating system), as in [5] and [31]. At the encoder, the proposed system features the same Intra and WZ codecs as DISCOVER [5]; thus, the encoding complexity is dominated by the coding of the key frames. Note that, contrary to DISCOVER, the proposed encoder allocates additional resources to code the hash, yet this imposes very low computational and memory demands (see Section IV). In particular, results show that the hash encoder causes a negligible overhead of 1.15% and 1.19% on the encoding execution time for Foreman and Soccer, with GOP8 and QM8, respectively.

Table III assesses the decoding execution time of the proposed codec with respect to DISCOVER’s executable [5]. The results, which are averaged over the four RD points, show that most of the total execution time (90.79% on average) is consumed by SWD, which performs repeated LDPCA decoding using the feedback channel. OBMEC/SSM is the second most demanding operation, taking on average 9.17% of the total decoding time. For the same number of predictors per pixel and search space per predictor, OBMEC/SSM is less complex than OBMEPC [26] as it requires only 1/4 of the operations for each block comparison. Concerning the complexity of CCE, we notice that the computational demands of offline SID estimation are similar to those of offline band-level SII estimation. Comparing the offline with online SID estimation, we observe that the latter is more complex due to its bit-plane-by-bit-plane successively refined behavior. Yet, the proposed online SID estimator comprises simple histogram measuring and solving nonlinear equations, which are performed efficiently using existing numerical algorithms. Moreover, contrary to the demanding operations of SI creation and LDPCA decoding, the online SID algorithm has a minor contribution to the total complexity (0.04% on average). In addition, Table III shows that, although online SID estimation is more complex than offline band-level SII estimation, the overall decoding complexity is reduced since correlation estimation is more accurate. As a result, SWD requires fewer feedback channel requests and decoding operations.

We point out that, although not optimized for speed, the total decoding time of the proposed codec, i.e., with online SID estimation, is vastly lower than DISCOVER’s. This notable benefit is credited to the novel techniques encompassed by the proposed system, which significantly increase the decoding performance, thus reducing soft channel decoding operations.

VIII. CONCLUSION

In this paper, we have investigated the concept of SID modeling of the CC in DVC. In particular, we have theoretically shown that SID modeling leads to WZ RD gains compared with the conventional SII assumption. Motivated by this finding, we have proposed a novel online SID correlation estimation algorithm. Unlike alternative algorithms, the proposed technique enables bit-plane-by-bit-plane progressive refinement of the SID model parameters, providing better accuracy. Additionally, we have presented a novel hash-based DVC architecture featuring efficient coding of low-complexity hash information. This information is exploited by OBMEC/SSM, i.e., a technique that enables accurate motion estimation at the decoder. Methodical experimentation verifies the theoretical gains of SID compared with SII modeling and validates the RD improvements of the proposed online SID algorithm versus state-of-the-art correlation estimation methods [29], [31]. Furthermore, the proposed system is assessed against several relevant state-of-the-art distributed and standard video codecs. Notable compression gains are reported, e.g., up to 31.41% over DISCOVER [5], at a vastly reduced overall complexity.

APPENDIX

Proof of Lemma 1: Let be a generic Laplacian distribution and assume that the random variable is quantized using a uniform scalar quantizer centered on the mean of the distribution. The distortion resulting from quantizing is given by

(11)

Replacing with and expressing the above as a sum ofthree terms leads to

(12)

By changing the variables, the first and the last sums areequal. Therefore

(13)

In the second term, replacing again with leads to

(14)

The last summation term is a geometric progression converging to . The other two integrals can be easily computed by parts, leading to (5), and ending the proof.

Proof of Lemma 2: For a given , the corresponding is Laplacian with mean . The distortions of the SID and SII models are of the form given by (5), i.e.,

where, , whereas is a constant not depending on . The condition for any is equivalent to

(15)

Let , then (15) can be equivalently written as

(16)

One can easily verify that . Therefore, a necessary condition to have the distortions of the SID and SII models equal for any is

(17)

Replacing and in (17) leads to (6), which ends the proof.


Proof of Theorem 1: Supposing ideal SW coding, the rate is given by the conditional entropy of the quantization index given the SI value [41]. Under high-rate assumptions, the entropy of a uniform scalar quantizer is approximated as [41] and [42]

(18)

For a Laplacian model, the differential entropy is given by

(19)
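For reference, the two standard results invoked here can be written as follows. The notation is assumed for illustration only (quantizer step size \Delta, Laplacian rate \alpha, mean \mu) and may differ from the symbols used in (18) and (19).

```latex
% High-rate approximation for a uniform scalar quantizer with step size \Delta
H(Q \mid Y) \approx h(X \mid Y) - \log_2 \Delta

% Differential entropy, in bits, of a Laplacian density
% f(x) = \frac{\alpha}{2} e^{-\alpha |x - \mu|}
h(X) = \log_2 \frac{2e}{\alpha}
```

Under these forms, a larger \alpha (a more concentrated noise distribution) lowers the differential entropy and hence the ideal rate at a fixed step size.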

Now, let us compare the SW rate given by an SII and an SID model. From (18) and (19), we derive

Applying Jensen’s inequality [40] in the last integral and bearing in mind that is a concave function, we have

(20)

where, . Assuming the average SID distortion equal to the SII distortion for every RD point, Lemma 2 holds. Combining (20) with (6) leads to

(21)

which ends the proof.

Derivation and Uniqueness of the Root of (10): For every , we have quantization indices or, else, columns in the transition matrix . That is ; there are equations of the form of (10), each of which is satisfied by the unknown . Based on the position of in relation to a quantization bin with index , (10) is written as

It can be derived that is decreasing in and increasing in with being equal to . In addition, , . Therefore, can have no, one, or two solutions in , depending on the sign of . Similarly, it can be found that can have no, one, or two solutions in . On the contrary, is monotonically decreasing in , and, , thus, has one and only one solution in . Hence, we choose the solution of in order to get the online estimate of the Laplacian scaling parameter per . When or , meaning when the realization of the inversely quantized SI value coincides with the upper or lower bound of the considered quantization bin, the solution of is analytically found as

(22)

When the solution of is numerically found using the fast algorithm of [48], which combines bisection, secant, and inverse quadratic interpolation methods.
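The boundary case admits a simple closed form if one assumes, as an illustration, that (10) equates the empirical bin probability to the mass that a Laplacian with rate `alpha` places on the bin. The expression below follows from that assumption and is not copied from (22); the function names are hypothetical. The combination of bisection, secant, and inverse quadratic interpolation mentioned above is commonly known as Brent's method, for which `scipy.optimize.brentq` is one available implementation.

```python
import math

def bin_mass(alpha, lo, hi, center):
    """Mass that a Laplacian (rate alpha, mean center) places on [lo, hi]."""
    def cdf(x):
        if x < center:
            return 0.5 * math.exp(-alpha * (center - x))
        return 1.0 - 0.5 * math.exp(-alpha * (x - center))
    return cdf(hi) - cdf(lo)

def alpha_at_bin_bound(p, width):
    """Closed-form rate when the Laplacian is centered on a bound of the bin.

    With the center on the lower (or, by symmetry, the upper) bound, the bin
    mass is 0.5 * (1 - exp(-alpha * width)); setting it equal to p and solving
    for alpha gives the expression below, which requires 0 < p < 0.5.
    """
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    return -math.log(1.0 - 2.0 * p) / width
```

Substituting the returned rate back into `bin_mass` reproduces the empirical probability, which gives a quick consistency check of the assumed form.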

REFERENCES

[1] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. IT-19, no. 4, pp. 471–480, Jul. 1973.

[2] A. D. Wyner and J. Ziv, “The rate–distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. IT-22, no. 1, pp. 1–10, Jan. 1976.

[3] R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: A video coding paradigm with motion estimation at the decoder,” IEEE Trans. Image Process., vol. 16, no. 10, pp. 2436–2448, Oct. 2007.

[4] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proc. IEEE, vol. 93, no. 1, pp. 71–83, Jan. 2005.

[5] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER codec: Architecture, techniques and evaluation,” in Proc. Picture Coding Symp., Lisboa, Portugal, Nov. 2007 [Online]. Available: www.discoverdvc.org

[6] W.-J. Chien and L. J. Karam, “BLAST-DVC: Bitplane selective distributed video coding,” Multimedia Tools Appl., vol. 48, no. 3, pp. 437–456, Jul. 2010.

[7] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[8] Z. Xiong, A. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Process. Mag., vol. 21, no. 5, pp. 80–94, Sep. 2004.

[9] D. Varodayan, A. Aaron, and B. Girod, “Rate-adaptive codes for distributed source coding,” Signal Process., vol. 86, no. 11, pp. 3123–3130, Nov. 2006.

[10] A. Sehgal, A. Jagmohan, and N. Ahuja, “Wyner–Ziv coding of video: An error-resilient compression framework,” IEEE Trans. Multimedia, vol. 6, no. 2, pp. 249–258, Apr. 2004.

[11] S. Cheng and Z. Xiong, “Successive refinement for the Wyner–Ziv problem and layered code design,” IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3269–3281, Aug. 2005.

[12] Q. Xu and Z. Xiong, “Layered Wyner–Ziv video coding,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3791–3803, Dec. 2006.

[13] H. Wang, N.-M. Cheung, and A. Ortega, “A framework for adaptive scalable video coding using Wyner–Ziv techniques,” EURASIP J. Appl. Signal Process., vol. 2006, pp. 1–18, Jan. 2006.

[14] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Fusion-based multiview video coding,” in Proc. ACM Int. Workshop Video Surveillance Sens. Netw., Santa Barbara, CA, Oct. 2006.

[15] X. Guo, Y. Lu, F. Wu, D. Zhao, and W. Gao, “Wyner–Ziv-based multiview video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 713–724, Jun. 2008.

[16] N. M. Cheung and A. Ortega, “Flexible video decoding: A distributed source coding approach,” in Proc. IEEE Workshop Multimedia Signal Process., Chania, Greece, Oct. 2007, pp. 103–106.


[17] B. Macchiavello, D. Mukherjee, and R. L. de Queiroz, “Iterative side-information generation in a mixed resolution Wyner–Ziv framework,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 10, pp. 1409–1423, Oct. 2009.

[18] S. Argyropoulos, N. Thomos, N. Boulgouris, and M. Strintzis, “Adaptive frame interpolation for Wyner–Ziv video coding,” in Proc. IEEE Workshop Multimedia Signal Process., Chania, Greece, Oct. 2007, pp. 159–162.

[19] Z. Li, L. Liu, and E. J. Delp, “Rate distortion analysis of motion side estimation in Wyner–Ziv video coding,” IEEE Trans. Image Process., vol. 16, no. 1, pp. 98–113, Jan. 2007.

[20] M. Tagliasacchi, L. Frigerio, and S. Tubaro, “Rate–distortion analysis of motion-compensated interpolation at the decoder in distributed video coding,” IEEE Signal Process. Lett., vol. 14, no. 9, pp. 625–628, Sep. 2007.

[21] R. Martins, C. Brites, J. Ascenso, and F. Pereira, “Refining side information for improved transform domain Wyner–Ziv video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 9, pp. 1327–1341, Sep. 2009.

[22] A. Aaron, S. Rane, and B. Girod, “Wyner–Ziv video coding with hash-based motion compensation at the receiver,” in Proc. IEEE Int. Conf. Image Process., Singapore, Oct. 2004, pp. 3097–3100.

[23] J. Ascenso, C. Brites, and F. Pereira, “A flexible side information generation framework for distributed video coding,” Multimedia Tools Appl., vol. 48, no. 3, pp. 381–409, Jul. 2010.

[24] C. Yaacoub, J. Farah, and B. Pesquet-Popescu, “Improving hash-based Wyner–Ziv video coding using genetic algorithms,” in Proc. Int. Mobile Multimedia Commun. Conf., London, U.K., Sep. 2009, p. 30.

[25] J. Ascenso and F. Pereira, “Adaptive hash-based side information exploitation for efficient Wyner–Ziv video coding,” in Proc. IEEE Int. Conf. Image Process., San Antonio, TX, Sep. 2007, pp. III-29–III-32.

[26] N. Deligiannis, A. Munteanu, T. Clerckx, J. Cornelis, and P. Schelkens, “Overlapped block motion estimation and probabilistic compensation with application in distributed video coding,” IEEE Signal Process. Lett., vol. 16, no. 9, pp. 743–746, Sep. 2009.

[27] R. P. Westerlaken, R. K. Gunnewiek, and R. L. Lagendijk, “The role of the virtual channel in distributed source coding of video,” in Proc. IEEE Int. Conf. Image Process., Genova, Italy, Sep. 2005, pp. I-581–I-584.

[28] R. P. Westerlaken, S. Borchert, R. K. Gunnewiek, and R. L. Lagendijk, “Finding a near optimal dependency channel model for a LDPC-based Wyner–Ziv video compression scheme,” in Proc. Adv. School Comput. Imag., Lommel, Belgium, Jun. 2006, pp. 456–463.

[29] C. Brites and F. Pereira, “Correlation noise modeling for efficient pixel and transform domain Wyner–Ziv video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1177–1190, Sep. 2008.

[30] X. Huang and S. Forchhammer, “Improved virtual channel noise model for transform domain Wyner–Ziv video coding,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Taipei, Taiwan, Apr. 2009, pp. 921–924.

[31] X. Fan, O. C. Au, and N. M. Cheung, “Transform-domain adaptive correlation estimation (TRACE) for Wyner–Ziv video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 11, pp. 1423–1436, Nov. 2010.

[32] G. R. Esmaili and P. C. Cosman, “Wyner–Ziv video coding with classified correlation noise estimation and key frame coding mode selection,” IEEE Trans. Image Process., vol. 20, no. 9, pp. 2463–2474, Sep. 2011.

[33] L. Stankovic, V. Stankovic, S. Wang, and S. Cheng, “Correlationestimation with particle-based belief propagation for distributed videocoding,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process.,Prague, Czech Republic, May 2011, pp. 1505–1508.

[34] N. Deligiannis, A.Munteanu, T. Clerckx, J. Cornelis, and P. Schelkens,“On the side-information dependency of the temporal correlation inWyner–Ziv video coding,” in Proc. IEEE Int. Conf. Acoust. SpeechSignal Process., Taipei, Taiwan, Apr. 2009, pp. 709–712.

[35] N. Deligiannis, A.Munteanu, T. Clerckx, J. Cornelis, and P. Schelkens,“Correlation channel estimation in pixel-domain distributed videocoding,” in Proc. Int. Workshop Image Anal. Multimedia InteractiveServ., London, U.K., May 2009, pp. 93–96.

[36] N. Deligiannis, A.Munteanu, T. Clerckx, J. Cornelis, and P. Schelkens,“Modelling the correlation noise in spatial domain distributed videocoding,” in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar.2009, p. 443.

[37] A. Wyner, “Recent results in the Shannon theory,” IEEE Trans. Inf.Theory, vol. IT-20, no. 1, pp. 2–10, Jan. 1974.

[38] A. Papoulis, Probability, Random Variables, and Stochastic Pro-cesses. New York: McGraw-Hill, 1987.

[39] A. Aaron and B. Girod, “Wyner–Ziv video coding with low encodercomplexity,” in Proc. Picture Coding Symp., San Francisco, CA, Dec.2004.

[40] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[41] D. Rebollo-Monedero, “Quantization and transforms for distributed source coding,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 2007.

[42] H. Gish and J. N. Pierce, “Asymptotically efficient quantizing,” IEEE Trans. Inf. Theory, vol. 14, no. 5, pp. 676–683, Sep. 1968.

[43] R. J. McEliece, “Are turbo-like codes effective on nonstandard channels,” IEEE Inf. Theory Soc. Newsl., vol. 51, no. 4, pp. 1–8, Dec. 2001.

[44] C.-C. Wang, S. Kulkarni, and V. Poor, “Density evolution for asymmetric memoryless channels,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4216–4236, Dec. 2005.

[45] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS,” IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309–1324, Aug. 2000.

[46] M. Morbee, J. Prades-Nebot, A. Pizurica, and W. Philips, “Rate allocation for pixel-domain distributed video coding without feedback channel,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2007, pp. I-521–I-524.

[47] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” in ITU-Telecommunications Standardization Sector VCEG, Austin, TX, Apr. 2001.

[48] G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations. Englewood Cliffs, NJ: Prentice-Hall, 1976.

Nikos Deligiannis (S’08–M’10) received the M.Eng. degree in electrical and computer engineering from the University of Patras, Patras, Greece, in 2006. He is currently working toward the Ph.D. degree with Vrije Universiteit Brussel (VUB), Brussels, Belgium.

From December 2006 to September 2007, he was with the Wireless Telecommunications Laboratory, University of Patras. He joined the Department of Electronics and Informatics, VUB, in October 2007. His current research interests include statistical channel modeling, multimedia coding, distributed source coding, multiple description coding, wireless cellular networks, and location positioning.

Mr. Deligiannis was the corecipient of the 2011 ACM/IEEE International Conference on Distributed Smart Cameras Best Paper Award.

Joeri Barbarien received the M.S. degree in electrical engineering and the Ph.D. degree in engineering sciences from the Vrije Universiteit Brussel (VUB), Brussels, Belgium, in 2000 and 2006, respectively.

From 2000 to 2011, he was a member of the Department of Electronics and Informatics, VUB, with the last three years as a Postdoctoral Researcher and a part-time Professor. From 2006 to 2011, he was also actively involved as a Senior Researcher and a Project Coordinator with the Interdisciplinary Institute for Broadband Technology (IBBT), Ghent, Belgium. In October 2011, he left the university and the IBBT to join Technicolor. He is the author or coauthor of more than 50 journal and conference publications, book chapters, patent applications, and contributions to standards. His research interests include scalable video and still-image coding, distributed source coding, watermarking, implementation aspects of multimedia processing algorithms and network technology.

Marc Jacobs received the M.Sc. degree in social and military sciences from the Royal Military Academy of Brussels, Brussels, Belgium, in 1992 and the M.Sc. degree in applied informatics from Vrije Universiteit Brussel (VUB), Brussels, in 1999.

He started his career as an Officer at Belgian Defense, where he performed different functions within the IT-organization. Since November 2006, he has been a member with the Department of Electronics and Informatics, VUB, and the Interdisciplinary Institute for Broadband Technology, Ghent, Belgium, where he is currently working as a Researcher on video-compression-related topics.

Adrian Munteanu (M’07) received the M.Sc. degree in electronics from Politehnica University of Bucharest, Bucharest, Romania, in 1994; the M.Sc. degree in biomedical engineering from University of Patras, Patras, Greece, in 1996; and the Ph.D. degree in applied sciences (awarded with the highest distinction and congratulations of the jury members) from Vrije Universiteit Brussel (VUB), Brussel, Belgium, in 2003.

From 2004 to 2010, he was a postdoctoral fellow with the Fund for Scientific Research, Flanders, Belgium. Since 2007, he has been a Professor with VUB. He is also a Research Leader of the 4Media group with the Interdisciplinary Institute of Broadband Technology, Ghent, Belgium. He is the author or coauthor of more than 200 journal and conference publications, book chapters, patent applications and contributions to standards. His research interests include scalable image and video coding, distributed video coding, scalable coding of 3-D graphics, 3-D video coding, error-resilient coding, multiresolution image and video analysis, video segmentation and indexing, multimedia transmission over networks, and statistical modeling.

Dr. Munteanu was the recipient of the 2004 BARCO-FWO prize for his Ph.D. work.

Athanassios Skodras (M’88–SM’99) received the B.Sc. degree in physics from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1980 and the M.Eng. degree in computer engineering and informatics and the Ph.D. degree in electronics from the University of Patras, Patras, Greece, in 1986.

Since 1986, he has been holding teaching and research positions with the University of Patras and CTI, Greece. Since 2002, he has been the Professor of Digital Systems and Head of Computer Science, Hellenic Open University, Patras. During the academic years 1988 to 1989 and 1996 to 1997, he has been visiting the DEEE, Imperial College, London, U.K. He is the author or coauthor of 130 papers in journals and conference proceedings, six books, three book chapters, and two international patents. His research interests include image and video coding and analysis, digital watermarking, fast transform algorithms, and real-time digital signal processing.

Dr. Skodras serves as an Associate Editor for the IEEE Signal Processing Letters, the Springer Journal of Real-Time Image Processing, and the Elsevier Pattern Recognition; a Reviewer for numerous journals and conferences; and a member of many technical program committees of IEEE and other international conferences. He also serves as the Chair of the IEEE Greece Section and the Chair of the IAPR Greek Association of Image Processing and Digital Media. He was the corecipient of the First Place Chester Sall Award for the best paper in the 2000 IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, the 2005 IEEE Circuits and Systems Chapter-of-the-Year Award, and the 2009 IEEE Region 8 Circuits and Systems Chapter-of-the-Year Award. He is a Chartered Engineer and a member of the IET, EURASIP, and the Technical Chamber of Greece.

Peter Schelkens (M’97) received the degree in electronic engineering in very large scale integration design from the Industriële Hogeschool Antwerpen-Mechelen (IHAM), Campus Mechelen and the M.Sc. Electrical Engineering degree in applied physics, the Biomedical Engineering degree in medical physics, and the Ph.D. degree in applied sciences from the Vrije Universiteit Brussel (VUB), Brussel, Belgium.

He currently holds a professorship with the Department of Electronics and Informatics, VUB. He is a member of the scientific staff with the Interdisciplinary Institute for Broadband Technology, Ghent, Belgium. Additionally, since 1995, he has also been affiliated to the Interuniversity Microelectronics Institute, Leuven, Belgium, as a Scientific Collaborator. Since 2010, he has become a member of the board of councilors of the same institute. He is the author of over 200 papers in journals and conference proceedings. He is the coeditor of the books, The JPEG 2000 Suite (West Sussex, U.K.: Wiley, 2009) and Optical and Digital Image Processing (West Sussex, U.K.: Wiley, 2011). He is a holder of several patents and has contributed to several standardization processes. His research interests include multidimensional signal processing encompassing the representation, communication, security, and rendering of these signals while especially focusing on cross-disciplinary research.

Dr. Schelkens is the Belgian Head of delegation for the ISO/IEC JPEG standardization committee, the Editor/Chair of part 10 of JPEG 2000: “Extensions for Three-Dimensional Data,” and the PR Chair of the JPEG committee. He is a member of SPIE and ACM and is currently the Belgian EURASIP Liaison Officer. In 2011, Peter Schelkens acted as General Co-Chair of IEEE International Conference on Image Processing and the Workshop on Quality of Multimedia Experience.
