
1532 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 9, SEPTEMBER 2000

Adaptive Wavelet Thresholding for Image Denoising and Compression

S. Grace Chang, Student Member, IEEE, Bin Yu, Senior Member, IEEE, and Martin Vetterli, Fellow, IEEE

Abstract—The first part of this paper proposes an adaptive, data-driven threshold for image denoising via wavelet soft-thresholding. The threshold is derived in a Bayesian framework, and the prior used on the wavelet coefficients is the generalized Gaussian distribution (GGD) widely used in image processing applications. The proposed threshold is simple and closed-form, and it is adaptive to each subband because it depends on data-driven estimates of the parameters. Experimental results show that the proposed method, called BayesShrink, is typically within 5% of the MSE of the best soft-thresholding benchmark with the image assumed known. It also outperforms Donoho and Johnstone's SureShrink most of the time.

The second part of the paper attempts to further validate recent claims that lossy compression can be used for denoising. The BayesShrink threshold can aid in the parameter selection of a coder designed with the intention of denoising, thus achieving simultaneous denoising and compression. Specifically, the zero-zone in the quantization step of compression is analogous to the threshold value in the thresholding function. The remaining coder design parameters are chosen based on a criterion derived from Rissanen's minimum description length (MDL) principle. Experiments show that this compression method does indeed remove noise significantly, especially for large noise power. However, it introduces quantization noise and should be used only when bitrate is a concern in addition to denoising.

Index Terms—Adaptive method, image compression, image denoising, image restoration, wavelet thresholding.

I. INTRODUCTION

AN IMAGE is often corrupted by noise in its acquisition or transmission. The goal of denoising is to remove the noise while retaining as much as possible the important signal features. Traditionally, this is achieved by linear processing such as Wiener filtering. A vast literature has emerged recently on signal denoising using nonlinear techniques, in the setting of additive white Gaussian noise.

Manuscript received January 22, 1998; revised April 7, 2000. This work was supported in part by the NSF Graduate Fellowship and the University of California Dissertation Fellowship to S. G. Chang; ARO Grant DAAH04-94-G-0232 and NSF Grant DMS-9322817 to B. Yu; and NSF Grant MIP-93-213002 and Swiss NSF Grant 20-52347.97 to M. Vetterli. Part of this work was presented at the IEEE International Conference on Image Processing, Santa Barbara, CA, October 1997. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Patrick L. Combettes.

S. G. Chang was with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA. She is now with Hewlett-Packard Company, Grenoble, France (e-mail: [email protected]).

B. Yu is with the Department of Statistics, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).

M. Vetterli is with the Laboratory of Audiovisual Communications, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, and also with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA.

Publisher Item Identifier S 1057-7149(00)06914-1.

The seminal work on signal denoising via wavelet thresholding or shrinkage of Donoho and Johnstone ([13]–[16]) has shown that various wavelet thresholding schemes for denoising have near-optimal properties in the minimax sense and perform well in simulation studies of one-dimensional curve estimation. Thresholding has also been shown to have better rates of convergence than linear methods for approximating functions in Besov spaces ([13], [14]). Thresholding is a nonlinear technique, yet it is very simple because it operates on one wavelet coefficient at a time. Alternative approaches to nonlinear wavelet-based denoising can be found in, for example, [1], [4], [8]–[10], [12], [18], [19], [24], [27]–[29], [32], [33], [35], and references therein.

On a seemingly unrelated front, lossy compression has been proposed for denoising in several works [5], [6], [21], [25], [28]. Concerns regarding the compression rate were explicitly addressed. This is important because any practical coder must assume a limited resource (such as bits) at its disposal for representing the data. Other works [4], [12]–[16] also addressed the connection between compression and denoising, especially with nonlinear algorithms such as wavelet thresholding in a mathematical framework. However, these latter works were not concerned with quantization and bitrates: compression results from a reduced number of nonzero wavelet coefficients, and not from an explicit design of a coder.

The intuition behind using lossy compression for denoising may be explained as follows. A signal typically has structural correlations that a good coder can exploit to yield a concise representation. White noise, however, does not have structural redundancies and thus is not easily compressible. Hence, a good compression method can provide a suitable model for distinguishing between signal and noise. The discussion will be restricted to wavelet-based coders, though these insights can be extended to other transform-domain coders as well. A concrete connection between lossy compression and denoising can easily be seen when one examines the similarity between thresholding and quantization, the latter of which is a necessary step in a practical lossy coder. That is, the quantization of wavelet coefficients with a zero-zone is an approximation to the thresholding function (see Fig. 1). Thus, provided that the quantization outside of the zero-zone does not introduce significant distortion, it follows that wavelet-based lossy compression achieves denoising. With this connection in mind, this paper is about wavelet thresholding for image denoising and also for lossy compression. The threshold choice aids the lossy coder in choosing its zero-zone, and the resulting coder achieves simultaneous denoising and compression if such a property is desired.



Fig. 1. The thresholding function can be approximated by quantization with a zero-zone.

The theoretical formalization of filtering additive iid Gaussian noise (of zero mean and standard deviation $\sigma$) via thresholding wavelet coefficients was pioneered by Donoho and Johnstone [14]. A wavelet coefficient is compared to a given threshold and is set to zero if its magnitude is less than the threshold; otherwise, it is kept or modified (depending on the thresholding rule). The threshold acts as an oracle which distinguishes between the insignificant coefficients likely due to noise and the significant coefficients consisting of important signal structures. Thresholding rules are especially effective for signals with sparse or near-sparse representations, where only a small subset of the coefficients represents all or most of the signal energy. Thresholding essentially creates a region around zero where the coefficients are considered negligible. Outside of this region, the thresholded coefficients are kept to full precision (that is, without quantization). The most well-known of these thresholding methods are VisuShrink [14] and SureShrink [15]. These threshold choices enjoy asymptotic minimax optimalities over function spaces such as Besov spaces. For image denoising, however, VisuShrink is known to yield overly smoothed images. This is because its threshold choice, $\sigma\sqrt{2\log M}$ (called the universal threshold, where $M$ is the number of samples and $\sigma^2$ is the noise variance), can be unwarrantedly large due to its dependence on the number of samples, which exceeds $2.5 \times 10^5$ for a typical test image of size $512 \times 512$. SureShrink uses a hybrid of the universal threshold and the SURE threshold, derived from minimizing Stein's unbiased risk estimator [30], and has been shown to perform well. SureShrink will be the main comparison to the method proposed here and, as will be seen later in this paper, our proposed threshold often yields better results.

Since the works of Donoho and Johnstone, there has been much research on finding thresholds for nonparametric estimation in statistics. However, few are specifically tailored for images. In this paper, we propose a framework and a near-optimal threshold in this framework more suitable for image denoising. This approach can be formally described as Bayesian, but this only describes our mathematical formulation, not our philosophy. The formulation is grounded on the empirical observation that the wavelet coefficients in a subband of a natural image can be summarized adequately by a generalized Gaussian distribution (GGD). This observation is well accepted in the image processing community (for example, see [20], [22], [23], [29], [34], [36]) and is used for state-of-the-art image coders in [20], [22], [36]. It follows from this observation that the average MSE (in a subband) can be approximated by the corresponding Bayesian squared error risk with the GGD as the prior applied to each coefficient in an iid fashion. That is, a sum is approximated by an integral. We emphasize that this is an analytical approximation and our framework is broader than assuming the wavelet coefficients are iid draws from a GGD. The goal is to find the soft-threshold that minimizes this Bayesian risk, and we call our method BayesShrink.

The proposed Bayesian risk minimization is subband-dependent. Given the signal being generalized Gaussian distributed and the noise being Gaussian, via numerical calculation a nearly optimal threshold for soft-thresholding is found to be

$$T_B = \sigma^2 / \sigma_X$$

(where $\sigma^2$ is the noise variance and $\sigma_X^2$ the signal variance). This threshold gives a risk within 5% of the minimal risk over a broad range of parameters in the GGD family. To make this threshold data-driven, the parameters $\sigma$ and $\sigma_X$ are estimated from the observed data, one set for each subband.

To achieve simultaneous denoising and compression, the nonzero thresholded wavelet coefficients need to be quantized. A uniform quantizer with centroid reconstruction is used on the GGD. The design parameters of the coder, such as the number of quantization levels and binwidths, are decided based on a criterion derived from Rissanen's minimum description length (MDL) principle [26]. This criterion balances the tradeoff between the compression rate and distortion, and yields a nice interpretation of operating at a fixed slope on the rate-distortion curve.

The paper is organized as follows. In Section II, the wavelet thresholding idea is introduced. Section II-A explains the derivation of the BayesShrink threshold by minimizing a Bayesian risk with squared error. The lossy compression based on the MDL criterion is explained in Section III. Experimental results on several test images are shown in Section IV and compared with SureShrink. To benchmark against the best possible performance of a threshold estimate, the comparisons also include OracleShrink, the best soft-thresholding estimate obtainable assuming the original image is known, and OracleThresh, the best hard-thresholding counterpart. The BayesShrink method often comes to within 5% of the MSEs of OracleShrink, and is better than SureShrink by up to 8% most of the time, or is within 1% if it is worse. Furthermore, the BayesShrink threshold is very easy to compute. BayesShrink with the additional MDL-based compression, as expected, introduces quantization noise to the image. This distortion may negate the denoising achieved by thresholding, especially when $\sigma$ is small. However, for larger values of $\sigma$, the MSE due to the lossy compression is still significantly lower than that of the noisy image, while fewer bits are used to code the image, thus achieving both denoising and compression.

II. WAVELET THRESHOLDING AND THRESHOLD SELECTION

Let the signal be $f = \{f_{ij},\ i, j = 1, \ldots, N\}$, where $N$ is some integer power of 2. It has been corrupted by additive noise, and one observes

$$g_{ij} = f_{ij} + \epsilon_{ij} \qquad (1)$$

where the $\epsilon_{ij}$ are independent and identically distributed (iid) as normal $N(0, \sigma^2)$ and independent of $f_{ij}$. The goal is to remove the noise, or "denoise" $g$, and to obtain an estimate $\hat{f}$ of $f$ which minimizes the mean squared error (MSE),

$$\mathrm{MSE}(\hat{f}) = \frac{1}{N^2} \sum_{i,j=1}^{N} (\hat{f}_{ij} - f_{ij})^2. \qquad (2)$$


Fig. 2. Subbands of the 2-D orthogonal wavelet transform.

Let $\mathbf{f} = \{f_{ij}\}$, $\mathbf{g} = \{g_{ij}\}$, and $\boldsymbol{\epsilon} = \{\epsilon_{ij}\}$; that is, the boldfaced letters will denote the matrix representation of the signals under consideration. Let $\mathbf{Y} = W\mathbf{g}$ denote the matrix of wavelet coefficients of $\mathbf{g}$, where $W$ is the two-dimensional dyadic orthogonal wavelet transform operator, and similarly $\mathbf{X} = W\mathbf{f}$ and $\mathbf{V} = W\boldsymbol{\epsilon}$. The readers are referred to references such as [23], [31] for details of the two-dimensional orthogonal wavelet transform. It is convenient to label the subbands of the transform as in Fig. 2. The subbands $HH_k$, $HL_k$, $LH_k$, $k = 1, \ldots, J$, are called the details, where $k$ is the scale, with $J$ being the largest (or coarsest) scale in the decomposition; a subband at scale $k$ has size $N/2^k \times N/2^k$. The subband $LL_J$ is the low-resolution residual, and $J$ is typically chosen large enough that $N/2^J \ll N$. Note that since the transform is orthogonal, the $V_{ij}$ are also iid $N(0, \sigma^2)$.

The wavelet-thresholding denoising method filters each coefficient $Y_{ij}$ from the detail subbands with a threshold function (to be explained shortly) to obtain $\hat{X}_{ij}$. The denoised estimate is then $\hat{\mathbf{f}} = W^{-1}\hat{\mathbf{X}}$, where $W^{-1}$ is the inverse wavelet transform operator.
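As a concrete illustration of this subband structure, the following sketch (our own, not from the paper) uses the PyWavelets package; the exact assignment of the library's horizontal and vertical detail arrays to the HL and LH labels depends on its naming convention:

```python
import numpy as np
import pywt

img = np.random.rand(512, 512)     # stand-in for an N x N image, N = 2^9
# W: 2-D dyadic orthogonal wavelet transform with J = 4 scales;
# "periodization" keeps each subband at exactly N/2^k x N/2^k.
coeffs = pywt.wavedec2(img, "sym8", mode="periodization", level=4)
print("low-resolution residual LL_4:", coeffs[0].shape)
for j, bands in enumerate(coeffs[1:]):
    # bands holds the (horizontal, vertical, diagonal) detail subbands of one
    # scale, i.e., the HL/LH/HH of Fig. 2; coeffs[1] is the coarsest scale.
    print("detail scale", 4 - j, [b.shape for b in bands])
```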

There are two thresholding methods frequently used. The soft-threshold function (also called the shrinkage function)

$$\eta_T(x) = \mathrm{sgn}(x) \cdot \max(|x| - T,\ 0) \qquad (3)$$

takes the argument and shrinks it toward zero by the threshold $T$. The other popular alternative is the hard-threshold function

$$\eta_T(x) = x \cdot \mathbf{1}\{|x| > T\} \qquad (4)$$

which keeps the input if it is larger than the threshold; otherwise, it is set to zero. The wavelet thresholding procedure removes noise by thresholding only the wavelet coefficients of the detail subbands, while keeping the low-resolution coefficients unaltered.
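In code, the two rules (3) and (4) are one-liners; the following NumPy sketch (the helper names are ours) is reused by the later examples:

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-threshold / shrinkage rule (3): sgn(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hard_threshold(x, t):
    """Hard-threshold / keep-or-kill rule (4): x * 1{|x| > t}."""
    return x * (np.abs(x) > t)
```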

The soft-thresholding rule is chosen over hard-thresholding for several reasons. First, soft-thresholding has been shown to achieve near-optimal minimax rate over a large range of Besov spaces [12], [14]. Second, for the generalized Gaussian prior assumed in this work, the optimal soft-thresholding estimator yields a smaller risk than the optimal hard-thresholding estimator (to be shown later in this section). Lastly, in practice, the soft-thresholding method yields more visually pleasant images than hard-thresholding because the latter is discontinuous and yields abrupt artifacts in the recovered images, especially when the noise energy is significant. In what follows, soft-thresholding will be the primary focus.

While the idea of thresholding is simple and effective, finding a good threshold is not an easy task. For a one-dimensional (1-D) deterministic signal of length $M$, Donoho and Johnstone [14] proposed for VisuShrink the universal threshold $\sigma\sqrt{2\log M}$, which results in an estimate asymptotically optimal in the minimax sense (minimizing the maximum error over all possible $M$-sample signals). One other notable threshold is the SURE threshold [15], derived from minimizing Stein's unbiased risk estimate [30] when soft-thresholding is used. The SureShrink method is a hybrid of the universal and the SURE threshold, with the choice being dependent on the energy of the particular subband [15]. The SURE threshold is data-driven, does not depend on $M$ explicitly, and SureShrink estimates it in a subband-adaptive manner. Moreover, SureShrink has yielded good image denoising performance and comes close to the true minimum MSE of the optimal soft-threshold estimator (cf. [4], [12]), and thus will be the main comparison to our proposed method.

In the statistical Bayesian literature, many works have concentrated on deriving the best threshold (or shrinkage factor) based on priors such as the Laplacian and a mixture of Gaussians (cf. [1], [8], [9], [18], [24], [27], [29], [32], [35]). With an integral approximation to the pixel-wise MSE distortion measure as discussed earlier, the formulation here is also Bayesian for finding the best soft-thresholding rule under the generalized Gaussian prior. A related work is [27], where the hard-thresholding rule is investigated for signals with Laplacian and Gaussian distributions.

The GGD has been used in many subband or wavelet-based image processing applications [2], [20], [22], [23], [29], [34], [36]. In [29], it was observed that a GGD with the shape parameter ranging from 0.5 to 1 [see (5)] can adequately describe the wavelet coefficients of a large set of natural images. Our experience with images supports the same conclusion. Fig. 3 shows the histograms of the wavelet coefficients of the images shown in Fig. 9, against the generalized Gaussian curve, with the parameters labeled (the estimation of the parameters will be explained later in the text). A heuristic can be set forward to explain why there are a large number of "small" coefficients but relatively few "large" coefficients, as the GGD suggests: the small ones correspond to smooth regions in a natural image and the large ones to edges or textures.

A. Adaptive Threshold for BayesShrink

The GGD, following [20], is

$$GG_{\sigma_X, \beta}(x) = C(\sigma_X, \beta)\, \exp\!\left\{-[\alpha(\sigma_X, \beta)\, |x|]^{\beta}\right\} \qquad (5)$$

for $-\infty < x < \infty$, $\sigma_X > 0$, $\beta > 0$, where

$$\alpha(\sigma_X, \beta) = \sigma_X^{-1} \left[\frac{\Gamma(3/\beta)}{\Gamma(1/\beta)}\right]^{1/2}$$

and

$$C(\sigma_X, \beta) = \frac{\beta \cdot \alpha(\sigma_X, \beta)}{2\,\Gamma(1/\beta)}$$



Fig. 3. Histograms of the wavelet coefficients of four test images. For each image, from top to bottom are fine to coarse scales; from left to right are the $HH$, $HL$, and $LH$ subbands, respectively.

where $\Gamma(\cdot)$ is the gamma function. The parameter $\sigma_X$ is the standard deviation and $\beta$ is the shape parameter. For a given set of parameters, the objective is to find a soft-threshold $T$ which minimizes the Bayes risk,

$$r(T) = E\,(\hat{X} - X)^2 = E_X\, E_{Y|X}\, (\eta_T(Y) - X)^2 \qquad (6)$$

where $\hat{X} = \eta_T(Y)$, $Y = X + V$, $V \sim N(0, \sigma^2)$, and $X \sim GG_{\sigma_X, \beta}$. Denote the optimal threshold by $T^*(\sigma_X, \beta)$,

$$T^*(\sigma_X, \beta) = \arg\min_T\, r(T) \qquad (7)$$

which is a function of the parameters $\sigma_X$ and $\beta$. To our knowledge, there is no closed-form solution for $T^*$ for this chosen prior, thus numerical calculation is used to find its value.¹

¹It was observed that for the numerical calculation, it is more robust to obtain the value of $T^*$ from locating the zero-crossing of the derivative, $r'(T)$, than from minimizing $r(T)$ directly. In the Laplacian case, a recent work [17] derives an analytical equation for $T^*$ to satisfy and calculates $T^*$ by solving such an equation.
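The numerical search for $T^*$ can also be reproduced with a short Monte Carlo sketch. This is our own illustration (a grid search on a simulated risk, rather than the derivative zero-crossing used by the authors); scipy.stats.gennorm is a GGD with shape $\beta$, rescaled below so that its standard deviation is $\sigma_X$:

```python
import numpy as np
from scipy.stats import gennorm
from scipy.special import gamma

def mc_risk(t_grid, beta, sigma_x, sigma=1.0, n=200_000, seed=0):
    """Monte Carlo estimate of the Bayes risk r(T) of (6) on a grid of thresholds."""
    rng = np.random.default_rng(seed)
    # gennorm(beta, scale=s) has variance s^2 * Gamma(3/beta) / Gamma(1/beta);
    # choose s so that std(X) = sigma_x, matching the GGD of (5).
    s = sigma_x * np.sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))
    x = gennorm.rvs(beta, scale=s, size=n, random_state=rng)
    y = x + rng.normal(0.0, sigma, size=n)      # Y = X + V, V ~ N(0, sigma^2)
    risks = np.empty_like(t_grid)
    for i, t in enumerate(t_grid):
        xhat = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)   # soft-threshold (3)
        risks[i] = np.mean((xhat - x) ** 2)
    return risks

t_grid = np.linspace(0.0, 5.0, 251)
for sigma_x in (0.5, 1.0, 2.0):                 # Laplacian prior (beta = 1), sigma = 1
    r = mc_risk(t_grid, beta=1.0, sigma_x=sigma_x)
    print(f"sigma_x={sigma_x}: T* ~ {t_grid[r.argmin()]:.2f}, T_B = {1.0 / sigma_x:.2f}")
```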

Before examining the general case, it is insightful to consider two special cases of the GGD: the Gaussian ($\beta = 2$) and the Laplacian ($\beta = 1$) distributions. The Laplacian case is particularly interesting, because it is analytically more tractable and is often used in image processing applications.

Case 1 (Gaussian): $X \sim N(0, \sigma_X^2)$ with $\beta = 2$. It is straightforward to verify that $Y \sim N(0, \sigma_Y^2)$ with

$$\sigma_Y^2 = \sigma_X^2 + \sigma^2 \qquad (8)$$

and that the risk (6) has the closed form (with $s = \sigma_X^2/\sigma_Y^2$)

$$r(T) = s\sigma^2 + s^2\sigma_Y^2\left[1 - 2G(T/\sigma_Y)\right] + 2\left[(1-s)^2\sigma_Y^2\, G(T/\sigma_Y) - 2(1-s)\sigma_Y T\, \phi(T/\sigma_Y) + T^2\, \bar{\Phi}(T/\sigma_Y)\right] \qquad (9)$$

where

$$G(t) = t\,\phi(t) + \bar{\Phi}(t) \qquad (10)$$

with $\phi(t) = (1/\sqrt{2\pi})\, e^{-t^2/2}$ the standard normal density function and $\bar{\Phi}(t) = \int_t^{\infty} \phi(u)\, du$ the survival function of the standard normal.


Fig. 4. Thresholding for the Gaussian prior, with $\sigma = 1$. (a) Comparison of the optimal threshold $T^*(\sigma_X, 2)$ (solid) and the threshold $T_B(\sigma_X)$ (dotted) against the standard deviation $\sigma_X$ on the horizontal axis. (b) Comparison of the risk of optimal soft-thresholding (solid), $T_B$ for soft-thresholding (dotted), and optimal hard-thresholding (dashed).

Assuming $\sigma = 1$ for the time being, the value of $T^*(\sigma_X, 2)$ is found numerically for different values of $\sigma_X$ and is plotted against $\sigma_X$ in Fig. 4(a), with

$$T_B(\sigma_X) = \frac{1}{\sigma_X} \qquad (11)$$

superimposed on top. It is clear that this simple and closed-form expression, $T_B(\sigma_X)$, is very close to the numerically found $T^*(\sigma_X, 2)$. The expected risks of $T^*(\sigma_X, 2)$ and $T_B(\sigma_X)$ are shown in Fig. 4(b) for $\sigma = 1$, where the maximum deviation of $r(T_B)$ is less than 1% of the optimal risk, $r(T^*)$. For general $\sigma$, it is an easy scaling exercise to see that (11) becomes

$$T_B(\sigma_X) = \frac{\sigma^2}{\sigma_X}. \qquad (12)$$

For a further comparison, the risk for hard-thresholding is also calculated. After some algebra, it can be shown that the risk for hard-thresholding can be written as

$$r_h(T) = \sigma^2 + E\left[(X^2 - V^2)\,\mathbf{1}\{|Y| \le T\}\right]. \qquad (13)$$

By setting to zero the derivative of (13) with respect to $T$, the optimal threshold is found to be

$$T_h^*(\sigma_X, 2) = \begin{cases} 0 & \text{if } \sigma_X > \sigma \\ \infty & \text{if } \sigma_X < \sigma \\ \text{anything} & \text{if } \sigma_X = \sigma \end{cases} \qquad (14)$$

with the associated risk

$$r_h(T_h^*) = \begin{cases} \sigma^2 & \text{if } \sigma_X \ge \sigma \\ \sigma_X^2 & \text{if } \sigma_X < \sigma. \end{cases} \qquad (15)$$

Fig. 4(b) shows that both the optimal and near-optimal soft-threshold estimators, $T^*(\sigma_X, 2)$ and $T_B(\sigma_X)$, achieve lower risks than the optimal hard-threshold estimator.

The threshold $T_B(\sigma_X) = \sigma^2/\sigma_X$ is not only nearly optimal but also has an intuitive appeal. The normalized threshold, $T_B/\sigma$, is inversely proportional to $\sigma_X$, the standard deviation of $X$, and proportional to $\sigma$, the noise standard deviation. When $\sigma_X \gg \sigma$, the signal is much stronger than the noise, and the normalized threshold is chosen to be small in order to preserve most of the signal and remove some of the noise; vice versa, when $\sigma \gg \sigma_X$, the noise dominates and the normalized threshold is chosen to be large to remove the noise which has overwhelmed the signal. Thus, this threshold choice adapts to both the signal and noise characteristics as reflected in the parameters $\sigma_X$ and $\sigma$.

Case 2 (Laplacian): With $\beta = 1$, the GGD becomes Laplacian: $\mathrm{LAP}(\sigma_X)(x) = \frac{\sqrt{2}}{2\sigma_X} \exp\!\left(-\frac{\sqrt{2}\,|x|}{\sigma_X}\right)$. Again, for the time being let $\sigma = 1$. The optimal threshold $T^*(\sigma_X, 1)$ found numerically is plotted against the standard deviation $\sigma_X$ on the horizontal axis in Fig. 5(a). The curve of $T^*(\sigma_X, 1)$ (in solid line) is compared with $T_B(\sigma_X)$ (in dotted line) in Fig. 5(a). Their corresponding expected risks are shown in Fig. 5(b), and the deviation of $r(T_B)$ from the optimal risk is less than 0.8%. This suggests that $T_B(\sigma_X)$ also works well in the Laplacian case. For general $\sigma$, (12) holds again.

A similar threshold choice, denoted $T_h$, was found independently in [27] for approximating the optimal hard-thresholding using the Laplacian prior. Fig. 5(a) compares the optimal hard-threshold, $T_h^*(\sigma_X, 1)$, and $T_h$ to the soft-thresholds $T^*(\sigma_X, 1)$ and $T_B(\sigma_X)$. The corresponding risks are plotted in Fig. 5(b), which shows the soft-thresholding rule to yield a lower risk for this chosen prior. In fact, for $\sigma_X$ larger than approximately 1.3 (with $\sigma = 1$), the risk of the approximate hard-threshold is worse than if no thresholding were performed (which yields a risk of $\sigma^2$).

When $\sigma_X$ tends to infinity, or the SNR goes to infinity, an asymptotic approximation of $T^*(\sigma_X, 1)$ is derived in [17] for this Laplacian case. However, in the same article, this asymptotic approximation is outperformed on seven test images by our proposed threshold, $T_B(\sigma_X)$.


Fig. 5. Thresholding for the Laplacian prior, with $\sigma = 1$. (a) Comparison of the optimal soft-threshold $T^*(\sigma_X, 1)$ (solid), the BayesShrink threshold $T_B(\sigma_X)$ (dotted), the optimal hard-threshold $T_h^*(\sigma_X, 1)$ (dashed), and the threshold $T_h$ (dash-dotted) against the standard deviation on the horizontal axis. (b) Their corresponding risks.

With these insights from the special cases, the discussion now returns to the general case of the GGD.

Case 3 (Generalized Gaussian): The proposed threshold $T_B(\sigma_X) = \sigma^2/\sigma_X$ in (12) has been found to work well for the general case. Let $\sigma = 1$. In Fig. 6(a), each dotted line is the optimal threshold $T^*(\sigma_X, \beta)$ for a given fixed $\beta$, plotted against $\sigma_X$ on the horizontal axis. The values $\beta = 0.6, 1, 2, 3, 4$ are shown. The threshold $T_B(\sigma_X)$ is plotted with the solid line. The curve of the optimal threshold that lies closest to $T_B$ is for $\beta = 1$, the Laplacian case, while other curves deviate from $T_B$ as $\beta$ moves away from 1. Fig. 6(b) shows the corresponding risks. The deviation between the optimal risk and $r(T_B)$ grows as $\beta$ moves away from 1, but the error is still within 5% of the optimal for the curves shown in Fig. 6(b).

Fig. 6. Thresholding for the generalized Gaussian prior, with $\sigma = 1$. (a) Comparison of the approximation $T_B(\sigma_X) = \sigma^2/\sigma_X$ (solid) with the optimal threshold $T^*(\sigma_X, \beta)$ for $\beta = 0.6, 1, 2, 3, 4$ (dotted). The horizontal axis is the standard deviation $\sigma_X$. (b) The optimal risks (dotted) and the approximation (solid).

Because the threshold $T_B$ depends only on the standard deviation and not on the shape parameter $\beta$, it may not yield a good approximation for values of $\beta$ other than the range tested here, and the threshold may need to be modified to incorporate $\beta$. However, since for wavelet coefficients typical values of $\beta$ fall in the range [0.5, 1], this simple form of the threshold is appropriate for our purpose. For a fixed set of parameters, the curve of the risk (as a function of the threshold $T$) is very flat near the optimal threshold $T^*$, implying that the error is not sensitive to a slight perturbation near $T^*$.

B. Parameter Estimation for Data-Driven Adaptive Threshold

This section focuses on the estimation of the GGD parameters, $\sigma_X$ and $\beta$, which in turn yields a data-driven estimate of $T_B(\sigma_X)$ that is adaptive to different subband characteristics.


The noise variance $\sigma^2$ needs to be estimated first. In some situations, it may be possible to measure $\sigma^2$ based on information other than the corrupted image. If such is not the case, it is estimated from the subband $HH_1$ by the robust median estimator, also used in [14], [15],

$$\hat{\sigma} = \frac{\mathrm{Median}(|Y_{ij}|)}{0.6745}, \qquad Y_{ij} \in \text{subband } HH_1. \qquad (16)$$

The parameter $\beta$ does not explicitly enter into the expression of $T_B(\sigma_X)$; only the signal standard deviation, $\sigma_X$, does. Therefore it suffices to estimate $\sigma_X$ (or $\sigma_X^2$) directly.

Recall the observation model is $Y = X + V$, with $X$ and $V$ independent of each other, hence

$$\sigma_Y^2 = \sigma_X^2 + \sigma^2 \qquad (17)$$

where $\sigma_Y^2$ is the variance of $Y$. Since $Y$ is modeled as zero-mean, $\sigma_Y^2$ can be found empirically by

$$\hat{\sigma}_Y^2 = \frac{1}{n^2} \sum_{i,j=1}^{n} Y_{ij}^2 \qquad (18)$$

where $n \times n$ is the size of the subband under consideration. Thus

$$\hat{T}_B(\hat{\sigma}_X) = \frac{\hat{\sigma}^2}{\hat{\sigma}_X} \qquad (19)$$

where

$$\hat{\sigma}_X = \sqrt{\max(\hat{\sigma}_Y^2 - \hat{\sigma}^2,\ 0)}. \qquad (20)$$

In the case that $\hat{\sigma}^2 \ge \hat{\sigma}_Y^2$, $\hat{\sigma}_X$ is taken to be 0. That is, $\hat{T}_B(\hat{\sigma}_X)$ is $\infty$, or, in practice, $\hat{T}_B(\hat{\sigma}_X) = \max(|Y_{ij}|)$, and all coefficients are set to 0. This happens at times when $\sigma$ is large (for example, at the largest noise levels tested on a grayscale image).

To summarize, we refer to our method as BayesShrink, which performs soft-thresholding with the data-driven, subband-dependent threshold,

$$\hat{T}_B(\hat{\sigma}_X) = \frac{\hat{\sigma}^2}{\hat{\sigma}_X}.$$
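Putting Sections II-A and II-B together, a minimal end-to-end sketch of BayesShrink using PyWavelets might look as follows ("sym8" is Daubechies' least-asymmetric wavelet with eight vanishing moments used in Section IV; the function name and structure are our own illustration, not the authors' code):

```python
import numpy as np
import pywt

def bayes_shrink(img, wavelet="sym8", levels=4):
    """Subband-adaptive soft-thresholding with the data-driven threshold (19)."""
    coeffs = pywt.wavedec2(img, wavelet, mode="periodization", level=levels)
    # (16): robust median estimate of sigma from the finest diagonal (HH_1) subband.
    sigma = np.median(np.abs(coeffs[-1][2])) / 0.6745
    out = [coeffs[0]]                  # low-resolution residual is not thresholded
    for bands in coeffs[1:]:
        shrunk = []
        for Y in bands:
            # (17)-(20): sigma_X^2 = max(mean(Y^2) - sigma^2, 0), one set per subband.
            sigma_x = np.sqrt(max(np.mean(Y ** 2) - sigma ** 2, 0.0))
            # (19): T = sigma^2 / sigma_x; if sigma_x = 0, kill the whole subband.
            T = sigma ** 2 / sigma_x if sigma_x > 0 else np.abs(Y).max()
            shrunk.append(np.sign(Y) * np.maximum(np.abs(Y) - T, 0.0))
        out.append(tuple(shrunk))
    return pywt.waverec2(out, wavelet, mode="periodization")
```

Note that $\sigma$ is estimated once from $HH_1$, but the threshold is recomputed for every detail subband, which is what makes the method subband-adaptive.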

III. MDL PRINCIPLE FOR COMPRESSION-BASED DENOISING: THE MDLQ CRITERION

Recall our hypothesis is that compression achieves denoising because the zero-zone in the quantization step (typical in compression methods) corresponds to thresholding in denoising. For the purpose of compression, after using the adaptive threshold $\hat{T}_B(\hat{\sigma}_X)$ for the zero-zone, there still remain the questions of how to quantize the coefficients outside of the zero-zone and how to code them. Fig. 7 illustrates the block diagram of the compression method. It shows that the coder needs to decide on the design parameters $m$ and $\Delta$ (the number of quantization bins and the binwidth, respectively), in addition to the zero-zone threshold $T$. The choice of these parameters is discussed next.

Fig. 7. Schematic for compression-based denoising. Denoising is achieved in the wavelet transform domain by lossy compression, which involves the design of parameters $T$, $m$, and $\Delta$, relating to the zero-zone width, the number of quantization levels, and the quantization binwidth, respectively.


When compressing a signal, two important objectives are to be kept in mind. On the one hand, the distortion between the compressed signal and the original should be kept low; on the other hand, the description of the compressed signal should use as few bits as possible. Typically, these two objectives are conflicting, thus a suitable criterion is needed to reach a compromise. Rissanen's MDL principle allows a tradeoff between these two objectives [26].

Let $\mathcal{M}$ be a library or class of models from which the "best" one is chosen to represent the data. According to the MDL principle, given a sequence of observations, the "best" model is the one that yields the shortest description length for describing the data using the model, where the description length can be interpreted as the number of bits needed for encoding. This description can be accomplished by a two-part code: one part to describe the model and the other the description of the data using the model.

More precisely, given the set of observations $\mathbf{y}$, we wish to find a model $M \in \mathcal{M}$ to describe it. The MDL principle chooses the $M$ which minimizes the two-part code-length,

$$L(\mathbf{y}, M) = L(\mathbf{y} \mid M) + L(M) \qquad (21)$$

where $L(\mathbf{y} \mid M)$ is the code-length for $\mathbf{y}$ based on $M$, and $L(M)$ is the code-length for $M$.

In Saito's simultaneous compression and denoising method [28] for a length-$N$ one-dimensional signal, the hard-threshold function was used to generate the models $M_k$, where the number $k$ of nonzero coefficients to retain is determined by minimizing the MDL criterion. The first term is the idealized code-length with the normal distribution [see (23)], and the second term is taken to be $(3k/2)\log_2 N$, of which $k \log_2 N$ bits are needed to indicate the locations of the nonzero coefficients (assuming a uniform indexing) and $(1/2)\log_2 N$ bits for the value of each coefficient [see [26] for justification on using $(1/2)\log_2 N$ bits to store a coefficient value]. Although compression has been achieved in the sense that fewer nonzero coefficients are kept, [28] does not address the quantization step necessary in a practical compression setting.

In the following section, an MDL-based quantization criterion will be developed by minimizing $L(\mathbf{y}, M)$ with the restriction that the estimate belongs to the set of quantized signals.


A. Derivation of the MDLQ Criterion

Consider a particular subband of size $n \times n$. Since the noisy wavelet transform coefficients are $Y_{ij} = X_{ij} + V_{ij}$, where the $V_{ij}$ are iid $N(0, \sigma^2)$, then $Y_{ij} \sim N(X_{ij}, \sigma^2)$. Thus,

$$L(\mathbf{Y} \mid \hat{\mathbf{X}}) = -\log_2 p(\mathbf{Y} \mid \hat{\mathbf{X}}) \qquad (22)$$

$$= \frac{1}{2\sigma^2 \ln 2} \sum_{i,j} (Y_{ij} - \hat{X}_{ij})^2 + \frac{n^2}{2} \log_2(2\pi\sigma^2). \qquad (23)$$

The second term in (23) is a constant, and thus is ignored in the minimization. Expression (23) appears also in [21], [28]. The approach described here differs from theirs in the estimate $\hat{\mathbf{X}}$.

Let $\mathcal{Q}$ be the set of quantized coefficients, and let $\hat{\mathbf{X}}$ be constrained in $\mathcal{Q}$. Plugging in $L(\mathbf{Y} \mid \hat{\mathbf{X}})$ as in (23) (with constant terms removed) gives

$$\hat{\mathbf{X}}^* = \arg\min_{\hat{\mathbf{X}} \in \mathcal{Q}} \left[ L(\mathbf{Y} \mid \hat{\mathbf{X}}) + L(\hat{\mathbf{X}}) \right] \qquad (24)$$

$$= \arg\min_{\hat{\mathbf{X}} \in \mathcal{Q}} \left[ \frac{1}{2\sigma^2 \ln 2} \sum_{i,j} (Y_{ij} - \hat{X}_{ij})^2 + L(\hat{\mathbf{X}}) \right]. \qquad (25)$$

There are many possible ways to quantize and encode $\hat{\mathbf{X}}$. One way is the uniform threshold quantizer (UTQ) with centroid reconstruction based on the generalized Gaussian distribution. The parameters of the GGD can be estimated from the observed noisy coefficients as described below, which is a variant of the method described in [29].

For noiseless observations $X_{ij}$, $\sigma_X$ is estimated as

$$\hat{\sigma}_X^2 = \frac{1}{n^2} \sum_{i,j} X_{ij}^2 \qquad (26)$$

and $\beta$ is solved from

$$\kappa(\beta) = \frac{\Gamma(1/\beta)\,\Gamma(5/\beta)}{\Gamma(3/\beta)^2} = \hat{\kappa} \qquad (27)$$

where $\kappa$ is the kurtosis of the GGD and is estimated as

$$\hat{\kappa} = \frac{(1/n^2) \sum_{i,j} X_{ij}^4}{\hat{\sigma}_X^4}.$$

The parameter values listed in Fig. 3 are estimated this way. When the image is corrupted by additive Gaussian noise, the second and fourth moments of $Y$ have the following relations:

$$m_2 \triangleq E\,Y^2 = \sigma_X^2 + \sigma^2, \qquad m_4 \triangleq E\,Y^4 = \kappa(\beta)\,\sigma_X^4 + 6\sigma_X^2\sigma^2 + 3\sigma^4. \qquad (28)$$

Fig. 8. Illustrating the quantizer.

The noise variance, $\sigma^2$, is estimated via (16). The second moment, $m_2$, and the fourth moment, $m_4$, can be measured from the observations $Y_{ij}$. The parameter $\sigma_X$ is estimated as in (20), and $\beta$ is then solved from (28).
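As a sketch of this moment-matching step (our own code; it relies on the standard GGD kurtosis identity $\kappa(\beta) = \Gamma(1/\beta)\Gamma(5/\beta)/\Gamma(3/\beta)^2$ of (27) and a bracketing root-finder in place of whatever solver the authors used):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

def ggd_kurtosis(beta):
    """Kurtosis E X^4 / (E X^2)^2 of a GGD with shape parameter beta, eq. (27)."""
    return gamma(1.0 / beta) * gamma(5.0 / beta) / gamma(3.0 / beta) ** 2

def fit_ggd_from_noisy(y, sigma):
    """Estimate (sigma_X, beta) from noisy coefficients Y = X + V via (28)."""
    m2, m4 = np.mean(y ** 2), np.mean(y ** 4)
    sigma_x2 = max(m2 - sigma ** 2, 0.0)            # (17) and (20)
    if sigma_x2 == 0.0:
        return 0.0, None                            # subband is all noise
    # (28): m4 = kappa(beta) sigma_X^4 + 6 sigma_X^2 sigma^2 + 3 sigma^4
    kappa = (m4 - 6.0 * sigma_x2 * sigma ** 2 - 3.0 * sigma ** 4) / sigma_x2 ** 2
    # kappa(beta) decreases in beta; clip into the solvable bracket [0.2, 10].
    kappa = float(np.clip(kappa, ggd_kurtosis(10.0) + 1e-9, ggd_kurtosis(0.2) - 1e-9))
    beta = brentq(lambda b: ggd_kurtosis(b) - kappa, 0.2, 10.0)
    return np.sqrt(sigma_x2), beta
```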

Once the GGD parameters have been estimated, the quantizer has sufficient information to perform the quantization. The quantizer, shown in Fig. 8, consists of $m$ levels of bins of equal size $\Delta$ on each side, resulting in a total of $2m + 1$ quantization bins (one zero-zone plus $m$ symmetric levels on each of the positive and negative sides). These bins are indexed as $k = -m, \ldots, -1, 0, 1, \ldots, m$. Consider the positive side and let $b_0 = T < b_1 < \cdots < b_m$ denote the boundaries of the quantization bins, with centroid reconstruction values $\hat{q}_1, \ldots, \hat{q}_m$. The value of $\hat{q}_k$ for the bin with boundaries $b_{k-1}$ and $b_k$ is

$$\hat{q}_k = \frac{\int_{b_{k-1}}^{b_k} x\; GG_{\hat{\sigma}_X, \hat{\beta}}(x)\, dx}{\int_{b_{k-1}}^{b_k} GG_{\hat{\sigma}_X, \hat{\beta}}(x)\, dx}. \qquad (29)$$

Equation (29) is calculated using numerical integration (e.g., the trapezoidal rule) since it does not have a closed-form solution. Note that $b_k = T + k\Delta$, and during quantization any coefficient beyond the outermost boundary is taken to be in the outermost bin.

The negative side is quantized in a symmetric way. The quantized coefficients are denoted by $\hat{\mathbf{X}}_Q$. Note that the zero coefficients resulting from thresholding are kept as zeros, and the subsequent quantization of the nonzero coefficients does not set any additional coefficients to zero. On average, the smallest number of bits needed to code the bin indices is given by the Shannon code. Thus the code-length for coding the bin indices is

$$L(\hat{\mathbf{X}}_Q) = -\sum_{k=-m}^{m} n_k \log_2 \frac{n_k}{n^2} \qquad (30)$$

where $n_k$ is the number of coefficients in bin $k$. The additional parameters $m$ and $\Delta$ need to be coded also, but it is supposed that any positive values are equally likely, thus a fixed number of bits is allocated for $m$ and $\Delta$ for each subband (8 bytes were used in the experiment).


Now we state the model selection criterion, MDLQ:

$$\mathrm{MDLQ}(m, \Delta) = \frac{1}{2\sigma^2 \ln 2} \sum_{i,j} (Y_{ij} - \hat{X}_{Q,ij})^2 \;-\; \sum_{k=-m}^{m} n_k \log_2 \frac{n_k}{n^2}, \quad \text{with } \hat{\mathbf{X}}_Q \in \mathcal{Q}. \qquad (31)$$

To find the best model, (31) is minimized over $m$ and $\Delta$ to find the corresponding set of quantized coefficients. In the implementation, which is by no means optimized, this is done by searching over $m = 1, 2, \ldots$, and for each $m$, minimizing (31) over a reasonable range of $\Delta$. Typically, once a minimum (in $\Delta$) has been reached, the MDLQ cost increases monotonically, thus the search can terminate soon after a minimum has been detected.
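The following sketch shows the shape of this search. It is a simplified stand-in for the authors' implementation: a GGD-based centroid quantizer as in (29), the code length (31), and a plain two-loop search; the grid for $\Delta$, the early-termination logic being omitted, and the handling of the outermost bin are our own choices.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.special import gamma

def ggd_pdf(x, sigma_x, beta):
    """GGD density (5) with standard deviation sigma_x and shape beta."""
    alpha = np.sqrt(gamma(3.0 / beta) / gamma(1.0 / beta)) / sigma_x
    return beta * alpha / (2.0 * gamma(1.0 / beta)) * np.exp(-(alpha * np.abs(x)) ** beta)

def quantize(y, T, m, delta, sigma_x, beta):
    """UTQ with zero-zone [-T, T], m bins of width delta per side, centroid recon (29)."""
    yq = np.zeros_like(y)
    for k in range(m):
        lo = T + k * delta
        # the outermost bin absorbs everything beyond its inner boundary
        hi = T + (k + 1) * delta if k < m - 1 else max(np.abs(y).max(), lo + delta)
        g = np.linspace(lo, hi, 64)
        p = ggd_pdf(g, sigma_x, beta)
        q = trapezoid(g * p, g) / max(trapezoid(p, g), 1e-300)   # centroid, eq. (29)
        for s in (1.0, -1.0):
            yq[(s * y > lo) & (s * y <= hi)] = s * q
    return yq

def mdlq(y, yq, sigma):
    """Two-part code length (31): idealized data bits plus Shannon bits for indices."""
    bits = np.sum((y - yq) ** 2) / (2.0 * sigma ** 2 * np.log(2.0))
    _, counts = np.unique(yq, return_counts=True)
    return bits - np.sum(counts * np.log2(counts / yq.size))

def compress_subband(y, T, sigma, sigma_x, beta):
    """Search over m and delta for the quantization minimizing (31)."""
    best_cost, best_yq = np.inf, np.zeros_like(y)
    for m in range(1, 32):
        for delta in np.linspace(0.5 * sigma, 4.0 * sigma, 15):
            yq = quantize(y, T, m, delta, sigma_x, beta)
            c = mdlq(y, yq, sigma)
            if c < best_cost:
                best_cost, best_yq = c, yq
    return best_yq
```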

This MDLQ compression with BayesShrink zero-zone selection is applied to each subband independently. The steps discussed in this section are summarized as follows.

• Estimate the noise variance $\sigma^2$ via (16) and the GGD standard deviation $\sigma_X$ via (20).

• Calculate the threshold $\hat{T}_B$ via (19), and soft-threshold the wavelet coefficients.

• To quantize the nonzero coefficients, minimize (31) over $m$ and $\Delta$ to find the corresponding quantized coefficients $\hat{\mathbf{X}}_Q$, which form the compressed, denoised estimate of $\mathbf{X}$.

The coarsest subband $LL_J$ is quantized differently in that it is not thresholded, and the quantization with (31) assumes the uniform distribution. The $LL_J$ coefficients are essentially local averages of the image and are not characterized by distributions with a peak at zero, thus the uniform distribution is used for generality. With the mean subtracted, the uniform distribution is assumed to be symmetric about zero. Every quantization bin (including the zero-zone) is of width $\Delta$, and the reconstruction values are the midpoints of the intervals.

The MDLQ criterion (31) has the additional interpretation of operating at a specified point on the rate-distortion (R-D) curve, as also pointed out by Liu and Moulin [21]. For a given coder, one can obtain a set of operational rate-distortion points $(R, D)$. When there is a rate or distortion constraint, the constrained problem can be formulated as an unconstrained minimization with a Lagrange multiplier $\lambda$. In this case, (31) can be interpreted as operating at the point where the R-D curve has slope $-1/(2\sigma^2 \ln 2)$. Natarajan [25] and Liu and Moulin [21] both proposed to use compression for denoising. The former operates at a constrained distortion, and the latter operates at a fixed slope on the R-D curve. Both works recommend the use of "any reasonable coder," while our coder is designed specifically with the purpose of denoising.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The $512 \times 512$ grayscale images goldhill, lena, barbara, and baboon are used as test images with different noise levels $\sigma$. The original images are shown in Fig. 9.

Fig. 9. Original images. From top left, clockwise: goldhill, lena, barbara, and baboon.

The wavelet transform employs Daubechies' least-asymmetric compactly-supported wavelet with eight vanishing moments [11], with four scales of orthogonal decomposition.

To assess the performance of BayesShrink, it is compared with SureShrink [15]. The 1-D implementation of SureShrink can be obtained from the WaveLab toolkit [3], and the 2-D extension is straightforward. To gauge the best possible performance of a soft-threshold estimator, these methods are also benchmarked against what we call OracleShrink, which is the truly optimal soft-thresholding estimator assuming the original image is known. The threshold of OracleShrink in each subband is

$$T^O = \arg\min_T \frac{1}{n^2} \sum_{i,j} (\eta_T(Y_{ij}) - X_{ij})^2 \qquad (32)$$

with $\mathbf{X}$ known. To further justify the choice of soft-thresholding over hard-thresholding, another benchmark, OracleThresh, is also computed. OracleThresh yields the best possible performance of a hard-threshold estimator, with subband-adaptive thresholds, each of which is defined as

$$T^{O_h} = \arg\min_T \frac{1}{n^2} \sum_{i,j} \left(Y_{ij}\,\mathbf{1}\{|Y_{ij}| > T\} - X_{ij}\right)^2 \qquad (33)$$

with $\mathbf{X}$ known. The MSEs from the various methods are compared in Table I, and the data are collected from an average of five runs. The columns refer to, respectively, OracleShrink, SureShrink, BayesShrink, BayesShrink with MDLQ-based compression, OracleThresh, Wiener filtering, and the bitrate (in bpp, or bits per pixel) of the MDLQ-compressed image. Since the main benchmark is against SureShrink, the better of SureShrink and BayesShrink is highlighted in bold font for each test set.


TABLE I. For various test images and $\sigma$ values, the MSE of (1) OracleShrink, (2) SureShrink, (3) BayesShrink, (4) BayesShrink with MDLQ compression, (5) OracleThresh, and (6) Wiener filtering. The last column shows the bitrate (bits per pixel) of the compressed image of (4). Averaged over five runs.

The MSEs resulting from BayesShrink come to within 5% of OracleShrink for the smoother images goldhill and lena, and are most of the time within 6% for highly detailed images such as barbara and baboon (though it may suffer up to 20% for small $\sigma$). BayesShrink outperforms SureShrink most of the time, by up to approximately 8%. We observed in the experiments that using solely the SURE threshold yields excellent performance (sometimes yielding even lower MSE than BayesShrink by up to 1–2%). However, the hybrid method of SureShrink results at times in the choice of the universal threshold, which can be too large. As illustrated in Table I, all three soft-thresholding methods significantly outperform the best hard-thresholding rule, OracleThresh.

It is not surprising that the SURE threshold and the BayesShrink threshold yield similar performances. The SURE threshold can also be viewed as an approximate optimal soft-threshold in terms of MSE. For a particular subband of size $n^2$, following [15],

$$\mathrm{Sure}(T; \mathbf{Y}) = n^2\sigma^2 - 2\sigma^2\,\#\{(i,j) : |Y_{ij}| \le T\} + \sum_{i,j} (|Y_{ij}| \wedge T)^2 \qquad (34)$$

where $a \wedge b$ denotes $\min(a, b)$, and the SURE threshold is defined to be the value of $T$ minimizing $\mathrm{Sure}(T; \mathbf{Y})$.
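A direct transcription of (34) into code (our own sketch; scanning the candidate set $\{0\} \cup \{|Y_{ij}|\}$ is a standard way to minimize (34) exactly, chosen here for clarity rather than speed):

```python
import numpy as np

def sure(t, y, sigma):
    """Stein's unbiased risk estimate (34) for soft-thresholding at T."""
    return (y.size * sigma ** 2
            - 2.0 * sigma ** 2 * np.count_nonzero(np.abs(y) <= t)
            + np.sum(np.minimum(np.abs(y), t) ** 2))

def sure_threshold(y, sigma):
    """SURE threshold: the candidate T in {0} U {|Y_ij|} minimizing (34)."""
    cand = np.concatenate(([0.0], np.unique(np.abs(y))))
    risks = np.array([sure(t, y, sigma) for t in cand])
    return cand[risks.argmin()]
```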

Recall that $Y_{ij} = X_{ij} + V_{ij}$ and the $V_{ij}$ are iid $N(0, \sigma^2)$. Conditioning on $\mathbf{X}$, by Stein's result [30],

$$E_{\mathbf{Y}|\mathbf{X}}\, \mathrm{Sure}(T; \mathbf{Y}) = E_{\mathbf{Y}|\mathbf{X}} \sum_{i,j} (\eta_T(Y_{ij}) - X_{ij})^2. \qquad (35)$$

Moreover, as we have done before, if the distribution of the $X_{ij}$'s is approximated by a GGD, then the distribution of the $Y_{ij}$'s is approximated by the convolution of the GGD and $N(0, \sigma^2)$; that is, $Y_{ij} = X_{ij} + V_{ij}$, where $X_{ij}$ follows a GGD and is independent of $V_{ij}$. By the Law of Large Numbers,

$$\frac{1}{n^2}\, \mathrm{Sure}(T; \mathbf{Y}) \approx E\left[\sigma^2 - 2\sigma^2\,\mathbf{1}\{|Y| \le T\} + (|Y| \wedge T)^2\right]. \qquad (36)$$

Taking expectation with respect to the GGD on both sides of (35), the risk can be written as

$$r(T) = \frac{1}{n^2}\, E_{\mathbf{X}}\, E_{\mathbf{Y}|\mathbf{X}} \sum_{i,j} (\eta_T(Y_{ij}) - X_{ij})^2 = \frac{1}{n^2}\, E\, \mathrm{Sure}(T; \mathbf{Y}). \qquad (37)$$


Fig. 10. Comparing the performance of the various methods on goldhill with $\sigma = 20$. (a) Original. (b) Noisy image, $\sigma = 20$. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.

Comparing (37) with (36), one can conclude that $(1/n^2)\,\mathrm{Sure}(T; \mathbf{Y})$ is a data-based approximation to $r(T)$, and the SURE threshold, which minimizes $\mathrm{Sure}(T; \mathbf{Y})$, is an alternative to BayesShrink for minimizing the Bayesian risk.

We have also made comparisons with the Wiener filter, the best linear filter possible. The version used is the adaptive filter, wiener2, in the MATLAB image processing toolbox, using the default settings ($3 \times 3$ local window size, with the unknown noise power estimated). The MSE results are shown in Table I, and they are considerably worse than those of the nonlinear thresholding methods, especially when $\sigma$ is large. The image quality is also not as good as that of the thresholding methods.

The MDLQ-based compression step introduces quantization noise which is quite visible. As shown in the last column of Table I, the coder achieves a lower bitrate, but at the expense of increasing the MSE. The MSE can be even worse than that of the noisy observation for small values of $\sigma$, especially for the highly detailed images. This is because the quantization noise is significant compared to the additive Gaussian noise. For larger $\sigma$, the compressed images can achieve noise reduction of up to approximately 75% in terms of MSE. Furthermore, the bitrates are significantly less than the original 8 bpp for grayscale images. Thus, compression does achieve denoising, and the proposed MDLQ-based compression can be used if simultaneous denoising and compression is a desired feature. If only the best denoising performance were the goal, obviously using solely BayesShrink is preferred.

Note that the first-order entropy estimate, $-\sum_k n_k \log_2(n_k/n^2)$, for the bitrate of the quantized coefficients is a rather loose one. With more sophisticated coding methods (e.g., predictive coding, pixel classification), the same bitrate could yield a higher number of quantization levels, thus resulting in a lower MSE and enhancing the performance of the MDLQ-based compression-denoising.

A fair assessment of the MDLQ scheme for quantization after thresholding is the R-D curve used in Hansen and Yu [17] (see http://cm.bell-labs.com/stat/binyu/publications.html). This R-D curve is calculated using noiseless coefficients, and yields the best possible R-D tradeoff when the quantization is restricted to equal binwidths. It thus gives an idea of how effective MDLQ is in choosing the tradeoff with respect to the optimal. The closeness of the MDLQ point to this R-D lower-bound curve indicates that MDLQ chooses a good R-D tradeoff without the knowledge of the noiseless coefficients required in deriving this R-D curve.

Fig. 10 shows the resulting images of each denoising method for goldhill with $\sigma = 20$ (a zoomed-in section of the image is displayed in order to show the details). Table II compares the threshold values for each subband chosen by OracleShrink, SureShrink, and BayesShrink, averaged over five runs. It is clear that the BayesShrink threshold selection is comparable to the SURE threshold and to the true optimal threshold $T^O$. Some of the unexpectedly large threshold values in SureShrink come from the universal threshold, not the SURE threshold, and these are placed in parentheses in the table. Table II(c) lists the thresholds of BayesShrink, and the thresholds in parentheses correspond to the case when $\hat{\sigma}^2 \ge \hat{\sigma}_Y^2$ and all coefficients have been set to zero. Table III tabulates the values of $m$ chosen by MDLQ for each subband of the goldhill image, $\sigma = 20$, averaged over five runs. The MDLQ criterion allocates more levels to the coarser, more important scales, as would be the case in a practical subband coding situation. A value of $m = 0$ indicates that the coefficients have already been thresholded to zero, and there is nothing to code.

The results for lena with $\sigma = 10$ are also shown. Fig. 11 shows the same sequence for a zoomed-in portion of lena with noise $\sigma = 10$. The corresponding results of threshold selections and MDLQ parameters for lena with noise $\sigma = 10$ are listed in Tables IV and V. Interested readers can obtain a better view of the images at the website http://www-wavelet.eecs.berkeley.edu/~grchang/compressDenoise/.

TABLE II. The threshold values of OracleShrink, SureShrink, and BayesShrink, respectively (averaged over five runs), for the different subbands of goldhill, with noise strength $\sigma = 20$.

TABLE III. The value of $m$ (averaged over five runs) for the different subbands of goldhill, with noise strength $\sigma = 20$.

V. CONCLUSION

Two main issues regarding image denoising were addressed in this paper. First, an adaptive threshold for wavelet thresholding of images was proposed, based on GGD modeling of the subband coefficients, and test results showed excellent performance. Second, a coder was designed specifically for simultaneous compression and denoising. The proposed BayesShrink threshold specifies the zero-zone of the quantization step of this coder, and this zero-zone is the main agent in the coder which removes the noise.


Fig. 11. Comparing the performance of the various methods on lena with $\sigma = 10$. (a) Original. (b) Noisy image, $\sigma = 10$. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.

Although the setting in this paper was in the wavelet domain, the idea can be extended to other transform domains such as the DCT, which also relies on the energy compaction and sparse representation properties to achieve good compression.

There are several interesting directions worth pursuing. The current compression scheme selects the threshold (i.e., the zero-zone size) and the quantization bin size $\Delta$ in a two-stage process. In typical image coders, however, the zero-zone is chosen to be either the same size as, or twice the size of, the other bins. Thus it would be interesting to select these two values jointly and analyze their dependence on each other. Furthermore, a more sophisticated coder is likely to produce better compressed images than the current scheme, which uses the first-order entropy to code the bin indices.


TABLE IV. The threshold values of OracleShrink, SureShrink, and BayesShrink, respectively (averaged over five runs), for the different subbands of lena, with noise strength $\sigma = 10$.

TABLE V. The value of $m$ (averaged over five runs) for the different subbands of lena, with noise strength $\sigma = 10$.

With an improved coder, an increase in the number of quantization bins would not increase the bitrate penalty by much, and thus the coefficients could be quantized at a finer resolution than in the current method. Lastly, the model family could be expanded. For example, one could use a collection of wavelet bases for the wavelet decomposition, rather than just one chosen wavelet, to allow possibly better representations of the signals.

In our other work [7], it was demonstrated that spatially adaptive thresholds greatly improve the denoising performance over uniform thresholds. That is, the threshold value changes for each coefficient. The threshold selection uses the context-modeling idea prevalent in coding methods, thus it would be interesting to extend this spatially adaptive threshold to the compression framework without incurring too much overhead. This would likely improve the denoising performance.

REFERENCES

[1] F. Abramovich, T. Sapatinas, and B. W. Silverman, “Wavelet thresholding via a Bayesian approach,” J. R. Statist. Soc., ser. B, vol. 60, pp. 725–749, 1998.

[2] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. Image Processing, vol. 1, no. 2, pp. 205–220, 1992.

[3] J. Buckheit, S. Chen, D. Donoho, I. Johnstone, and J. Scargle, “WaveLab Toolkit,” http://www-stat.stanford.edu:80/~wavelab/.

[4] A. Chambolle, R. A. DeVore, N. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Trans. Image Processing, vol. 7, pp. 319–335, 1998.

[5] S. G. Chang, B. Yu, and M. Vetterli, “Bridging compression to wavelet thresholding as a denoising method,” in Proc. Conf. Information Sciences Systems, Baltimore, MD, Mar. 1997, pp. 568–573.

[6] ——, “Image denoising via lossy compression and wavelet thresholding,” in Proc. IEEE Int. Conf. Image Processing, vol. 1, Santa Barbara, CA, Nov. 1997, pp. 604–607.

[7] ——, “Spatially adaptive wavelet thresholding with context modeling for image denoising,” IEEE Trans. Image Processing, vol. 9, pp. 1522–1531, Sept. 2000.

[8] H. Chipman, E. Kolaczyk, and R. McCulloch, “Adaptive Bayesian wavelet shrinkage,” J. Amer. Statist. Assoc., vol. 92, no. 440, pp. 1413–1421, 1997.

[9] M. Clyde, G. Parmigiani, and B. Vidakovic, “Multiple shrinkage and subset selection in wavelets,” Biometrika, vol. 85, pp. 391–402, 1998.

[10] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based statistical signal processing using hidden Markov models,” IEEE Trans. Signal Processing, vol. 46, pp. 886–902, Apr. 1998.

[11] I. Daubechies, Ten Lectures on Wavelets, vol. 61 of Proc. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992.

[12] R. A. DeVore and B. J. Lucier, “Fast wavelet techniques for near-optimal image processing,” in IEEE Military Communications Conf. Rec., San Diego, CA, Oct. 11–14, 1992, vol. 3, pp. 1129–1135.

[13] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inform. Theory, vol. 41, pp. 613–627, May 1995.

[14] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation via wavelet shrinkage,” Biometrika, vol. 81, pp. 425–455, 1994.

[15] ——, “Adapting to unknown smoothness via wavelet shrinkage,” J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200–1224, Dec. 1995.

[16] ——, “Wavelet shrinkage: Asymptopia?,” J. R. Statist. Soc., ser. B, vol. 57, no. 2, pp. 301–369, 1995.

[17] M. Hansen and B. Yu, “Wavelet thresholding via MDL: Simultaneous denoising and compression,” 1999, submitted for publication.

[18] M. Jansen, M. Malfait, and A. Bultheel, “Generalized cross validation for wavelet thresholding,” Signal Process., vol. 56, pp. 33–44, Jan. 1997.

[19] I. M. Johnstone and B. W. Silverman, “Wavelet threshold estimators for data with correlated noise,” J. R. Statist. Soc., vol. 59, 1997.

[20] R. L. Joshi, V. J. Crump, and T. R. Fisher, “Image subband coding using arithmetic and trellis coded quantization,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 515–523, Dec. 1995.

[21] J. Liu and P. Moulin, “Complexity-regularized image denoising,” in Proc. IEEE Int. Conf. Image Processing, vol. 2, Oct. 1997, pp. 370–373.

[22] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, “Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp. 221–230.

[23] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–693, July 1989.

[24] G. Nason, “Choice of the threshold parameter in wavelet function estimation,” in Wavelets in Statistics, A. Antoniades and G. Oppenheim, Eds. Berlin, Germany: Springer-Verlag, 1995.

[25] B. K. Natarajan, “Filtering random noise from deterministic signals via data compression,” IEEE Trans. Signal Processing, vol. 43, pp. 2595–2605, Nov. 1995.


[26] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.

[27] F. Ruggeri and B. Vidakovic, “A Bayesian decision theoretic approach to wavelet thresholding,” Statist. Sinica, vol. 9, no. 1, pp. 183–197, 1999.

[28] N. Saito, “Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion,” in Wavelets in Geophysics, E. Foufoula-Georgiou and P. Kumar, Eds. New York: Academic, 1994, pp. 299–324.

[29] E. Simoncelli and E. Adelson, “Noise removal via Bayesian wavelet coring,” in Proc. IEEE Int. Conf. Image Processing, vol. 1, Sept. 1996, pp. 379–382.

[30] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” Ann. Statist., vol. 9, no. 6, pp. 1135–1151, 1981.

[31] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[32] B. Vidakovic, “Nonlinear wavelet shrinkage with Bayes rules and Bayes factors,” J. Amer. Statist. Assoc., vol. 93, no. 441, pp. 173–179, 1998.

[33] Y. Wang, “Function estimation via wavelet shrinkage for long-memory data,” Ann. Statist., vol. 24, pp. 466–484, 1996.

[34] P. H. Westerink, J. Biemond, and D. E. Boekee, “An optimal bit allocation algorithm for sub-band coding,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Dallas, TX, Apr. 1987, pp. 1378–1381.

[35] N. Weyrich and G. T. Warhola, “De-noising using wavelets and cross-validation,” Dept. of Mathematics and Statistics, Air Force Inst. of Technol., AFIT/ENC, OH, Tech. Rep. AFIT/EN/TR/94-01, 1994.

[36] Y. Yoo, A. Ortega, and B. Yu, “Image subband coding using context-based classification and adaptive quantization,” IEEE Trans. Image Processing, vol. 8, pp. 1702–1715, Dec. 1999.

S. Grace Chang (S’95) received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1993 and the M.S. and Ph.D. degrees from the University of California, Berkeley, in 1995 and 1998, respectively, all in electrical engineering.

She was with Hewlett-Packard Laboratories, Palo Alto, CA, and is now with Hewlett-Packard Co., Grenoble, France. Her research interests include image enhancement and compression, Internet applications and content delivery, and telecommunication systems.

Dr. Chang was a recipient of the National Science Foundation Graduate Fellowship and a University of California Dissertation Fellowship.

Bin Yu (A’92–SM’97) received the B.S. degree in mathematics from Peking University, China, in 1984, and the M.S. and Ph.D. degrees in statistics from the University of California, Berkeley, in 1987 and 1990, respectively.

She is an Associate Professor of statistics with the University of California, Berkeley. Her research interests include statistical inference, information theory, signal compression and denoising, bioinformatics, and remote sensing. She has published over 30 technical papers in journals such as the IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, The Annals of Statistics, Annals of Probability, Journal of the American Statistical Association, and Genomics. She has held faculty positions at the University of Wisconsin, Madison, and Yale University, New Haven, CT, and was a Postdoctoral Fellow with the Mathematical Sciences Research Institute (MSRI), University of California, Berkeley.

Dr. Yu was in the S. S. Chern Mathematics Exchange Program between China and the U.S. in 1985. She is a Fellow of the Institute of Mathematical Statistics (IMS) and a member of the American Statistical Association (ASA). She is serving on the Board of Governors of the IEEE Information Theory Society, and as an Associate Editor for The Annals of Statistics and for Statistica Sinica.

Martin Vetterli (S’86–M’86–SM’90–F’95) received the Dipl. El.-Ing. degree from ETH Zürich (ETHZ), Zürich, Switzerland, in 1981, the M.S. degree from Stanford University, Stanford, CA, in 1982, and the Dr.Sci. degree from EPF Lausanne (EPFL), Lausanne, Switzerland, in 1986.

He was a Research Assistant at Stanford University and EPFL, and was with Siemens and AT&T Bell Laboratories. In 1986, he joined Columbia University, New York, where he was an Associate Professor of electrical engineering and Co-Director of the Image and Advanced Television Laboratory. In 1993, he joined the University of California, Berkeley, where he was a Professor in the Department of Electrical Engineering and Computer Sciences until 1997, and he now holds an Adjunct Professor position. Since 1995, he has been a Professor of communication systems at EPFL, where he chaired the Communications Systems Division (1996–1997), and heads the Audio-Visual Communications Laboratory. He held visiting positions at ETHZ in 1990 and Stanford University in 1998. He is on the editorial boards of Annals of Telecommunications, Applied and Computational Harmonic Analysis, and the Journal of Fourier Analysis and Applications. He is the co-author, with J. Kovacevic, of the book Wavelets and Subband Coding (Englewood Cliffs, NJ: Prentice-Hall, 1995). He has published about 75 journal papers on a variety of topics in signal and image processing and holds five patents. His research interests include wavelets, multirate signal processing, computational complexity, signal processing for telecommunications, digital video processing, and compression and wireless video communications.

Dr. Vetterli is a member of SIAM and was the Area Editor for Speech, Image, Video, and Signal Processing of the IEEE TRANSACTIONS ON COMMUNICATIONS. He received the Best Paper Award of EURASIP in 1984 for his paper on multidimensional subband coding, the Research Prize of the Brown Bovery Corporation, Switzerland, in 1986 for his doctoral thesis, and the IEEE Signal Processing Society’s Senior Award in 1991 and 1996 (for papers with D. LeGall and K. Ramchandran, respectively). He was an IEEE Signal Processing Distinguished Lecturer in 1999. He received the Swiss National Latsis Prize in 1996 and the SPIE Presidential Award in 1999. He has been a Plenary Speaker at various conferences, including the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.