
Pan-Sharpening with a Bayesian Nonparametric Dictionary Learning Model

Xinghao Ding, Yiyong Jiang, Yue Huang (Department of Communications Engineering, Xiamen University, Fujian, China)

John Paisley (Department of Electrical Engineering, Columbia University, New York, USA)

Abstract

Pan-sharpening, a method for constructing high resolution images from low resolution observations, has recently been explored from the perspective of compressed sensing and sparse representation theory. We present a new pan-sharpening algorithm that uses a Bayesian nonparametric dictionary learning model to give an underlying sparse representation for image reconstruction. In contrast to existing dictionary learning methods, the proposed method infers parameters such as dictionary size, patch sparsity and noise variances. In addition, our regularization includes image constraints such as a total variation penalization term and a new gradient penalization on the reconstructed PAN image. Our method does not require high resolution multiband images for dictionary learning, which are unavailable in practice; rather, the dictionary is learned directly on the reconstructed image as part of the inversion process. We present experiments on several images to validate our method and compare with several other well-known approaches.

1 Introduction

Remote sensing technology has developed rapidly in recent decades. There are currently a variety of optical remote sensors, such as IKONOS, Pleiades and QuickBird, collecting satellite imagery with varying spatial and spectral resolutions. Due to physical constraints, many of these remote sensors do not collect images that are simultaneously of high spatial and spectral resolution [11]. As a result, data provided by most satellites comprise a high resolution panchromatic (PAN), i.e., gray-scale, image and several low resolution multispectral (LRMS) images, which include the color spectrum and other frequencies such as near infrared. In many remote sensing applications, such as classification, feature extraction and positioning, the corresponding high resolution multispectral (HRMS) images are desired. Pan-sharpening is a method whereby the spectral information in the LRMS images is fused with the spatial information in the PAN image to output an approximation of the underlying HRMS images [3].

Appearing in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 33. Copyright 2014 by the authors.

Various methods have been proposed for pan-sharpening. The most common are based on a projection-substitution approach, which assumes that the PAN image is equivalent to a linear combination of the HRMS images [12]. Among the projection-substitution methods, intensity-hue-saturation (IHS) [19, 1, 16], principal component analysis (PCA) [17], and the Brovey transform [7] are the most popular because of their relatively straightforward implementation and fast computation. These three methods reconstruct HRMS images with good spatial details, but tend to have significant spectral distortions because the PAN and HRMS images do not share the same spectral range.

Recently, another approach has been to treat the fusion problem as an ill-posed inverse problem and employ a regularization scheme that encourages a reconstruction having the desired image characteristics. One example is motivated by compressed sensing (CS), which uses sparse regularization. The CS-based methods in [12, 5] are successful attempts at pan-sharpening in which LRMS image patches are assumed to have a sparse representation with respect to a patch-level dictionary. In these methods, a dictionary is randomly sampled (not learned) from either pre-obtained HRMS images or the PAN image itself. A limitation of these approaches is that important parameters, such as the dictionary size and patch-specific sparsity, are preset rather than learned from the data, and that the dictionary quality is left to chance. Other dictionary learning algorithms use HRMS images to learn the dictionary off-line, but these are often unavailable in practice.

Motivated by the sparse representation perspective, in this paper we propose a pan-sharpening method that uses a beta-Bernoulli process as a Bayesian nonparametric prior for dictionary learning [18, 15]. Compared with the previous CS-based pan-sharpening methods, the proposed


model has the following unique properties:

1. It uses a Bayesian nonparametric dictionary learning approach [15, 21] to learn the dictionary directly on the reconstructed HRMS images of the current iteration for the problem at hand, as opposed to images obtained off-line. The model infers a sparse representation by learning the number of dictionary elements and the sparse coding of each patch.

2. We consider image properties not considered by other pan-sharpening approaches. We include the total variation (TV) as a penalty term for the pan-sharpened images and use the alternating direction method of multipliers (ADMM) to derive an efficient optimization procedure [4, 8]. We also consider a first derivative penalty for the PAN/HRMS fidelity term, which reduces the spectral distortion of the reconstructed HRMS images.

2 Pan-sharpening and dictionary learning

Images obtained by optical remote sensors contain both a high resolution panchromatic (i.e., black and white) image (PAN) and low resolution multispectral images (LRMS). The spectral bands typically consist of the colors red, green and blue, as well as a near infrared band. The goal is to fuse the color information of the LRMS images with the spatial information of the PAN image to obtain a high resolution multispectral image (HRMS).

In this paper, for concreteness we consider the case where the LRMS images are downsampled by a factor of four. We represent each of the M × N LRMS images in vectorized form as y_b ∈ R^{MN}, where b indexes a spectral band (RGB and NIR). This observed data is treated as a noisy version of the corresponding 4M × 4N HRMS images, which are vectorized as x_b ∈ R^{16MN}. We follow the standard approach of assuming that the measured y_b follows from filtering x_b through a predefined and shared subsampling and blurring matrix H ∈ R^{MN×16MN} and adding noise,

y_b = H x_b + ε_b.  (1)

The monochromatic PAN image is usually considered to be a linear combination of the unknown HRMS images,

y_P = ∑_{b=1}^{4} w_b x_b + ε_P.  (2)

The vector y_P ∈ R^{16MN} represents the vectorized 4M × 4N PAN image, ε_P is a Gaussian noise vector and the weights (w_1, . . . , w_4) are predefined.
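As a concrete illustration of the observation model in Equations (1) and (2), the following numpy sketch simulates LRMS bands and a PAN image from hypothetical HRMS bands. The block-averaging `degrade` function is a stand-in assumption for the blur/subsample operator H, and all dimensions and noise levels are illustrative.

```python
import numpy as np

def degrade(x, factor=4):
    """Stand-in for H: blur by local block averaging, then downsample."""
    M, N = x.shape
    return x.reshape(M // factor, factor, N // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
factor = 4
# Hypothetical 4M x 4N HRMS bands (RGB + NIR), with M = N = 8 here.
x_bands = [rng.random((32, 32)) for _ in range(4)]
w = [0.25, 0.25, 0.25, 0.25]                 # predefined PAN weights

# Eq. (1): each LRMS band is a degraded, noisy version of its HRMS band.
y_bands = [degrade(xb, factor) + 0.01 * rng.standard_normal((8, 8))
           for xb in x_bands]
# Eq. (2): the PAN image is a weighted sum of the HRMS bands plus noise.
y_pan = sum(wb * xb for wb, xb in zip(w, x_bands)) \
        + 0.01 * rng.standard_normal((32, 32))
```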

The pan-sharpening problem is to use the measured data y_P, y_1, y_2, y_3, y_4, along with settings for H and (w_1, . . . , w_4), to recover the underlying HRMS images x_1, x_2, x_3, x_4 via some predefined inversion model. For this inversion, we propose using a Bayesian nonparametric (BNP) dictionary learning model based on the beta process called BPFA [15].

Algorithm 1 Dictionary Learning with BPFA

1. Construct a dictionary D = [d_1, . . . , d_K], with d_k ∼ N(0, L^{−1} I_L) for each k.

2. Draw a probability π_k for each d_k: π_k ∼ Beta(cγ/K, c(1 − γ/K)).

3. For the ith patch in X:
   (a) Draw the weight vector s_i ∼ N(0, γ_s^{−1} I_K).
   (b) Draw the binary vector z_i with z_ik ∼ Bernoulli(π_k).
   (c) Define α_i = s_i ◦ z_i by an element-wise product.
   (d) Construct the patch R_i X = D α_i + ε_i with noise ε_i ∼ N(0, γ_ε^{−1} I_L).

4. Construct the image using the average of all R_i X that overlap on a given pixel.

2.1 BNP dictionary learning for pan-sharpening

Sparse dictionary learning methods attempt to find a sparse basis for patches extracted from an image class of interest. The canonical dictionary learning approach is the K-SVD model [2], which iterates between orthogonal matching pursuits and least squares to learn a matrix factorization of patches extracted from images. In this paper we instead focus on a BNP dictionary learning model. This model performs dictionary learning in a similar manner to K-SVD, in that it iterates between learning a weighted binary coefficient matrix and a regularized least squares dictionary. In addition, however, Bayesian nonparametric methods provide the flexibility of allowing the data to determine the complexity of the model [9].

We learn a dictionary using the reconstructed HRMS images of the current iteration. Therefore, the "data" changes slightly with each iteration. We extract √q × √q patches from these reconstructed HRMS images, one centered on each pixel, and put them in a matrix X. Let R_i be the ith patch extraction operator, which is a q × 16MN matrix containing a one in each row and zeros elsewhere. The ith column of the 4q × 16MN matrix X is

X_i = [R_i x_1; R_i x_2; R_i x_3; R_i x_4] ∈ R^{4q}.

A dictionary learning model learns a factorization of this matrix, X = Dα + E, where D is a 4q × K dictionary, α is a K × 16MN coefficient matrix and E is noise.
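The construction of the patch matrix X can be sketched as follows. The patch extraction operators R_i are implemented by direct slicing, and, as a simplifying assumption, patches are taken at valid offsets only rather than centered on every pixel.

```python
import numpy as np

def patch_matrix(bands, p=4):
    """Stack p x p patches from each band into columns of X (4*p*p rows).
    One patch per valid top-left corner, a simplification of the paper's
    one-patch-per-pixel extraction."""
    M, N = bands[0].shape
    cols = []
    for i in range(M - p + 1):
        for j in range(N - p + 1):
            # R_i extracts the same patch location from every band.
            cols.append(np.concatenate(
                [b[i:i + p, j:j + p].ravel() for b in bands]))
    return np.array(cols).T          # shape (4*p*p, n_patches)

bands = [np.random.default_rng(1).random((8, 8)) for _ in range(4)]
X = patch_matrix(bands)              # 4q x n_patches with q = 16
```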

We use the BPFA model shown in Algorithm 1 for dictionary learning from X. This model treats the columns of D as Gaussian vectors and α as a weighted binary matrix, S ◦ Z. The matrix S contains Gaussian random variables, and a row of Z is drawn from a Bernoulli process using a row-specific probability. This probability has a


beta distribution prior that is nonparametric through its parameterization; the parameters are set such that, for a large K, many of the probabilities will be so small that an entire row of Z will contain all zeros [18]. In this case the corresponding dictionary element is removed from the model. Therefore, unlike K-SVD, which will learn a full dictionary of size K, the BPFA model will learn a dictionary that is smaller than the initial setting of K, and is in some sense "appropriate" for the data at hand because the actual size of the dictionary is determined by the data. As K → ∞, this approximation converges to the beta process [10].

This model allows for learning an overcomplete dictionary with the size of the dictionary not being preset (assuming K is large enough). Additionally, conjugate gamma prior distributions can be placed on the precisions γ_s and γ_ε, which allows these values to be inferred.
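The generative process of Algorithm 1 can be sketched directly. The hyperparameter values below are illustrative (the experiments in Section 4 set c = γ = 1); note how the Beta(cγ/K, c(1 − γ/K)) prior makes most usage probabilities π_k tiny, so many rows of Z are entirely zero and the effective dictionary is much smaller than K.

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, n = 64, 256, 500            # patch dim, truncation level, num patches
c, gamma = 1.0, 1.0               # beta process parameters
gamma_s, gamma_eps = 1.0, 100.0   # weight and noise precisions

# Steps 1-2: dictionary atoms and their usage probabilities.
D = rng.standard_normal((L, K)) / np.sqrt(L)           # d_k ~ N(0, L^{-1} I)
pi = rng.beta(c * gamma / K, c * (1 - gamma / K), K)   # pi_k, mostly tiny

# Step 3: per-patch sparse codes alpha_i = s_i o z_i.
S = rng.standard_normal((K, n)) / np.sqrt(gamma_s)
Z = rng.random((K, n)) < pi[:, None]                   # z_ik ~ Bernoulli(pi_k)
alpha = S * Z
X = D @ alpha + rng.standard_normal((L, n)) / np.sqrt(gamma_eps)
```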

2.2 Total variation and high frequency constraints

We discuss two other image constraints that we consider for our reconstruction algorithm before presenting the final objective function and inference algorithm.

The first constraint is the total variation (TV) of the reconstructed HRMS images. TV regularization was introduced for image denoising by Rudin, Osher and Fatemi in [13], and has since evolved into a more general tool for solving a wide variety of image restoration problems, including deconvolution and inpainting. The isotropic TV norm of a vectorized image x is defined by

‖x‖_TV = ∑_i ‖B_i x‖_2 = ∑_i √(|B_i^h x|² + |B_i^v x|²).  (3)

The vectors B_i^h and B_i^v are the rows of B_i ∈ R^{2×16MN}, which is a finite difference matrix with two nonzero entries in each row corresponding to the partial derivatives of x at pixel i along the horizontal and vertical directions. An efficient algorithm was explored in [8].
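A minimal sketch of the isotropic TV norm of Equation (3), assuming forward differences with replicated boundaries (the exact boundary handling of B_i is an assumption here):

```python
import numpy as np

def tv_isotropic(x):
    """Isotropic TV of a 2-D image: sum over pixels of the length of the
    (horizontal, vertical) forward-difference gradient, Eq. (3)."""
    dh = np.diff(x, axis=1, append=x[:, -1:])   # B_i^h x, zero at boundary
    dv = np.diff(x, axis=0, append=x[-1:, :])   # B_i^v x
    return np.sum(np.sqrt(dh ** 2 + dv ** 2))

# A constant image has zero TV; a vertical step edge has TV equal to its length.
flat = np.ones((8, 8))
step = np.zeros((8, 8)); step[:, 4:] = 1.0
```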

Another aspect of pan-sharpening algorithms is the use of the PAN image for its high resolution spatial information. Motivated by Equation (2), most models do this by incorporating a squared error term for y_P ≈ ∑_b w_b x_b [3]. A drawback of this approach is that the weights w_b need significant tuning to minimize the spectral distortion that this tends to introduce in the learned x_b. In fact, the information we wish to exploit in the PAN image y_P is the high frequency edge detail, while the low frequency (color) information is provided by the LRMS images y_1, . . . , y_4. Therefore, we propose using the squared error of the horizontal and vertical derivatives of the residual instead. This enforces that ∑_b w_b x_b agrees with y_P in the edge details only, and allows the spectral information in y_1, . . . , y_4 to inform the smooth regions of the corresponding HRMS images x_1, . . . , x_4 that we learn.

3 Bayesian nonparametric pan-sharpening

We combine the three elements of the model from Section 2 in the following algorithm for pan-sharpening.

3.1 A BNP pan-sharpening model

Let the set ϕ_i = {D, s_i, z_i, γ_ε, γ_s, π} denote the parameters of the BPFA dictionary learning model and let f(ϕ_i) be the log joint likelihood of the BPFA algorithm for all prior terms. We define the gradient matrix G_i, a 2 × 16MN matrix of zeros except for a +1 and a −1 in each row for the horizontal and vertical directions as defined by pixel i. The objective we seek to maximize is

L(x_1, x_2, x_3, x_4, ϕ) = (v_1/2) ∑_b ‖H x_b − y_b‖_2² + (v_2/2) ∑_i ‖G_i(∑_b w_b x_b − y_P)‖_2²
    + ∑_i [ (γ_ε/2) ‖R_i X − D α_i‖_2² + f(ϕ_i) ] + (λ_g/2) ∑_b ‖x_b‖_TV.  (4)

The first row contains the squared error fidelity term for the multispectral information of each band b; this enforces that the downsampling/blurring of our reconstruction x_b is in agreement with the observed y_b. The second term in the first row enforces that the gradient of the approximate PAN image formed from the reconstructions agrees with the gradient of the observed PAN image; this term is where the high resolution edge information from the PAN image is enforced. The second row contains the beta process prior on the stacked patches from each x_b and a TV penalty term. This attempts to learn HRMS reconstructions that have a sparse dictionary representation and little noise. The goal is to find a (locally) optimal solution of L with respect to the HRMS images (x_1, . . . , x_4) and the dictionary learning parameters ϕ.
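The two fidelity terms of Equation (4) can be sketched as below; the BPFA and TV terms are handled by their own sub-problems in Section 3.2 and are omitted. The block-averaging stand-in for H and the forward-difference stand-in for the rows G_i are assumptions for illustration.

```python
import numpy as np

def degrade(x, f=4):
    """Stand-in for H: block-average and downsample by f."""
    M, N = x.shape
    return x.reshape(M // f, f, N // f, f).mean(axis=(1, 3))

def grad(x):
    """Forward differences, the rows G_i stacked over pixels."""
    return (np.diff(x, axis=1, append=x[:, -1:]),
            np.diff(x, axis=0, append=x[-1:, :]))

def data_fit(x_bands, y_bands, y_pan, w, v1=50.0, v2=70.0):
    """The LRMS fidelity and gradient-domain PAN fidelity terms of Eq. (4)."""
    lrms = sum(np.sum((degrade(xb) - yb) ** 2)
               for xb, yb in zip(x_bands, y_bands))
    gh, gv = grad(sum(wb * xb for wb, xb in zip(w, x_bands)) - y_pan)
    return 0.5 * v1 * lrms + 0.5 * v2 * (np.sum(gh ** 2) + np.sum(gv ** 2))
```

For reconstructions exactly consistent with noise-free observations, both terms vanish.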

3.2 An optimization algorithm

We next summarize our algorithm for approximately optimizing the non-convex objective function L. Our algorithm incorporates regularized least squares, MCMC sampling and the alternating direction method of multipliers (ADMM) for TV minimization. ADMM is a general algorithmic approach to convex optimization [4]. For our model, ADMM works by performing dual ascent on the augmented Lagrangian objective function introduced for the total variation coefficients of each spectral band. Though our overall objective is not convex because of the dictionary learning terms, we note that when holding the dictionary learning parameters fixed, the resulting TV denoising problem for which we use ADMM is convex.

The ADMM algorithm entails modifying the objective function as follows. For the bth spectral band, we re-write


the TV coefficients for the ith pixel as β_{i,b} := B_i x_b. We introduce the vector of Lagrange multipliers η_{i,b}, and then split β_{i,b} from B_i x_b by relaxing the equality via an augmented Lagrangian. This results in the following modification to the TV portion of the objective:

‖x_b‖_TV → (λ_g/2) ∑_i ‖β_{i,b}‖_2 + ∑_i η_{i,b}^T (B_i x_b − β_{i,b}) + (ρ/2) ∑_i ‖B_i x_b − β_{i,b}‖_2².

From the ADMM theory, this objective will have optimal values β*_{i,b} and x*_b with β*_{i,b} = B_i x*_b, and so the equality constraints will be satisfied. Following the standard convenient re-parameterization, we let u_{i,b} := (1/ρ) η_{i,b} to absorb u_{i,b} within the squared term [4, 8].

The resulting augmented objective function can be split into three separate sub-problems that are cycled through in each iteration: one for TV, one for BPFA and one for reconstructing the HRMS images.

P1: β'_{i,b} = arg min_β (λ_g/2) ‖β_{i,b}‖_2 + (ρ/2) ‖B_i x_b − β_{i,b} + u_{i,b}‖_2².

P2: ϕ' = arg min_ϕ ∑_i (γ_ε/2) ‖R_i X − D α_i‖_2² + f(ϕ_i).

P3: x'_b = arg min_{x_b} (ρ/2) ∑_i ‖B_i x_b − β'_{i,b} + u_{i,b}‖_2² + ∑_i (γ_ε/2) ‖R_i x_b − D'_b α'_i‖_2²
        + (v_1/2) ‖H x_b − y_b‖_2² + (v_2/2) ∑_i ‖G_i(∑_b w_b x_b − y_P)‖_2².

u'_{i,b} = u_{i,b} + B_i x'_b − β'_{i,b},  i = 1, . . . , 16MN.  (5)

We let D_b denote the part of the dictionary relevant to band b. (We recall that we combine the spectral bands to learn a dictionary on vectorized √q × √q × 4 blocks.) The update of the Lagrange multiplier u_{i,b} follows from the ADMM algorithm [4].

For each sub-problem, we use the most recent values of all other parameters. The solutions for P1 and P3 are globally optimal and in closed form. Since P2 is non-convex, we cannot perform the desired minimization exactly, and so an approximation is required. Furthermore, this problem requires iterating through several parameters, and so a local optimal solution cannot be given in closed form either. Our approach is to update the variables of P2 by a combination of MCMC sampling and least squares. We next present the updates for these sub-problems.

3.2.1 P1 sub-problem: Total variation

We can solve for β_{i,b} exactly for each pixel i = 1, . . . , 16MN using a generalized shrinkage operation,

β'_{i,b} = max{‖B_i x_b + u_{i,b}‖_2 − λ_g/ρ, 0} · (B_i x_b + u_{i,b}) / ‖B_i x_b + u_{i,b}‖_2.

We recall that after updating x_b, we update the Lagrange multiplier for this part as u'_{i,b} = u_{i,b} + B_i x'_b − β'_{i,b}.
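The generalized shrinkage operation has a direct implementation; the guard against a zero-norm argument is an added assumption for numerical safety.

```python
import numpy as np

def shrink(v, kappa):
    """Vector soft-threshold solving P1: shrink the magnitude of
    v = B_i x_b + u_{i,b} by kappa = lambda_g / rho, keeping direction."""
    nrm = np.linalg.norm(v)
    if nrm == 0.0:
        return np.zeros_like(v)
    return max(nrm - kappa, 0.0) * v / nrm
```

Vectors shorter than the threshold are set to zero, which is what produces piecewise-smooth (TV-regularized) reconstructions.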

3.2.2 P2 sub-problem: BNP dictionary learning

We give the variable updates for BPFA in the appendix. We make one update for each variable in this step.

3.2.3 P3 sub-problem: HRMS image reconstructions

In this sub-problem we reconstruct each HRMS spectral band x_b for b = 1, . . . , 4. This is a least squares problem and has a closed form solution, but requires a special approach to make it computationally tractable.

For notational convenience, we define the stacked difference matrices B = [B_1^T, . . . , B_{16MN}^T]^T and G = [G_1^T, . . . , G_{16MN}^T]^T, and the stacked vectors β_b = [β_{1,b}^T, . . . , β_{16MN,b}^T]^T and u_b = [u_{1,b}^T, . . . , u_{16MN,b}^T]^T. For band b, the least squares problem for x_b has a solution that satisfies

(ρ B^T B + ∑_i R_i^T R_i + v_1 H^T H + v_2 w_b² G^T G) x_b
    = ρ B^T(β_b − u_b) + q x_b^{BPFA} + v_1 H^T y_b + v_2 w_b G^T G (y_P − ∑_{m≠b} w_m x_m).  (6)

The vector x_b^{BPFA} = (1/q) ∑_i R_i^T D_b α_i is the reconstructed HRMS image for band b according to the BPFA dictionary learning model. We need to solve for x_b to obtain the reconstruction.

The matrix on the left is too large to invert directly. However, we can solve this problem exactly by working in the Fourier domain. Let θ_b = F x_b be the Fourier transform of x_b. We replace x_b with F^H θ_b and take the Fourier transform of each side of Equation (6). This diagonalizes the matrix on the left hand side, which can be seen as follows. The finite difference product B^T B is a circulant matrix, which has the rows of the Fourier matrix F as its eigenvectors; therefore F B^T B F^H diagonalizes B^T B to give a matrix of eigenvalues Λ_1. The matrix G^T G is the same as B^T B, since it is also a spatial finite difference matrix, and so has the same eigenvalues. H^T H is a different circulant matrix and so has its own eigenvalues Λ_2, though the same Fourier basis as eigenvectors. Finally, ∑_i R_i^T R_i = qI, and so the Fourier matrices cancel in this term.


Table 1: Quality metrics of different methods on IKONOS images from Figure 1.

Method        RMSE   CC     WB     ERGAS  Q4
Brovey        0.120  0.851  0.599  10.07  0.776
Wavelet       0.079  0.898  0.618  6.76   0.760
PCA           0.140  0.601  0.350  12.08  0.600
FIHS          0.156  0.577  0.429  13.99  0.537
Adaptive IHS  0.077  0.916  0.666  6.39   0.761
BPFA          0.074  0.921  0.697  6.13   0.794
BPFA+TV       0.071  0.923  0.686  5.95   0.814
BPFA w/o G_i  0.079  0.917  0.676  6.22   0.778

This reduces the problem to solving 16MN one dimensional problems in the Fourier domain for each band:

θ_{b,i} = F_i (ρ B^T(β_b − u_b) + q x_b^{BPFA} + v_1 H^T y_b + v_2 w_b G^T G (y_P − ∑_{m≠b} w_m x_m))
          / (q + (ρ + v_2 w_b²) Λ_{1,i} + v_1 Λ_{2,i}).  (7)

We then invert θ_b using the inverse Fourier transform to obtain the reconstruction x_b for spectral band b.
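A simplified instance of this Fourier-domain solve, keeping only the TV and patch terms on the left-hand side (the H^T H and G^T G terms diagonalize in the same basis and would only add to the denominator), and assuming periodic boundaries:

```python
import numpy as np

# Solve (rho * B^T B + q * I) x = r in the Fourier domain, a reduced
# instance of Eqs. (6)-(7). B^T B acts as circular convolution with a
# Laplacian stencil, so its eigenvalues are the FFT of that stencil.
rng = np.random.default_rng(3)
M = 16
r = rng.standard_normal((M, M))
rho, q = 3.0, 16.0

ker = np.zeros((M, M))
ker[0, 0] = 4.0
ker[0, 1] = ker[1, 0] = ker[0, -1] = ker[-1, 0] = -1.0
eig = np.fft.fft2(ker).real          # eigenvalues Lambda_1 of B^T B

# One elementwise division per Fourier coefficient, as in Eq. (7).
x = np.fft.ifft2(np.fft.fft2(r) / (rho * eig + q)).real
```

The test below verifies the solve by applying the periodic Laplacian to x and recovering r.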

4 Experiments

In this section, we evaluate our proposed models, denoted BPFA+TV and BPFA (without TV), on three remote sensing images. The data sets are as follows:

1. IKONOS. This data set consists of a 512 × 512 PAN image and low resolution multispectral images of dimension 128 × 128 pixels each. The MS images have four bands: RGB and near infrared (NIR).

2. Pleiades. The data set from the Pleiades satellite contains a 256 × 256 PAN image and four LRMS images of 64 × 64 pixels each.

3. QuickBird. In this experiment we have a 512 × 512 PAN image and four 128 × 128 LRMS images.

For all experiments we extract 4 × 4 × 4 patches from the reconstructed HRMS images of the current iteration, for a dictionary of 64 dimensions. We initialize the dictionary size to K = 256. We also set w_1 = w_2 = w_3 = w_4 = 0.25 and λ_g = 0.1. The BPFA hyperparameters c, γ, e_0, f_0, g_0, h_0 are all set to one.

We compare performance with other existing fusion methods, namely model-based fusion [1], the Brovey transform [7], fast IHS (FIHS) [19], a PCA-based method [17], wavelet-based image fusion [14] and the adaptive IHS method [16]. Since there are no ground truth HRMS images for IKONOS and Pleiades, to compare these algorithms we re-project the fused images and compare with the ground truth LRMS images. For the QuickBird experiment we have the ground truth HRMS images. After some parameter tuning, for the BPFA+TV pan-sharpening model we set v_1 = 50, v_2 = 70 and the ADMM parameter ρ = 3 for the IKONOS image, and v_1 = 10, v_2 = 30 and ρ = 5 for the Pleiades image. For the QuickBird image we set v_1 = 10, v_2 = 30 and ρ = 20.

Table 2: Quality metrics of different methods on Pleiades images from Figure 2.

Method        RMSE   CC     WB     ERGAS   Q4
Brovey        0.274  0.791  0.235  65.522  0.252
Wavelet       0.110  0.877  0.754  8.304   0.794
PCA           0.144  0.830  0.537  11.056  0.635
FIHS          0.133  0.790  0.642  10.503  0.666
Adaptive IHS  0.109  0.901  0.754  8.223   0.771
BPFA          0.102  0.898  0.812  7.78    0.835
BPFA+TV       0.096  0.906  0.814  7.23    0.836
BPFA w/o G_i  0.103  0.897  0.807  7.758   0.836

Table 3: Quality metrics of different methods on QuickBird images from Figure 3.

Method        RMSE   CC     WB     ERGAS   Q4
Brovey        0.133  0.924  0.695  6.388   0.761
Wavelet       0.069  0.939  0.646  4.625   0.757
PCA           0.247  0.608  0.422  17.295  0.586
FIHS          0.161  0.655  0.507  11.410  0.564
Adaptive IHS  0.058  0.958  0.720  3.843   0.811
BPFA          0.059  0.965  0.733  3.609   0.821
BPFA+TV       0.053  0.967  0.745  3.453   0.824
BPFA w/o G_i  0.057  0.967  0.731  3.523   0.749

4.1 Reconstruction results

The pan-sharpening results for the different algorithms are shown in Figure 1 for IKONOS, Figure 2 for Pleiades and Figure 3 for QuickBird. By visually comparing the fusion results, we can see that while several algorithms have equally clear edges, they exhibit significant spectral distortion. For example, the red blocks in the IKONOS image are much brighter than in the original LRMS images for the PCA-based model, and several algorithms have spectral distortion in the vegetation areas of the Pleiades image. The model-based method is more blurry than BPFA+TV, and the wavelet-based results exhibit a stair-casing effect. The BPFA results do not appear to be as sharp as those of some other algorithms, such as adaptive IHS. This may be because the dictionary learning algorithm, which averages 16 values for each pixel, slightly "denoises" these edges.

In pan-sharpening, the spectral quality is considered as important as the spatial quality, but is more difficult to judge visually, especially considering that there are bands in the non-visible spectrum. Therefore, we quantitatively evaluate the reconstructions using the following standard pan-sharpening quality metrics [20]: root mean square error (RMSE), correlation coefficient (CC), the Wang-Bovik measure (WB), ERGAS and Q4. We show quantitative results in Tables 1-3 for each image, where we highlight the best result in bold. From these tables, we see that BPFA and BPFA+TV perform very competitively compared with the other methods in their ability to accurately learn the spectral content of the HRMS images. We also show results without using the first derivative penalty, that is, replacing G_i with the identity matrix when penalizing the PAN reconstruction. We can see an advantage to our method of focusing only on the high frequency edge information within the PAN image and allowing the spectral information in the smooth regions to be determined by the LRMS images alone. We also notice that, in general, penalizing the total variation has a positive impact.

Figure 1: IKONOS images and pan-sharpening results for different methods. (a) Low resolution MS image, (b) high resolution PAN image, (c) Brovey transform, (d) wavelet-based method, (e) model-based IHS, (f) PCA, (g) adaptive IHS, (h) BPFA+TV.

Figure 2: Pleiades images and pan-sharpening results for different methods. (a) Low resolution MS image, (b) high resolution PAN image, (c) Brovey transform, (d) wavelet-based method, (e) FIHS, (f) PCA, (g) adaptive IHS, (h) BPFA+TV.

Figure 3: QuickBird images and pan-sharpening results for different methods. (a) Low resolution MS image, (b) high resolution PAN image, (c) original HRMS image, (d) wavelet-based method, (e) FIHS, (f) PCA, (g) Brovey transform, (h) adaptive IHS, (i) BPFA+TV.

Figure 4: Dictionary learning results for QuickBird. (a) Dictionary, (b) dictionary probabilities, (c) patch sparsity histogram.

4.2 BNP dictionary learning results

We show example dictionary learning results for the QuickBird image in Figure 4. In Figure 4(a), we show the 4 × 4 × 4 dictionary as 8 × 8 patches by putting the dictionary elements along the third dimension into quadrants within a cell. In Figure 4(b) we show a sample of the dictionary probabilities, where we can see that most of the dictionary elements have very small probability. Roughly speaking, about 50 of the 256 dictionary elements have significant probability. Figure 4(c) shows a histogram of the patch sparsity for one sample. Here we see that the patches use a variable number of dictionary elements, which is determined probabilistically through the sampler. Also, each patch has a low rank representation under the learned dictionary, since each patch has 64 dimensions and on average 15 basis functions are needed per patch.

5 Conclusion

We have proposed a new dictionary learning approach to pan-sharpening that uses a Bayesian nonparametric model to learn a sparse patch-level basis. We also considered a total variation penalty for reconstruction, and a gradient penalty on the PAN image to enforce high frequency fidelity while minimizing spectral distortion. Experimental results on three images showed that the proposed model improves the spectral content of the reconstructions over other commonly used methods. This suggests that our algorithm has potential uses for other problems where spectral content is important, such as hyperspectral imaging [21].

Appendix: An algorithm for BPFA

a) Update D: The least squares update for the dictionary is

D = Xαᵀ(ααᵀ + (L/γε) I_L)⁻¹.  (8)

The matrix X is constructed from the patches of the HRMS images at the current iteration.
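As a sketch, equation (8) is a ridge-regularized least squares solve. The NumPy fragment below (hypothetical names; we assume α is a K × N weight matrix, so the ridge identity is K × K, and that the scalar L comes from the Gaussian prior on the dictionary elements):

```python
import numpy as np

def update_dictionary(X, alpha, gamma_eps, L):
    """Least squares dictionary update, D = X alpha^T (alpha alpha^T + (L/gamma_eps) I)^{-1}.

    X     : P x N matrix whose columns are vectorized patches of the current HRMS estimate
    alpha : K x N matrix of sparse weights (the alpha_i are columns)
    """
    K = alpha.shape[0]
    # Ridge term from the Gaussian prior on each dictionary element d_k
    A = alpha @ alpha.T + (L / gamma_eps) * np.eye(K)
    # Solve D A = X alpha^T for D; A is symmetric positive definite
    return np.linalg.solve(A, (X @ alpha.T).T).T
```

When the noise precision γε is large, the ridge term vanishes and D approaches the ordinary least squares fit of the patches onto the weights.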

b) Update αi: From Algorithm 1, αik = zik sik. We update zik first, stochastically, by sampling from a Bernoulli distribution with probability

pik ∝ πk (1 + (γε/γs) dkᵀdk)^(−1/2) exp{ (γε/2) (dkᵀ ri,−k)² / (γs/γε + dkᵀdk) },

1 − pik ∝ 1 − πk,  (9)

where ri,−k is the residual error in approximating the ith patch while ignoring the kth dictionary element. The corresponding least squares weight sik is

sik = zik dkᵀ ri,−k / (γs/γε + dkᵀdk),  (10)

and we then calculate the weight vector αi = si ◦ zi.
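As an illustrative sketch (not the authors' code), a Gibbs sweep over the dictionary elements for one patch, combining equations (9) and (10), might look as follows; all names are hypothetical, and x_i, z_i, s_i, pi are assumed to be NumPy arrays:

```python
import numpy as np

def update_patch(x_i, D, z_i, s_i, pi, gamma_eps, gamma_s, rng):
    """One Gibbs sweep over z_i and s_i for a single patch x_i; returns alpha_i = z_i * s_i."""
    K = D.shape[1]
    for k in range(K):
        d_k = D[:, k]
        # Residual ignoring the kth dictionary element, r_{i,-k}
        r = x_i - D @ (z_i * s_i) + z_i[k] * s_i[k] * d_k
        dd = d_k @ d_k
        proj = d_k @ r
        # Eq. (9): unnormalized Bernoulli probabilities for z_ik
        p1 = pi[k] / np.sqrt(1.0 + (gamma_eps / gamma_s) * dd) \
             * np.exp(0.5 * gamma_eps * proj ** 2 / (gamma_s / gamma_eps + dd))
        p0 = 1.0 - pi[k]
        z_i[k] = 1.0 if rng.random() < p1 / (p1 + p0) else 0.0
        # Eq. (10): least squares weight, zero when z_ik = 0
        s_i[k] = z_i[k] * proj / (gamma_s / gamma_eps + dd)
    return z_i * s_i
```

In a practical implementation the Bernoulli probability should be computed in the log domain, since the exp term in equation (9) can overflow for high-precision settings.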

c) Update γε: The least squares solution is

γε = (2g0 + LN₁) / (2h0 + ∑i ‖Ri X0 − Dαi‖₂²).  (11)

We observe that this is approximately the inverse of the empirical squared error.

d) Update πk: We update πk stochastically via Gibbs sampling from a beta distribution as follows,

πk ∼ Beta(a0 + ∑i zik, b0 + ∑i (1 − zik)).  (12)

We observe that πk will be close to the empirical fraction of times that dictionary element dk was used by the patches in the previous iteration.
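The two scalar updates can be sketched together in NumPy (hypothetical names; Z is assumed to be the N × K binary matrix of indicators z_ik, and residual_sq_sum the accumulated squared patch error):

```python
import numpy as np

def update_gamma_eps(g0, h0, L, N, residual_sq_sum):
    """Eq. (11): point estimate of the noise precision gamma_eps.

    residual_sq_sum is sum_i ||R_i X0 - D alpha_i||_2^2 over the N patches,
    each of dimension L.
    """
    return (2.0 * g0 + L * N) / (2.0 * h0 + residual_sq_sum)

def update_pi(Z, a0, b0, rng):
    """Eq. (12): Gibbs sample each pi_k from its beta full conditional."""
    N = Z.shape[0]
    counts = Z.sum(axis=0)  # number of patches using each dictionary element
    return rng.beta(a0 + counts, b0 + N - counts)
```

With vague hyperparameters (g0 and h0 near zero), the first update reduces to the total patch dimension divided by the total squared error, i.e., the inverse empirical squared error; the second concentrates each πk around the empirical usage fraction of dk.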



Acknowledgements

This project is supported by the National Natural Science Foundation of China (Nos. 30900328, 61172179, 61103121, 81301278).

References

[1] H. Aanaes and J. Sveinsson, "Model-based satellite image fusion," IEEE Trans. on Geoscience and Remote Sensing, vol. 46, no. 5, pp. 1336-1346, 2008.

[2] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. on Signal Processing, vol. 54, pp. 4311-4322, 2006.

[3] L. Alparone, L. Wald, J. Chanussot, C. Thomas, P. Gamba, and L. M. Bruce, "Comparison of pan-sharpening algorithms: Outcome of the 2006 GRS-S data fusion contest," IEEE Trans. on Geoscience and Remote Sensing, vol. 45, no. 10, pp. 3012-3021, 2007.

[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2010.

[5] J. Cheng, H. Zhang, H. Shen, and L. Zhang, "A practical compressed sensing-based pan-sharpening method," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 4, pp. 629-633, 2012.

[6] X. Ding, L. He, and L. Carin, "Bayesian robust principal component analysis," IEEE Trans. on Image Processing, vol. 20, no. 12, pp. 3419-3430, 2011.

[7] D. Fasbender, J. Radoux, and P. Bogaert, "Bayesian data fusion for adaptable image pan-sharpening," IEEE Trans. on Geoscience and Remote Sensing, vol. 46, no. 6, pp. 1847-1857, 2008.

[8] T. Goldstein and S. Osher, "The split Bregman method for L1 regularized problems," SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323-343, 2009.

[9] N. L. Hjort, C. Holmes, P. Müller, and S. G. Walker (editors), Bayesian Nonparametrics, Cambridge Series in Statistical and Probabilistic Mathematics, 2010.

[10] N. L. Hjort, "Nonparametric Bayes estimators based on beta processes in models for life history data," Annals of Statistics, vol. 18, pp. 1259-1294, 1990.

[11] M. V. Joshi, L. Bruzzone, and S. Chaudhuri, "A model-based approach to multiresolution fusion in remotely sensed images," IEEE Trans. on Geoscience and Remote Sensing, vol. 44, no. 9, pp. 2549-2562, 2006.

[12] S. Li and B. Yang, "A new pan-sharpening method using a compressed sensing technique," IEEE Trans. on Geoscience and Remote Sensing, vol. 49, no. 2, pp. 738-746, 2011.

[13] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, 1992.

[14] X. Otazu and M. González-Audícana, "Introduction of sensor spectral response into image fusion methods: Application to wavelet-based methods," IEEE Trans. on Geoscience and Remote Sensing, vol. 43, no. 10, pp. 2376-2385, 2005.

[15] J. Paisley and L. Carin, "Nonparametric factor analysis with beta process priors," in International Conference on Machine Learning, 2009.

[16] S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, and T. Wittman, "An adaptive IHS pan-sharpening method," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 4, pp. 746-750, 2010.

[17] V. P. Shah and N. H. Younan, "An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets," IEEE Trans. on Geoscience and Remote Sensing, vol. 46, no. 5, pp. 1323-1335, 2008.

[18] R. Thibaux and M. I. Jordan, "Hierarchical beta processes and the Indian buffet process," in International Conference on Artificial Intelligence and Statistics, 2007.

[19] T. M. Tu, P. S. Huang, C. L. Hung, and C. P. Chang, "A fast intensity-hue-saturation fusion technique with spectral adjustment for IKONOS imagery," IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 4, pp. 309-312, 2004.

[20] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81-84, 2002.

[21] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin, "Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images," IEEE Trans. on Image Processing, vol. 21, no. 1, pp. 130-144, 2012.