

Laplacian Regularized Collaborative Graph for Discriminant Analysis of Hyperspectral Imagery

Wei Li, Member, IEEE, and Qian Du, Senior Member, IEEE

Abstract—Collaborative graph-based discriminant analysis (CGDA) has been recently proposed for dimensionality reduction and classification of hyperspectral imagery, offering superior performance. In CGDA, a graph is constructed by ℓ2-norm minimization-based representation using available labeled samples. Different from sparse graph-based discriminant analysis (SGDA), where a graph is built by ℓ1-norm minimization, CGDA benefits from within-class sample collaboration and computational efficiency. However, CGDA does not consider the data manifold structure reflecting geometric information. To improve CGDA in this regard, a Laplacian regularized CGDA (LapCGDA) framework is proposed, where a Laplacian graph of the data manifold is incorporated into CGDA. By taking advantage of the graph regularizer, the proposed method not only can offer collaborative representation but also can exploit the intrinsic geometric information. Moreover, both CGDA and LapCGDA are extended into kernel versions to further improve the performance. Experimental results on several different multiple-class hyperspectral classification tasks demonstrate the effectiveness of the proposed LapCGDA.

Index Terms—Collaborative graph, dimensionality reduction, graph embedding, hyperspectral data, Laplacian matrix.

I. INTRODUCTION

HYPERSPECTRAL imagery consists of hundreds of contiguous spectral wavelength bands that are highly correlated. High dimensionality usually leads to the curse-of-dimensionality problem, thus deteriorating classification performance, especially when the number of available labeled samples is limited [1]–[5]. Dimensionality-reduction algorithms, which remove redundant features and preserve useful information in a low-dimensional subspace [6], [7], have been substantially investigated for hyperspectral image analysis.

The projection-based strategy is one of the major categories of dimensionality reduction. Its essence is to project the original bands into a lower dimensional subspace based on a certain criterion function. For example, principal component analysis (PCA) [8] attempts to find a linear transformation that maximizes the variance in the projected subspace; Fisher's linear discriminant

Manuscript received April 12, 2016; revised June 23, 2016; accepted July 25, 2016. Date of publication August 12, 2016; date of current version September 30, 2016. This work was supported in part by the National Natural Science Foundation of China under Grant 61571033 and Grant 61302164 and in part by the Fundamental Research Funds for the Central Universities under Grant BUCTRC201401, Grant BUCTRC201615, and Grant XK1521.

W. Li is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: liwei089@ieee.org).

Q. Du is with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762 USA (e-mail: du@ece.msstate.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2016.2594848

analysis (LDA) [9] maximizes the trace ratio between the between-class scatter and the within-class scatter. There are numerous modified versions, including kernel versions, such as kernel PCA [10], kernel LDA [11], local Fisher discriminant analysis (LFDA) [12], genetic algorithm-based LFDA [13], and kernel LFDA [14]. Unlike PCA or LDA, locality preserving projection (LPP) [15] seeks a linear map that preserves the geometric information of neighboring samples in the original space. In [12], this type of manifold learning technique has been verified to be excellent at capturing the manifold structure in hyperspectral imagery.

Graphs, as a mathematical form of data representation, have been successfully used for remote sensing image analysis, such as classification, segmentation, detection, and data fusion [16]–[20]. Recently, due to the effectiveness of graph embedding, graph-based dimensionality reduction has received great attention [21]–[26]. A general framework for dimensionality reduction, denoted sparsity-preserving graph embedding, was proposed in [27]. Compared with the traditional k-nearest-neighbor (k-NN)-based graph [28], the sparsity-based graph can provide greater robustness to additive data noise [29]. In [30], sparse graph-based discriminant analysis (SGDA) was developed for dimensionality reduction and classification in hyperspectral imagery; SGDA preserves sparse connections among class-specific labeled samples. Weighted SGDA was proposed to integrate both locality and sparsity structures [31]. In [32], block-based SGDA was employed for semisupervised classification. In [33], simultaneous sparse graph embedding was proposed. In [34], sparse and low-rank graph-based discriminant analysis was presented, combining both sparsity and low rankness to maintain global and local structures simultaneously.

Different from the aforementioned sparse graphs, collaborative graph-based discriminant analysis (CGDA) [35] was presented by replacing the ℓ1-norm minimization in solving the weight matrix with an ℓ2-norm minimization. The motivation is the observation that it is the "collaborative" rather than the "competitive" nature imposed by the sparsity constraint that actually contributes to classification performance [36]–[39]. Furthermore, CGDA is computationally very efficient because a closed-form solution is available when estimating the representation coefficients. In [35], CGDA was demonstrated to offer superior classification performance at lower computational cost and thus can be viewed as the better choice.

Nevertheless, CGDA does not consider the data manifold structure. There are research works in the literature related to embedding local manifold structures, such as LPP [15], locally linear embedding [40], and neighborhood preserving embedding [41]. In these methods, for two data points that lie close in the original space, their intrinsic geometry distribution



should be preserved in the new subspace. Based on this concept, the Laplacian regularized Gaussian mixture model (LapGMM) was presented for data clustering [42], and Laplacian regularized low-rank representation (LapLRR) was developed for image clustering and classification [43]. In [44], graph construction using local manifold learning was proposed for semisupervised hyperspectral image classification. In [45], sparse discriminant embedding with manifold learning was presented for dimensionality reduction in hyperspectral imagery.

Motivated by these works, a Laplacian regularized CGDA (LapCGDA) framework is proposed, where a Laplacian graph of the data manifold is incorporated into CGDA during graph construction. Such a Laplacian graph captures the intrinsic geometric structure so that pixel relationships in the original data geometry are preserved. By taking advantage of this graph regularizer, the proposed method not only can offer collaborative representation but also can exploit the intrinsic geometric information, offering more discriminative power than the original CGDA. The main contributions of this paper can be summarized as follows. 1) To the best of our knowledge, this is the first time that a collaborative and Laplacian graph is adopted for dimensionality reduction and classification in hyperspectral imagery, and the graph construction can be as fast as in CGDA since a closed-form solution can be derived. 2) The resulting graph combines constraints on both collaboration in representation and preservation of the data manifold structure, which makes the induced projection more stable and discriminative. 3) Both CGDA and LapCGDA are further extended into kernel versions, which are able to extract nonlinear discriminant features in kernel-induced spaces.

The remainder of this paper is organized as follows. Section II reviews the graph-embedding dimensionality-reduction framework, including SGDA and CGDA. Section III describes the proposed LapCGDA algorithm and its kernel version. Section IV validates the proposed approaches and reports classification results compared with several state-of-the-art alternatives. Section V makes some concluding remarks.

II. RELATED WORK

A. Graph-Embedding Dimensionality Reduction

Consider a hyperspectral data set with M labeled samples denoted as X = {x_i}, i = 1, …, M, in an R^{d×1} feature space, where d is the number of bands. An intrinsic graph is denoted as G = {X, W} with W being an affinity matrix, and a penalty graph is represented as G_p = {X, W_p} with W_p being a penalty weight matrix. Let C be the number of classes and m_l be the number of available labeled samples in the lth class, with Σ_{l=1}^{C} m_l = M.

The graph-embedding dimensionality-reduction framework [21], [27] seeks to find a d×K projection matrix P (with K ≪ d), which results in a low-dimensional subspace Y = P^T X. The objective is to maintain class separability by preserving the relationship of data points in the original space. The objective function can be mathematically formed as

$$\tilde{\mathbf{P}} = \arg\min_{\mathbf{P}^T\mathbf{X}\mathbf{L}_p\mathbf{X}^T\mathbf{P}}\ \sum_{i \neq j} \left\|\mathbf{P}^T\mathbf{x}_i - \mathbf{P}^T\mathbf{x}_j\right\|^2 W_{i,j} = \arg\min_{\mathbf{P}^T\mathbf{X}\mathbf{L}_p\mathbf{X}^T\mathbf{P}}\ \operatorname{tr}\left(\mathbf{P}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{P}\right) \tag{1}$$

where L is the Laplacian matrix of graph G, L = D − W, D is a diagonal matrix whose ith diagonal element is D_{ii} = Σ_{j=1}^{M} W_{i,j}, and L_p may be the Laplacian matrix of the penalty graph G_p or a simple scale-normalization constraint [21]. The optimal projection matrix P can be obtained as

$$\tilde{\mathbf{P}} = \arg\min_{\mathbf{P}} \frac{\left|\mathbf{P}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{P}\right|}{\left|\mathbf{P}^T\mathbf{X}\mathbf{L}_p\mathbf{X}^T\mathbf{P}\right|} \tag{2}$$

which can be solved as a generalized eigenvalue decomposition problem

$$\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{P} = \mathbf{\Lambda}\mathbf{X}\mathbf{L}_p\mathbf{X}^T\mathbf{P} \tag{3}$$

where Λ is a diagonal eigenvalue matrix. The d×K projection matrix P is constructed from the K eigenvectors corresponding to the K smallest nonzero eigenvalues. Note that the performance of graph-embedding-based dimensionality-reduction algorithms mainly depends on the choice of G.
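To make the embedding step concrete, the following minimal Python/NumPy sketch (ours, not the authors' code; all names are hypothetical) solves the generalized eigenproblem (3) and keeps the eigenvectors of the K smallest nonzero eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding_projection(X, W, Wp=None, K=20, tol=1e-10):
    """Sketch of the projection step in (1)-(3); names are ours.

    X  : d x M array of labeled samples (one pixel per column).
    W  : M x M intrinsic affinity matrix (e.g., from CGDA/LapCGDA).
    Wp : optional penalty affinity; the identity (a simple scale
         normalization, as noted after (1)) is used if omitted.
    """
    d, M = X.shape
    W = 0.5 * (W + W.T)                     # symmetrize so the Laplacian is symmetric
    L = np.diag(W.sum(axis=1)) - W          # L = D - W
    Lp = np.eye(M) if Wp is None else np.diag(Wp.sum(axis=1)) - Wp
    A = X @ L @ X.T                         # left-hand side of (3)
    B = X @ Lp @ X.T + tol * np.eye(d)      # small ridge for numerical stability
    evals, evecs = eigh(A, B)               # generalized eigenvalues, ascending
    nonzero = evals > tol                   # discard (near-)zero eigenvalues
    return evecs[:, nonzero][:, :K]         # K smallest nonzero -> d x K matrix P
```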

B. CGDA

In CGDA [35], for each pixel x_i in the dictionary X, the collaborative representation vector is calculated by solving the ℓ2-norm optimization problem

$$\arg\min_{\mathbf{w}_i} \|\mathbf{w}_i\|_2 \quad \text{s.t.} \quad \mathbf{X}_i\mathbf{w}_i = \mathbf{x}_i \tag{4}$$

where X_i does not include x_i itself, and w_i is a vector of size (M − 1) × 1. If ‖·‖_2 is replaced with ‖·‖_1, (4) becomes the objective function of SGDA [30], [32]. Note that (4) can be further rewritten as

$$\arg\min_{\mathbf{w}_i} \|\mathbf{x}_i - \mathbf{X}_i\mathbf{w}_i\|_2^2 + \lambda\|\mathbf{w}_i\|_2^2 \tag{5}$$

where λ is a Lagrange multiplier. Equation (5) is equivalent to

$$\arg\min_{\mathbf{w}_i} \left[\mathbf{w}_i^T\left(\mathbf{X}_i^T\mathbf{X}_i + \lambda\mathbf{I}\right)\mathbf{w}_i - 2\mathbf{w}_i^T\mathbf{X}_i^T\mathbf{x}_i\right]. \tag{6}$$

Taking the derivative with respect to w_i and setting the resulting equation to zero yields the closed-form solution

$$\mathbf{w}_i = \left(\mathbf{X}_i^T\mathbf{X}_i + \lambda\mathbf{I}\right)^{-1}\mathbf{X}_i^T\mathbf{x}_i. \tag{7}$$

We define W = [w_1, w_2, …, w_M] as the graph weight matrix of size M × M whose column w_i is the collaborative representation vector corresponding to x_i. Note that the diagonal elements of W are set to zero. In [35], the affinity matrix W is actually calculated using within-class samples. Thus, W can be expressed in a block-diagonal structure

$$\mathbf{W} = \begin{bmatrix} \mathbf{W}^{(1)} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{W}^{(C)} \end{bmatrix} \tag{8}$$

where W^{(l)}, l = 1, …, C, is the weight matrix of size m_l × m_l computed using the labeled samples of the lth class only. It has been demonstrated that this strategy, which uses class label information, has better discriminant ability [30].
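As an illustration of (7) and (8), the per-class construction might look like the following sketch (our own code under the stated definitions; the function and argument names are hypothetical):

```python
import numpy as np

def cgda_graph(X, labels, lam=1e-2):
    """Sketch of within-class CGDA graph construction, (7)-(8).

    X      : d x M array, columns normalized to unit l2-norm.
    labels : length-M array of class labels.
    Returns the M x M block-diagonal weight matrix W (zero diagonal).
    """
    labels = np.asarray(labels)
    M = X.shape[1]
    W = np.zeros((M, M))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)        # samples of the lth class
        for j, i in enumerate(idx):
            Xi = X[:, np.delete(idx, j)]         # class dictionary without x_i
            G = Xi.T @ Xi + lam * np.eye(len(idx) - 1)
            wi = np.linalg.solve(G, Xi.T @ X[:, i])   # closed form (7)
            W[np.delete(idx, j), i] = wi         # column w_i; diagonal stays zero
    return W
```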

7068 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 12, DECEMBER 2016

III. PROPOSED DIMENSIONALITY-REDUCTION METHODS

A. LapCGDA

In [35], it has been demonstrated that CGDA is superior to SGDA from the perspectives of both classification performance and computational cost. However, neither SGDA nor CGDA considers the manifold structure within the data, which may cause some locality information to be missed in the embedding process.

To address this issue, LapCGDA is proposed; i.e., a collaborative and Laplacian graph is constructed with the objective function

$$\arg\min_{\mathbf{w}_i} \|\mathbf{x}_i - \mathbf{X}_i\mathbf{w}_i\|_2^2 + \lambda\|\mathbf{w}_i\|_2^2 + \beta T_i \tag{9}$$

where λ and β are two regularization parameters that balance the two types of penalty, and T_i = w_i^T Z_i w_i is the manifold regularization term corresponding to w_i. Note that, when β = 0, LapCGDA reduces to CGDA. Here, Z_i is the Laplacian of the graph with affinity matrix A_i whose pqth element is calculated as A_{p,q} = exp(−‖x_p − x_q‖_2^2 / (γ_p γ_q)), where γ_p = ‖x_p − x_p^{(knn)}‖ denotes the local scaling of data samples in the neighborhood of x_p, x_p^{(knn)} is the knn-nearest neighbor of x_p (here, knn is a tuning parameter; according to our empirical study, knn = 7 works well for all the experiments), and x_p and x_q are drawn from X_i, which excludes x_i itself. Note that this affinity matrix has been proved to be effective in locality preservation [12], [46].
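Under these definitions, the locally scaled affinity A_i and its Laplacian Z_i could be sketched as follows (our own helper, not from the paper; it assumes the dictionary has more than knn samples, and zeroing the self-affinity is harmless because it cancels in Z):

```python
import numpy as np

def local_scaling_laplacian(Xi, knn=7):
    """Sketch of the locally scaled affinity A_i and Laplacian Z_i of (9).

    Xi : d x m dictionary (x_i itself excluded), one sample per column.
    """
    D2 = np.sum((Xi[:, :, None] - Xi[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    order = np.argsort(D2, axis=1)                   # index 0 is the point itself
    gamma = np.sqrt(D2[np.arange(D2.shape[0]), order[:, knn]])   # gamma_p: distance to knn-th neighbor
    A = np.exp(-D2 / np.outer(gamma, gamma))         # A_pq as defined after (9)
    np.fill_diagonal(A, 0.0)                         # drop self-affinity (cancels in Z anyway)
    return np.diag(A.sum(axis=1)) - A                # Z = B - A
```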

Taking the derivative with respect to the weight vector w_i and setting the resulting equation to zero yields

$$-\mathbf{X}_i^T\mathbf{x}_i + \mathbf{X}_i^T\mathbf{X}_i\mathbf{w}_i + \beta\mathbf{Z}_i\mathbf{w}_i + \lambda\mathbf{w}_i = \mathbf{0} \tag{10}$$

and the closed-form solution is

$$\mathbf{w}_i = \left(\mathbf{X}_i^T\mathbf{X}_i + \lambda\mathbf{I} + \beta\mathbf{Z}_i\right)^{-1}\mathbf{X}_i^T\mathbf{x}_i. \tag{11}$$

Thus, a manifold-regularized collaborative graph is obtained. Note that the graph is constructed with class label information in the same way as in CGDA. The overall description of the proposed LapCGDA is given as Algorithm 1.

Algorithm 1 Proposed LapCGDA Algorithm

Input: Training data X = {x_i}, i = 1, …, M, in R^d with class labels, and the regularization parameters λ and β.
1) Normalize the columns of X to have unit ℓ2-norm;
2) Obtain the graph weight matrix W by solving (11) in closed form;
3) Compute the projections by solving the eigenvalue decomposition in (3).
Output: A projection matrix P.

B. Kernel Extensions

Kernel methods learn nonlinear decision boundaries in a kernel-induced space [11], [14], [47], whose dimensionality is much higher than that of the input space. The kernel trick has been widely used to avoid explicitly evaluating the nonlinear mapping function. For a given mapping function Φ, the Mercer kernel function k(·, ·) can be represented as

$$k(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)^T\Phi(\mathbf{x}_j) \tag{12}$$

where Φ maps the pixel x to the kernel-induced feature space, x → Φ(x) ∈ R^{D×1} (D ≫ d is the dimension of the kernel feature space). Commonly used kernels include the t-degree polynomial kernel k(x_i, x_j) = (x_i^T x_j + 1)^t (t ∈ Z^+) and the Gaussian radial basis function (RBF) kernel k(x_i, x_j) = exp(−σ‖x_i − x_j‖_2^2) (σ > 0 is the parameter of the RBF kernel).
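As an aside, the RBF Gram matrix used throughout this section can be formed without ever evaluating Φ; a minimal sketch (our own helper name; σ would later be set by the median heuristic of Section IV-B):

```python
import numpy as np

def rbf_gram(X, sigma):
    """Sketch of the RBF Gram matrix K with K_ij = k(x_i, x_j) as in (12).
    X is d x M; sigma > 0 is the RBF parameter."""
    sq = np.sum(X ** 2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # pairwise squared distances
    return np.exp(-sigma * np.clip(D2, 0.0, None))    # clip guards tiny negative values
```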

For the graph-embedding process, the projection P^(k) in the kernel space is given by the solution of the generalized eigenvalue problem

$$\mathbf{K}\mathbf{L}^{(k)}\mathbf{K}^T\mathbf{P}^{(k)} = \mathbf{\Lambda}^{(k)}\mathbf{K}\mathbf{L}_p^{(k)}\mathbf{K}^T\mathbf{P}^{(k)} \tag{13}$$

where K = Φ^T Φ ∈ R^{M×M} represents the Gram matrix with K_{i,j} = k(x_i, x_j), L^(k) is the Laplacian matrix calculated from the weight matrix W^(k) in the kernel space, and L_p^(k) is the corresponding penalty Laplacian.

In the kernel CGDA (KCGDA), the objective function becomes

$$\arg\min_{\mathbf{w}_i^*} \left\|\Phi(\mathbf{x}_i) - \mathbf{\Phi}_i\mathbf{w}_i^*\right\|_2^2 + \lambda\left\|\mathbf{w}_i^*\right\|_2^2 \tag{14}$$

where Φ_i = [Φ(x_1), Φ(x_2), …, Φ(x_M)] ∈ R^{D×(M−1)}, excluding Φ(x_i). The weight vector w_i^* of size (M − 1) × 1 can be recovered in a closed-form solution

$$\mathbf{w}_i^* = \left(\mathbf{\Phi}_i^T\mathbf{\Phi}_i + \lambda\mathbf{I}\right)^{-1}\mathbf{\Phi}_i^T\Phi(\mathbf{x}_i) = \left(\mathbf{K}_i + \lambda\mathbf{I}\right)^{-1}\mathbf{k}(\cdot,\mathbf{x}_i) \tag{15}$$

where k(·, x_i) = [k(x_1, x_i), k(x_2, x_i), …, k(x_M, x_i)]^T ∈ R^{(M−1)×1} (again excluding x_i), and K_i = Φ_i^T Φ_i ∈ R^{(M−1)×(M−1)}. Then, the weight matrix W^(k) in the kernel space can be constructed just as in (8).

In the kernel LapCGDA (KLapCGDA), the affinity matrix A_i^(k) can be expressed as

$$\begin{aligned} \mathbf{A}_{p,q}^{(k)} &= \exp\left(\frac{-\left\|\Phi(\mathbf{x}_p) - \Phi(\mathbf{x}_q)\right\|_2^2}{\gamma_p\gamma_q}\right) \\ &= \exp\left(\frac{-\left(\Phi(\mathbf{x}_p) - \Phi(\mathbf{x}_q)\right)^T\left(\Phi(\mathbf{x}_p) - \Phi(\mathbf{x}_q)\right)}{\gamma_p\gamma_q}\right) \\ &= \exp\left(\frac{-\left(K_{p,p} + K_{q,q} - 2K_{p,q}\right)}{\gamma_p\gamma_q}\right). \end{aligned} \tag{16}$$

After obtaining A_i^(k), the graph Laplacian matrix is Z_i^(k) = B_i^(k) − A_i^(k), where B_i^(k) is a diagonal matrix with the pth diagonal element being B_{pp} = Σ_{q=1}^{M−1} A_{p,q}^(k). Subsequently, the weight vector is computed in closed form

$$\mathbf{w}_i^* = \left(\mathbf{\Phi}_i^T\mathbf{\Phi}_i + \lambda\mathbf{I} + \beta\mathbf{Z}_i^{(k)}\right)^{-1}\mathbf{\Phi}_i^T\Phi(\mathbf{x}_i) = \left(\mathbf{K}_i + \lambda\mathbf{I} + \beta\mathbf{Z}_i^{(k)}\right)^{-1}\mathbf{k}(\cdot,\mathbf{x}_i). \tag{17}$$

Finally, the weight matrix W^(k) in the kernel space can be calculated accordingly. In this paper, the RBF kernel is employed.
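A sketch of the kernel weight vector, covering both (15) and (17), might look like the following (our own code; `rbf_gram` is the hypothetical helper above, and `Zk` stands for Z_i^(k) from (16)):

```python
import numpy as np

def kernel_weight(K, i, lam=1e-2, beta=1e-4, Zk=None):
    """Sketch of the kernel weight vector w_i^* of (15)/(17) for sample i.

    K  : M x M Gram matrix (e.g., from rbf_gram above).
    Zk : optional (M-1) x (M-1) kernel-space Laplacian Z_i^(k) from (16);
         when Zk is None, this reduces to KCGDA's closed form (15).
    """
    keep = np.delete(np.arange(K.shape[0]), i)  # exclude x_i from the dictionary
    Ki = K[np.ix_(keep, keep)]                  # K_i = Phi_i^T Phi_i
    ki = K[keep, i]                             # k(., x_i)
    G = Ki + lam * np.eye(len(keep))
    if Zk is not None:
        G = G + beta * Zk                       # Laplacian regularizer of (17)
    return np.linalg.solve(G, ki)               # closed-form weight vector
```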


Fig. 1. Visualization of graph weights for CGDA and LapCGDA using three-class synthetic data. (a) CGDA graph. (b) LapCGDA graph.

C. Analysis on LapCGDA and KLapCGDA

For hyperspectral data, spectral signatures can be affected by many factors, such as illumination conditions, geometric features of material surfaces, and atmospheric effects [48]. In this paper, LapCGDA is proposed as a dimensionality-reduction step that preserves the intrinsic geometry of the data. By considering both collaboration in representation and the data manifold, the subspace induced by LapCGDA is expected to provide more discriminating information; when combined with a classifier such as the support vector machine (SVM) [49], [50], the resulting classification is more accurate.

In order to illustrate the benefit of LapCGDA, we test with three-class synthetic data whose statistical distribution is complex: class 2 (marked by blue squares) is relatively separable from the other two, while class 1 (marked by red pluses) mainly has two parts, one of which significantly overlaps with class 3 (marked by black circles). Fig. 1 illustrates the graph matrices learned by CGDA and the proposed LapCGDA. Both graphs reveal three independent segments. Within each segment of the CGDA graph, the distribution of white points (nonzero coefficients) is obviously chaotic; in contrast, the graph obtained by LapCGDA clearly presents within-block patterns. This type of block pattern may capture some intrinsic correlation among samples, e.g., the geometric structure within the data, which is ignored by CGDA.

Fig. 2 further shows the classification maps produced by these techniques. For better visual comparison, we plot dashed black circles. To demonstrate the benefit of the proposed LapCGDA, we highlight the circled area in Fig. 2(b) and (c), where the samples misclassified by CGDA are obvious, e.g., samples of class 1 wrongly labeled as class 2. Overall, the classification accuracy of LapCGDA reaches 92.67%, an improvement of approximately 5% over CGDA. Fig. 2(d) and (e) also illustrates the performance of KCGDA and KLapCGDA, which are obviously better than their linear counterparts.

IV. EXPERIMENTAL RESULTS

A. Hyperspectral Data

The first data set2 employed in the experiment was acquired using the National Aeronautics and Space Administration's

2http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes

Fig. 2. Two-dimensional three-class synthetic data classification performance (the dashed black circle emphasizes the improved area). Note that the x- and y-axes indicate the range of the data after projection into the 2-D subspace. (a) Three-class synthetic data. (b) CGDA: 87.03%. (c) LapCGDA: 92.67%. (d) KCGDA: 89.33%. (e) KLapCGDA: 96.00%.

TABLE I: CLASS LABELS AND TRAIN–TEST DISTRIBUTION OF SAMPLES FOR THE INDIAN PINES DATA SET

Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over northwest Indiana's Indian Pines test site in June 1992. The image scene is 145 × 145 pixels with 20-m spatial resolution and 220 bands in the 0.4–2.45-μm spectral region. It contains two-thirds agriculture and one-third forest. In this paper, a total of 200 bands are used after removal of the water-absorption bands. There are 16 land-cover classes, not all mutually exclusive, in the designated ground-truth map. The numbers of training and testing samples are summarized in Table I.


TABLE II: CLASS LABELS AND TRAIN–TEST DISTRIBUTION OF SAMPLES FOR THE SALINAS DATA SET

TABLE III: CLASS LABELS AND TRAIN–TEST DISTRIBUTION OF SAMPLES FOR THE UNIVERSITY OF PAVIA DATA SET

The second data set was also collected by the AVIRIS sensor, capturing an area over Salinas Valley, California. The image comprises 512 × 217 pixels with a spatial resolution of 3.7 m and 204 bands after 20 water-absorption bands are removed. It mainly contains vegetables, bare soils, and vineyard fields. There are also 16 classes, and the numbers of training and testing samples are listed in Table II.

The third experimental data set was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the city of Pavia, Italy, under the HySens Project managed by the German Aerospace Agency (DLR). The image scene covers 610 × 340 pixels. The data set has 103 spectral bands prior to water-band removal, with spectral coverage from 0.43 to 0.86 μm and a spatial resolution of 1.3 m. Approximately 42 776 labeled pixels in nine classes are taken from the ground-truth map. More detailed information on the numbers of training and testing samples is summarized in Table III.

B. Parameter Tuning

The classical SVM classifier is employed to validate the aforementioned dimensionality-reduction methods, including CGDA, LapCGDA, KCGDA, and KLapCGDA. A fivefold cross-validation strategy is employed for parameter tuning in the classification tasks.

Fig. 3 illustrates the sensitivity of the proposed LapCGDA as a function of the two important regularization parameters (i.e., λ and β) in its objective function (9). In the experiment, λ is chosen from {1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1}, and β is chosen from {0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1}.

Fig. 3. Parameter tuning of β and λ for the proposed LapCGDA using the three experimental data sets. (a) Indian Pines data. (b) Salinas data. (c) University of Pavia data.

Optimal λ and β are determined for both LapCGDA and CGDA from the results in Fig. 3. For example, the optimal λ for LapCGDA is 1e-2 and the optimal β is 1e-4 for the Indian Pines data and the Salinas data; as for the University


Fig. 4. Classification accuracy versus reduced dimensionality K for the compared methods using the experimental data sets. (a) Indian Pines data. (b) Salinas data. (c) University of Pavia data.

of Pavia data, both λ and β can be set to 1e-3. It is worth mentioning that a nonzero optimal value of β verifies that the manifold regularization term has an impact on the dimensionality-reduction process. For KCGDA and KLapCGDA, the optimal λ and β are obtained in a similar way; as for the RBF kernel parameter, σ is set to the median value of 1/‖x_i − x̄‖_2^2, i = 1, 2, …, M, where x̄ = (1/M) Σ_{i=1}^M x_i is the mean of all available training samples [51].

Fig. 4 illustrates the classification accuracy as a function of the reduced dimensionality K for SGDA, CGDA, LapCGDA, KCGDA, and KLapCGDA. It is apparent that the performance tends to be stable once the dimensionality is larger than a certain value; for example, a reduced dimension of 20 appears to be sufficient for all three experimental data sets. From the curves in Fig. 4, we notice that, for low dimensionality,

TABLE IV: SVM CLASS-SPECIFIC ACCURACY (IN PERCENTAGE) AND OA OF DIFFERENT TECHNIQUES FOR THE INDIAN PINES DATA

TABLE V: SVM CLASS-SPECIFIC ACCURACY (IN PERCENTAGE) AND OA OF DIFFERENT TECHNIQUES FOR THE SALINAS DATA

TABLE VI: SVM CLASS-SPECIFIC ACCURACY (IN PERCENTAGE) AND OA OF DIFFERENT TECHNIQUES FOR THE UNIVERSITY OF PAVIA DATA

TABLE VII: STATISTICAL SIGNIFICANCE FROM THE STANDARDIZED MCNEMAR'S TEST ABOUT THE DIFFERENCE BETWEEN METHODS


Fig. 5. Thematic maps resulting from classification for the Indian Pines data set with 16 classes. (a) Pseudo-color image. (b) Ground truth map. (c) LFDA: 81.79%. (d) SGDA: 83.34%. (e) CGDA: 84.59%. (f) LapCGDA: 86.70%. (g) KCGDA: 86.69%. (h) KLapCGDA: 88.52%.

classification accuracy is often not high, whereas LapCGDA is consistently better than SGDA and CGDA, which further confirms that the proposed strategy is able to find a transform that effectively reduces the dimensionality while enhancing class separability.

C. Classification Performance

We compare the proposed LapCGDA with all the bands (denoted as "ALL," i.e., without dimensionality reduction), the traditional LDA and LFDA, and the state-of-the-art SGDA and CGDA; furthermore, the performance of KCGDA and KLapCGDA is also included. Tables IV–VI list the class-specific accuracy and overall accuracy (OA) for the three experimental data sets. Among the individual methods, LDA is sometimes even worse than ALL since its reduced dimension is limited to C − 1, which may lose useful information. Furthermore, CGDA is generally superior to SGDA, LapCGDA performs better than both SGDA and CGDA, KCGDA outperforms CGDA, and KLapCGDA outperforms LapCGDA. For example, in Table IV, LapCGDA (86.70%) yields more than 2% higher accuracy than CGDA (84.59%), and KLapCGDA (88.52%) provides approximately 2% higher accuracy than KCGDA (86.69%). It is interesting to notice that, for class 9 (Oats), the number of training samples is extremely small, causing many methods to lose efficacy; however, the proposed KLapCGDA achieves 95% accuracy, which verifies its effectiveness.

In order to demonstrate the statistical significance of the accuracy improvement of the proposed methods, the standardized McNemar's test [52] is employed, as listed in Table VII. Z values of McNemar's test larger than 1.96 and 2.58 mean that two results are statistically different at the 95% and 99% confidence levels, respectively. The sign of Z indicates whether classifier 1 outperforms classifier 2 (Z > 0) or vice versa. In the experiment, we run comparisons between LapCGDA and CGDA, KCGDA and CGDA, KLapCGDA and LapCGDA, and KLapCGDA and KCGDA separately. In Table VII, all the values are larger than 2.58, which confirms that the proposed LapCGDA and KLapCGDA are highly discriminative dimensionality-reduction methods.
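For reference, a minimal sketch of the standardized McNemar statistic in its common two-classifier form (our own code, consistent with the sign convention described above; not from the paper):

```python
import numpy as np

def mcnemar_z(y_true, pred1, pred2):
    """Sketch of the standardized McNemar test used in Table VII.

    Z > 0 favors classifier 1; |Z| > 1.96 (2.58) indicates a statistically
    significant difference at the 95% (99%) confidence level.
    """
    c1 = np.asarray(pred1) == np.asarray(y_true)  # classifier 1 correct?
    c2 = np.asarray(pred2) == np.asarray(y_true)  # classifier 2 correct?
    f12 = np.sum(c1 & ~c2)   # samples only classifier 1 gets right
    f21 = np.sum(~c1 & c2)   # samples only classifier 2 gets right
    return (f12 - f21) / np.sqrt(f12 + f21)       # undefined if both counts are 0
```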

Figs. 5–7 further illustrate the thematic maps. We produced ground-cover maps of the entire image scene (including unlabeled pixels); however, to facilitate easy comparison between methods, only areas for which ground truth is available are shown. These maps are consistent with the results listed in Tables IV–VI, respectively. Some areas in the classification maps produced by LapCGDA are obviously less noisy than those produced by SGDA and CGDA, e.g., the regions of Soybeans-no till and Soybeans-clean in Fig. 5, Vinyard-untrained in Fig. 6, and Gravel in Fig. 7.

Fig. 8 illustrates the classification performance with different numbers of training samples. In practical situations, the number of available training samples may be insufficient to estimate models for each class, so it is necessary to investigate sensitivity to the training size. As shown in Fig. 8, for the Indian Pines data, the training size is varied from 1/10 to 1/5 (where 1/10 is the ratio of the number of training samples to the total labeled data); for the Salinas data and the University


Fig. 6. Thematic maps resulting from classification for the Salinas data set with 16 classes. (a) Pseudo-color image. (b) Ground truth map. (c) LFDA: 91.22%. (d) SGDA: 91.82%. (e) CGDA: 93.00%. (f) LapCGDA: 94.13%. (g) KCGDA: 93.97%. (h) KLapCGDA: 94.56%.

of Pavia data, the training-sample-size ratio is varied over the ranges [0.01, 0.05] and [0.06, 0.1], respectively, with an interval of 0.01. From the results, LapCGDA still consistently performs better than SGDA and CGDA, and the kernel methods outperform their linear versions. For example, KLapCGDA consistently yields about a 2% improvement over LapCGDA for the Indian Pines data; in Fig. 8(b), the improvement is even more obvious when the training size is extremely small (e.g., 0.01).


Fig. 7. Thematic maps resulting from classification for the University of Pavia data set with 9 classes. (a) Pseudo-color image. (b) Ground truth map. (c) LFDA: 92.77%. (d) SGDA: 90.58%. (e) CGDA: 92.37%. (f) LapCGDA: 94.46%. (g) KCGDA: 93.28%. (h) KLapCGDA: 95.58%.

Table VIII summarizes the computational cost of the aforementioned graph-based dimensionality-reduction methods. All experiments were carried out using MATLAB on an Intel(R) Core(TM) i7-3770 CPU machine with 8 GB of RAM. Obviously, CGDA is much faster than SGDA, which verifies its time efficiency. LapCGDA inherits this benefit, being only slightly slower than CGDA due to the computational burden of the additional affinity matrix. Even for KCGDA and KLapCGDA, the computational cost is much lower than that of SGDA.

V. CONCLUSION

In this paper, a LapCGDA framework has been proposed to improve the state-of-the-art CGDA. In LapCGDA, the Laplacian of the data manifold graph was incorporated into CGDA, exploiting the intrinsic geometric information within the data. By considering both collaboration in representation and manifold structure, the subspace induced by LapCGDA provides more discriminative information. Furthermore, because the solution of graph construction can be expressed in closed form, the computational cost of the proposed LapCGDA is low. Both CGDA and LapCGDA were extended into kernel versions, i.e., KCGDA and KLapCGDA. Experimental results with synthetic data and real hyperspectral images have demonstrated that the proposed LapCGDA and KLapCGDA are effective in dimensionality-reduction tasks and provide superior performance compared with SGDA and CGDA from the perspectives of both classification accuracy and computational efficiency.


Fig. 8. Classification performance of the compared methods with different training sample sizes using the experimental data sets. (a) Indian Pines data. (b) Salinas data. (c) University of Pavia data.

TABLE VIII: EXECUTION TIME (IN SECONDS) ON THE THREE EXPERIMENTAL DATA SETS

ACKNOWLEDGMENT

The authors would like to thank Dr. Nam Ly for sharing the MATLAB code of sparse graph-based discriminant analysis and collaborative graph-based discriminant analysis for comparison purposes.

REFERENCES

[1] B. Du, L. Zhang, L. Zhang, T. Chen, and K. Wu, "A discriminative manifold learning based dimension reduction method for hyperspectral classification," Int. J. Fuzzy Syst., vol. 14, no. 2, pp. 272–277, Jun. 2012.

[2] S. Prasad, W. Li, J. E. Fowler, and L. M. Bruce, "Information fusion in the redundant-wavelet-transform domain for noise-robust hyperspectral classification," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 9, pp. 3474–3486, Sep. 2012.

[3] W. Li, E. W. Tramel, S. Prasad, and J. E. Fowler, "Nearest regularized subspace for hyperspectral classification," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 477–489, Jan. 2014.

[4] B. Du and L. Zhang, "A discriminative metric learning based anomaly detection method," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 6844–6857, Nov. 2014.

[5] Y. Gu, T. Liu, X. Jia, J. A. Benediktsson, and J. Chanussot, "Nonlinear multiple kernel learning with multiple-structure-element extended morphological profiles for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3235–3247, Jun. 2016.

[6] B. Du and L. Zhang, "Target detection based on a dynamic subspace," Pattern Recognit., vol. 47, no. 1, pp. 344–358, Jan. 2014.

[7] L. Gao et al., "Subspace-based support vector machines for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 12, no. 2, pp. 349–353, Feb. 2015.

[8] M. Fauvel, J. Chanussot, and J. A. Benediktsson, "Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas," EURASIP J. Appl. Signal Process., vol. 2009, no. 1, pp. 1–14, Jan. 2009.

[9] W. Li, S. Prasad, and J. E. Fowler, "Noise-adjusted subspace discriminant analysis for hyperspectral imagery classification," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 6, pp. 1374–1378, Nov. 2013.

[10] J. Yang, A. F. Frangi, J. Yang, D. Zhang, and Z. Jin, "KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 230–244, Feb. 2005.

[11] W. Li, S. Prasad, and J. E. Fowler, "Decision fusion in kernel-induced spaces for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 6, pp. 3399–3411, Jun. 2014.

[12] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving dimensionality reduction and classification for hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1185–1198, Apr. 2012.

[13] M. Cui, S. Prasad, W. Li, and L. M. Bruce, "Locality preserving genetic algorithms for spatial–spectral hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 3, pp. 1688–1697, Jun. 2013.

[14] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving discriminant analysis in kernel-induced feature spaces for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 5, pp. 894–898, Sep. 2011.

[15] X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA, USA: MIT Press, 2004.

[16] V. Harikumar, P. P. Gajjar, M. V. Joshi, and M. S. Raval, "Multiresolution image fusion: Use of compressive sensing and graph cuts," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 5, pp. 1771–1780, May 2014.

[17] Y. Li, Y. Tan, J. Den, Q. Wen, and J. Tian, "Cauchy graph embedding optimization for built-up areas detection from high-resolution remote sensing images," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 5, pp. 2078–2096, May 2015.

[18] W. Liao, M. Dalla Mura, J. Chanussot, and A. Pizurica, "Fusion of spectral and spatial information for classification of hyperspectral remote-sensed imagery by local graph," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 2, pp. 583–594, Feb. 2016.

[19] S. Jia, X. Zhang, and Q. Li, "Spectral–spatial hyperspectral image classification using ℓ1/2 regularized low-rank representation and sparse representation-based graph cuts," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2473–2484, Jun. 2015.

[20] M. T. Pham, G. Mercier, and J. Michel, "Pointwise graph-based local texture characterization for very high resolution multispectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 5, pp. 1962–1973, May 2015.

[21] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.

[22] D. Cai, X. He, J. Han, and T. Huang, "Graph regularized nonnegative matrix factorization for data representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548–1560, Aug. 2011.

[23] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, "Non-negative low rank and sparse graph for semi-supervised learning," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, 2012, pp. 2328–2335.

[24] M. Zhao, L. Jiao, J. Feng, and T. Liu, "A simplified low rank and sparse graph for semi-supervised learning," Neurocomputing, vol. 140, pp. 84–96, 2014.

[25] H. Yuan and Y. Tang, "Learning with hypergraph for hyperspectral image feature extraction," IEEE Geosci. Remote Sens. Lett., vol. 12, no. 8, pp. 1695–1699, Aug. 2015.

[26] W. Li, J. Liu, and Q. Du, "Sparse and low rank graph-based discriminant analysis for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 4094–4105, Jul. 2016.

[27] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, "Learning with ℓ1-graph for image analysis," IEEE Trans. Image Process., vol. 19, no. 4, pp. 858–866, Apr. 2010.

[28] J. Tang, R. Hong, S. Yan, T. Chua, G. Qi, and R. Jain, "Image annotation by k-NN sparse graph-based label propagation over noisily tagged web images," ACM Trans. Intell. Syst. Technol., vol. 2, no. 2, pp. 1–14, 2011.

[29] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proc. IEEE, vol. 98, no. 6, pp. 1031–1044, Jun. 2010.

[30] N. Ly, Q. Du, and J. E. Fowler, "Sparse graph-based discriminant analysis for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 3872–3884, Jul. 2014.

[31] W. He, H. Zhang, L. Zhang, W. Philips, and W. Liao, "Weighted sparse graph based dimensionality reduction for hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 13, no. 5, pp. 686–690, May 2016.

[32] K. Tan, S. Zhou, and Q. Du, "Semi-supervised discriminant analysis for hyperspectral imagery with block-sparse graph," IEEE Geosci. Remote Sens. Lett., vol. 12, no. 8, pp. 1765–1769, Aug. 2015.

[33] Z. Xue, P. Du, J. Li, and H. Su, "Simultaneous sparse graph embedding for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 11, pp. 6114–6133, Nov. 2015.

[34] W. Li, J. Liu, and Q. Du, "Sparse and low-rank graph for discriminant analysis of hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 4094–4105, Jul. 2016.

[35] N. Ly, Q. Du, and J. E. Fowler, "Collaborative graph-based discriminant analysis for hyperspectral imagery," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2688–2696, Jun. 2014.

[36] W. Li and Q. Du, "Joint within-class collaborative representation for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2200–2208, Jun. 2014.

[37] L. Zhang, M. Yang, and X. Feng, "Sparse representation or collaborative representation: Which helps face recognition?" in Proc. Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 2011, pp. 471–478.

[38] W. Li and Q. Du, "Collaborative representation for hyperspectral anomaly detection," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1463–1474, Mar. 2015.

[39] W. Li, Q. Du, and B. Zhang, "Combined sparse and collaborative representation for hyperspectral target detection," Pattern Recognit., vol. 48, pp. 3904–3916, 2015.

[40] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.

[41] X. He, D. Cai, S. Yan, and H. Zhang, "Neighborhood preserving embedding," in Proc. Int. Conf. Comput. Vis., Beijing, China, Oct. 2005, pp. 1208–1213.

[42] X. He, D. Cai, Y. Shao, H. Bao, and J. Han, "Laplacian regularized Gaussian mixture model for data clustering," IEEE Trans. Knowl. Data Eng., vol. 23, no. 9, pp. 1406–1418, Sep. 2011.

[43] J. Liu, Y. Chen, J. Zhang, and Z. Xu, "Enhancing low-rank subspace clustering by manifold regularization," IEEE Trans. Image Process., vol. 23, no. 9, pp. 4022–4030, Sep. 2014.

[44] L. Ma, M. M. Crawford, X. Yang, and Y. Guo, "Local manifold learning based graph construction for semisupervised hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2832–2844, May 2015.

[45] H. Huang and M. Yang, "Dimensionality reduction of hyperspectral images with sparse discriminant embedding," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 5160–5169, Sep. 2015.

[46] M. Sugiyama, "Local Fisher discriminant analysis for supervised dimensionality reduction," in Proc. Int. Conf. Mach. Learn., Pittsburgh, PA, USA, Jun. 2006, pp. 905–912.

[47] D. Wang, H. Lu, and M. H. Yang, "Kernel collaborative face recognition," Pattern Recognit., vol. 48, no. 10, pp. 3025–3037, Oct. 2015.

[48] G. Shaw and D. Manolakis, "Signal processing for hyperspectral image exploitation," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 12–16, Jan. 2002.

[49] C.-H. Li, B.-C. Kuo, C.-T. Lin, and C.-S. Huang, "A spatial–contextual support vector machine for remotely sensed image classification," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 784–799, Mar. 2012.

[50] W. Li, C. Chen, H. Su, and Q. Du, "Local binary patterns and extreme learning machine for hyperspectral imagery classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3681–3693, Jul. 2015.

[51] L. Zhang et al., "Kernel sparse representation-based classifier," IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1684–1695, Apr. 2012.

[52] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, "Hyperspectral image classification with independent component discriminant analysis," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, Dec. 2011.

Wei Li (S'11–M'13) received the B.E. degree in telecommunications engineering from Xidian University, Xi'an, China, in 2007; the M.S. degree in information science and technology from Sun Yat-sen University, Guangzhou, China, in 2009; and the Ph.D. degree in electrical and computer engineering from Mississippi State University, Starkville, MS, USA, in 2012.

Subsequently, he spent one year as a Postdoctoral Researcher at the University of California, Davis, CA, USA. He is currently with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His research interests include statistical pattern recognition, hyperspectral image analysis, and data compression.

Dr. Li is an active Reviewer for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, and the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS). He is the recipient of the 2015 Best Reviewer Award from the IEEE Geoscience and Remote Sensing Society for his service to IEEE JSTARS.

Qian Du (S'98–M'00–SM'05) received the Ph.D. degree in electrical engineering from the University of Maryland Baltimore County, Baltimore, MD, USA, in 2000.

Currently, she is the Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote sensing image analysis and applications, pattern classification, data compression, and neural networks.

Dr. Du is a Fellow of SPIE-International Society for Optics and Photonics. She was the General Chair of the 4th IEEE Geoscience and Remote Sensing Society (GRSS) Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) in Shanghai, China, in 2012. She served as the Cochair for the Data Fusion Technical Committee of the IEEE GRSS (2009–2013) and the Chair for the Remote Sensing and Mapping Technical Committee of the International Association for Pattern Recognition (2010–2014). She served as an Associate Editor for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, the Journal of Applied Remote Sensing, and the IEEE SIGNAL PROCESSING LETTERS. Since 2016, she has been the Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. She was the recipient of the 2010 Best Reviewer Award from the IEEE GRSS.