

ROBUST PCA VIA DICTIONARY BASED OUTLIER PURSUIT

Xingguo Li^1, Jineng Ren^1, Sirisha Rambhatla^1, Yangyang Xu^2, and Jarvis Haupt^1

^1 Department of Electrical and Computer Engineering, University of Minnesota Twin Cities
^2 Department of Mathematical Sciences, Rensselaer Polytechnic Institute

Email: {lixx1661, renxx282, rambh002, jdhaupt}@umn.edu
Email: [email protected]

ABSTRACT

In this paper, we examine the problem of locating vector outliers among a large number of inliers, with a particular focus on the case where the outliers are represented in a known basis or dictionary. Using a convex demixing formulation, we provide provable guarantees for exact recovery of the space spanned by the inliers and the supports of the outlier columns, even when the rank of the inliers is high and the number of outliers is a constant proportion of the total observations. Comprehensive numerical experiments on both synthetic and real hyper-spectral imaging datasets demonstrate the efficiency of our proposed method.

Index Terms— robust PCA, outlier identification, hyper-spectral imaging

1. INTRODUCTION

Suppose we observe a data matrix M ∈ R^{n1×n2}, which we assume admits a decomposition of the form:

    M = L + DC + N,    (1)

where D ∈ R^{n1×d} is a known dictionary, L ∈ R^{n1×n2} is unknown with rank(L) = r, C ∈ R^{d×n2} is an unknown but column-wise sparse matrix, and N ∈ R^{n1×n2} is an unknown matrix modeling error. We denote the column support of C (the set of non-zero columns) as csupp(C) = I_C and assume |I_C| = k. In this model, we refer to the non-zero columns of DC as outliers; they are assumed not to lie in the column space of L, denoted by U = span{col(L)}. Similarly, we refer to the remaining columns of L as inliers, indexed by I_L = [n2]\I_C with |I_L| = n2 − k = n_L and [n2] = {1, ..., n2}. Given the data matrix M and the dictionary D, our specific goal is to recover the column space U and the indices I_C of the outliers.

Model (1) can be viewed as a generalization of principal component analysis (PCA) [1], where the goal is to estimate a low-dimensional embedding of given data, and of its robust variants, where the data is contaminated by sparse outliers [2–5]. The investigation of outlier identification is motivated by a number of contemporary "big data" applications where the outliers themselves may be of interest, such as identifying malicious responses in collaborative filtering applications [6], finding anomalous patterns in network traffic [7], or estimating visually salient regions of images [8–10]. More recently, there has been increasing interest in outlier identification with known bases, motivated by real-world applications, e.g., functional magnetic resonance imaging [11], video processing [12], network tracking [13, 14], and hyper-spectral (HS) imaging [15, 16]. However, no rigorous analyses of the identifiability of model (1) have been provided, motivating our investigation here.

Our Approach: For any pair (L0, C0), we say (L0, C0) is in the oracle model {M, U, I_C}, i.e., (L0, C0) ∈ {M, U, I_C}, if P_U(L0) = L0, P_C(DC0) = DC0, and L0 + DC0 = L + DC hold simultaneously, where P_U and P_C are the projections onto the column space U of L and the column support I_C of C, respectively. Given the data matrix M and the dictionary D, we aim to recover {U, I_C} from noisy observations via the following optimization procedure, which we call Dictionary based Outlier Pursuit (DOP),

    min_{L,C} ‖L‖_* + λ‖C‖_{1,2}  s.t.  ‖M − L − DC‖_F ≤ ε_N,    (2)

where ‖L‖_* is the nuclear norm of L, ‖C‖_{1,2} = Σ_j ‖C_{:,j}‖_2, C_{:,j} is the j-th column of C, and λ ≥ 0 is a regularization parameter. Note that we cannot guarantee recovery of the true parameter pair (L, C) even when N = 0, since there is an ambiguity: the outlier columns (non-zero columns of DC) may contain "energy" of some inliers. Specifically, P_{U⊥}(DC) ≠ 0 in general, where U⊥ is the orthogonal complement of U in R^{n1} and P_{U⊥} is the projection onto U⊥. When P_{U⊥}(DC) ≠ 0, we can always find (L1, C1), (L2, C2) ∈ {M, U, I_C} with (L1, C1) ≠ (L2, C2).
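The paper does not prescribe a particular solver for (2). As a point of reference, a minimal sketch using the generic convex solver CVXPY (our choice, not the authors' implementation) might look as follows, with M, D, lam, and eps_N mirroring the notation above.

```python
# A minimal sketch of the DOP program (2) using CVXPY (an assumption on
# our part; the paper does not specify a solver). M is n1 x n2, D is
# n1 x d, lam is the regularization weight, eps_N the noise level.
import cvxpy as cp

def dop(M, D, lam, eps_N):
    n1, n2 = M.shape
    d = D.shape[1]
    L = cp.Variable((n1, n2))
    C = cp.Variable((d, n2))
    # ||L||_* + lambda * sum_j ||C_{:,j}||_2
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    # Frobenius-norm ball around the observations, as in (2)
    prob = cp.Problem(obj, [cp.norm(M - L - D @ C, "fro") <= eps_N])
    prob.solve()
    return L.value, C.value
```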

The main contribution of this paper is that we provide sufficient conditions for the convex optimization (2) to enable the recovery of the column space U of L and the identities I_C of the outliers, even when rank(L) and k are large. Exact recovery can be guaranteed when N = 0.

Background: A closely related model is studied in [3], which estimates (U, I_C) in the case D = I using a convex formulation termed Outlier Pursuit (OP). Simply multiplying both sides of (1) by the (pseudo) inverse D† of D and applying OP does not work here in general. For example, when the subspace spanned by D does not contain U, such an operation results in the column space of D†L capturing only a subspace of U, so we may not recover U. We may also fail to recover I_C, since non-zero columns of C may lie in U. In addition, the prior knowledge of D enables enhanced recovery performance, especially when rank(L) is high. Another related model is studied in [14, 16], which estimates (U, I_C) in (1) with C being an entry-wise sparse matrix. However, as indicated by our experiments, when the outliers do have a column-wise structure, our estimator obtained from (2) is more robust for large k and r than its counterpart based on an entry-wise sparse C.

Notation. For a low-rank matrix L, we denote by L = UΣV^T the compact singular value decomposition (SVD) of L, where the columns of U ∈ R^{n1×r} and V ∈ R^{n2×r} are the left and right singular vectors of L, respectively, i.e., U^T U = V^T V = I_{r×r}, and Σ ∈ R^{r×r} is a diagonal matrix with Σ_ii = σ_i being the i-th singular value of L, for i ∈ [r]. Given a matrix X, the projection operations are defined as P_U(X) = P_U X and P_V(X) = X P_V, where P_U = UU^T and P_V = VV^T are the column and row projection matrices, respectively; P_L(X) = (P_U + P_V − P_U P_V)(X) = P_U X + X P_V − P_U X P_V; and P_C(X) is obtained by keeping the i-th column of X unchanged for i ∈ I_C and setting the i-th column of X to zero for i ∉ I_C. The complements of these operations are defined correspondingly, i.e., P_{U⊥}(X) = (I − P_U)X, P_{V⊥}(X) = X(I − P_V), P_{L⊥}(X) = (I − P_U)X(I − P_V), and P_{C⊥}(X) is obtained by keeping the i-th column of X unchanged for i ∉ I_C and setting the i-th column of X to zero for i ∈ I_C.
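For concreteness, these projection operators admit a direct NumPy sketch; the helper name and interface below are ours, not the paper's.

```python
# A NumPy sketch of the projection operators defined above (our
# illustration). I_C is a list/array of outlier column indices and
# r is the rank of L.
import numpy as np

def projections(L, I_C, r):
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    U, Vt = U[:, :r], Vt[:r, :]               # compact SVD factors
    PU, PV = U @ U.T, Vt.T @ Vt               # column/row projection matrices
    n1, n2 = L.shape
    P_L  = lambda X: PU @ X + X @ PV - PU @ X @ PV
    P_Lp = lambda X: (np.eye(n1) - PU) @ X @ (np.eye(n2) - PV)
    def P_C(X):                               # keep only columns in I_C
        Y = np.zeros_like(X)
        Y[:, I_C] = X[:, I_C]
        return Y
    return P_L, P_Lp, P_C
```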

2. PRELIMINARIES

We define several subspaces, similar to those in [14, 16]:

(s1) L = {X ∈ R^{n1×n2} : X = U W_1^T + W_2 V^T, W_1 ∈ R^{n2×r}, W_2 ∈ R^{n1×r}}, the span of all matrices with the same column or row space as L;

(s2) C = {H ∈ R^{d×n2} for any d ∈ N : csupp(H) ⊆ I_C}, the matrices with column support contained in the column support of C; and

(s3) D = {X ∈ R^{n1×n2} : X = DH, H ∈ C}, all matrices whose column space is a column subspace of D.

We denote by (L, C, D) the subspaces (s1)–(s3). In the following, we introduce some conditions on local identifiability used throughout our analysis. We first define the subspace incoherence property between the two subspaces L and D to quantify the degree of their overlap. This is formalized as follows.

Definition 2.1 (Subspace Incoherence Property). Two subspaces L and D are said to satisfy the subspace incoherence property with parameter µ(L,D) when

    max_{X ∈ D\{0}} ‖P_L(X)‖_F / ‖X‖_F ≤ µ(L,D).    (3)

Note that µ(L,D) ∈ [0, 1], where the upper bound is achieved when P_{U⊥}(DC_{:,j}) = 0 for some j ∈ I_C, and the lower bound is achieved when L and D are orthogonal. In our problem, we are interested in µ(L,D) < 1, which indicates that ‖P_{U⊥}(DC_{:,j})‖_2 > 0 for all j ∈ I_C. This condition plays an important role in guaranteeing the uniqueness of {U, I_C} given M and D.
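As an illustration, µ(L,D) can be estimated from below by Monte Carlo sampling of the subspace D; the sketch below (our illustration, not part of the paper's analysis) assumes the P_L operator sketched in the notation section.

```python
# A Monte Carlo lower bound on mu(L, D) from Definition 2.1: sample
# random members X = D H of the subspace D with csupp(H) in I_C and
# track the worst ratio ||P_L(X)||_F / ||X||_F. The function name and
# interface are ours.
import numpy as np

def mu_lower_bound(P_L, D, I_C, n2, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    d = D.shape[1]
    best = 0.0
    for _ in range(trials):
        H = np.zeros((d, n2))
        H[:, I_C] = rng.standard_normal((d, len(I_C)))  # csupp(H) in I_C
        X = D @ H
        best = max(best, np.linalg.norm(P_L(X)) / np.linalg.norm(X))
    return best  # a lower bound; the true mu is a max over all of D
```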

Another important property is a criterion for the dictionary matrix to preserve the Euclidean norm of any fixed vector in a certain space, which is formalized as follows.

Definition 2.2 (Restricted Frame Property). An n1×d matrix D is said to satisfy the restricted frame property on R_C if, for any fixed x ∈ R_C,

    α_ℓ ‖x‖_2² ≤ ‖Dx‖_2² ≤ α_u ‖x‖_2²,    (4)

where α_u and α_ℓ are upper and lower bounds, respectively, with α_u ≥ α_ℓ > 0.

The restricted frame property (RFP) is a fairly generic property that is satisfied by many deterministic and random matrices (e.g., those with zero-mean Gaussian or subgaussian entries). We do not restrict D to be overcomplete or to have orthogonal rows as in [14], which allows for a much broader choice of dictionaries. In fact, R_C can simply be R^d when n1 ≥ d; thus the frame property is easy to meet when D is an undercomplete dictionary, and better recovery performance can be guaranteed, as shown in the experiments. In the case n1 < d, it is equivalent to the popular restricted isometry property (RIP) [17, 18] on sparse inputs, where, given ε ∈ (0, 1), we have for sparse vectors x with ‖x‖_0 ≤ k for some k ≤ n1 that

    (1 − ε)‖x‖_2² ≤ ‖Dx‖_2² ≤ (1 + ε)‖x‖_2².

More generally, the RFP holds for any vector in a subspace that has small enough principal angles with the row space R(D) of D, for which there exists a constant ε ∈ (0, 1] such that

    min { arccos( |⟨u, v⟩| / (‖u‖_2 ‖v‖_2) ) : u ∈ R_C, v ∈ R(D) } ≥ ε.

Note that α_ℓ > 0 in (4) implies that DH ≠ 0_{n1×n2} for any H ∈ C\{0_{d×n2}}. This is another important fact for the optimality condition we address later.
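In the undercomplete case R_C = R^d with n1 ≥ d, the tightest constants in (4) are the extreme squared singular values of D; a quick numerical check (our sketch, assuming a Gaussian dictionary) follows.

```python
# Numerically checking the frame bounds in (4) for R_C = R^d, n1 >= d:
# alpha_l and alpha_u are exactly the smallest/largest squared singular
# values of D. The scaling of D here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
n1, d = 100, 50
D = rng.standard_normal((n1, d)) / np.sqrt(n1)   # i.i.d. Gaussian dictionary
s = np.linalg.svd(D, compute_uv=False)
alpha_l, alpha_u = s[-1] ** 2, s[0] ** 2   # alpha_l ||x||^2 <= ||Dx||^2 <= alpha_u ||x||^2
print(f"alpha_l = {alpha_l:.3f}, alpha_u = {alpha_u:.3f}")
```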

We also define several constants for convenience:

    β_V = ‖VV^T‖_{∞,2},  β_{U,V} = ‖D^T U V^T‖_{∞,2},

where ‖A‖_{∞,2} = max_i ‖A_{:,i}‖_2. A small value of β_{U,V} indicates that each column of L is spanned by "sufficiently many" columns of D. In addition, β_V is related to the notion of the column incoherence property (also called leverage scores). Specifically, let L ∈ R^{n1×n2} be a rank-r matrix with n_L ≤ n2 non-zero columns. Given the compact SVD L = UΣV^T, L is said to satisfy the column incoherence property with parameter µ_V if ‖V^T‖²_{∞,2} ≤ µ_V r/n_L. Such quantities have been identified as important for the identifiability of low-rank components in previous works [2, 3, 19]. Note that µ_V ∈ [1, n_L/r], and a small µ_V indicates that the vectors comprising the columns of L are "spread out" among the basis vectors spanning the column space of L, or equivalently, that V does not contain sparse rows. Consequently, we have β_V² ≤ µ_V r/n_L.


3. THEORETICAL GUARANTEES

We provide sufficient conditions for DOP (2) to guarantee accurate recovery of the column space U of L and the identities I_C of the non-zero columns of C. We first provide the theory and then the corresponding optimality conditions.

3.1. Recovery for DOP

The guarantees for the estimation of {U, I_C} via DOP are provided in the following theorem.

Theorem 3.1. Suppose M = L + DC + N with (L, C) belonging to the oracle model {M, U, I_C}, ‖N‖_F ≤ ε_N, rank(L) = r, and |I_C| = k with k satisfying k ≤ 1/(4β_V²). Suppose the subspaces L and D satisfy (3) with parameter µ(L,D) ∈ [0, 1), D satisfies (4) on R_C with α_u ≥ α_ℓ > 0, and C_{:,j} ∈ R_C for all j ∈ [n2]. If λ, r and k satisfy

    (β_{U,V} + √(rk α_u) µ(L,D) b_2) / (1/2 − k b_2) ≤ λ ≤ (b_1/2 − √(r α_u) µ(L,D)) / √k,    (5)

where b_1 and b_2 are defined as

    b_1 = √α_ℓ (1 − µ(L,D)),  and  b_2 = max_{‖u‖_2=1} ( ‖(I − P_U)Du‖_2 / ‖Du‖_2 ) · α_u β_V² / (α_ℓ (1 − µ(L,D))²),

then there exists (L̄, C̄) ∈ {M, U, I_C} such that the optimal solution (L̂, Ĉ) of DOP in (2) satisfies

    ‖L̂ − L̄‖_F ≤ (8√r + 9√(r α_u)/λ) ε_N,
    ‖Ĉ − C̄‖_F ≤ 9√r (1 + √α_u/λ) ε_N.    (6)

We interpret the condition (5) on λ, r and k to gain better insight into the result. Write a ≲ b if a ≤ c·b for some constant c, and a ≈ b if a ≲ b and b ≲ a hold simultaneously. Suppose 1 ≲ α_ℓ ≤ α_u ≲ 1, which can easily be met by a tight frame when n1 > d, or by an RIP-type condition when n1 < d. Note that β_V² ≈ µ_V r/n_L. Then if µ(L,D) ≲ 1/r and β_{U,V} ≲ 1/r (satisfied when DC and L have small coherence), we have k = O(n_L/(r·µ_V)) and 1/k ≲ λ ≲ 1/√k. This is of the same order as the upper bound on k in OP [3], but our experiments in Section 4.1 show that DOP outperforms OP even when the rank of L is high. Note that when the noise N = 0, (6) implies exact recovery.
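For illustration, the admissible interval (5) can be computed numerically from the model constants. The sketch below assumes D has full column rank (so that a reduced QR factor spans range(D)) and uses the product form of b_2 given above; the function name is ours.

```python
# A sketch computing the lambda window (5) from the model constants.
# The max of ||(I - P_U) D u|| / ||D u|| over unit u equals the spectral
# norm of (I - P_U) Q, where Q is an orthonormal basis of range(D)
# (assuming D has full column rank).
import numpy as np

def lambda_window(D, U, Vt, alpha_l, alpha_u, mu, beta_V, r, k):
    b1 = np.sqrt(alpha_l) * (1 - mu)
    Q, _ = np.linalg.qr(D)                                  # basis of range(D)
    ratio = np.linalg.norm((np.eye(U.shape[0]) - U @ U.T) @ Q, 2)
    b2 = ratio * alpha_u * beta_V ** 2 / (alpha_l * (1 - mu) ** 2)
    beta_UV = np.max(np.linalg.norm(D.T @ U @ Vt, axis=0))  # ||D^T U V^T||_{inf,2}
    lam_lo = (beta_UV + np.sqrt(r * k * alpha_u) * mu * b2) / (0.5 - k * b2)
    lam_hi = (b1 / 2 - np.sqrt(r * alpha_u) * mu) / np.sqrt(k)
    return lam_lo, lam_hi   # the window may be empty if (5) cannot hold
```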

3.2. Proof Sketch

The Lagrangian of problem (2) can be written as

    F(L, C, A) = ‖L‖_* + λ‖C‖_{1,2} + ⟨A, M − L − DC⟩,    (7)

where A ∈ R^{n1×n2} is a dual variable. The subdifferentials of (7) with respect to (L, C) are

    ∂_L F(L, C, A) = {UV^T + W − A : ‖W‖_2 ≤ 1, P_L(W) = 0},
    ∂_C F(L, C, A) = {λH + λZ − D^T A : P_C(H) = H, H_{:,j} = C_{:,j}/‖C_{:,j}‖_2, P_C(Z) = 0, ‖Z‖_{∞,2} ≤ 1}.

By the optimality conditions, a pair (L, C) is an optimal point of (2) if and only if

    0_{n1×n2} ∈ ∂_L F(L, C, A),  0_{d×n2} ∈ ∂_C F(L, C, A).

We then construct the dual certificate as follows. Let the compact SVD of L be L = UΣV^T, let 𝔇 = (I − P_V) ⊗ D^T(I − P_U), let 𝔇_C be the row submatrix of 𝔇 corresponding to the non-zero rows of vec(C), let Y = λC̃ − P_C(D^T UV^T) with C̃ as defined in (q3) below, and let Y_C be the column submatrix of Y indexed by I_C. If λ, r, and k satisfy (5), then we construct the dual certificate Q as

    Q = UV^T + (I − P_U) X (I − P_V),    (8)

where X is given by vec(X) = 𝔇_C^T (𝔇_C 𝔇_C^T)^{−1} vec(Y_C). The following lemma states the optimality conditions for the optimal solution pair (L, C), which proves Theorem 3.1.

Lemma 3.1. Suppose all conditions in Theorem 3.1 hold. Then the dual certificate Q constructed in (8) satisfies

    (q1) P_L(Q) = UV^T,
    (q2) ‖P_{L⊥}(Q)‖_2 < 1/2,
    (q3) P_C(D^T Q) = λC̃, where C̃_{:,j} = C_{:,j}/‖C_{:,j}‖_2 for all j ∈ I_C and C̃_{:,j} = 0 otherwise,
    (q4) ‖P_{C⊥}(D^T Q)‖_{∞,2} < λ/2.

Moreover, if the noise satisfies ‖N‖_F ≤ ε_N, then there exists (L̄, C̄) ∈ {M, U, I_C} such that the optimal solution (L̂, Ĉ) of DOP satisfies (6).
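Conditions (q1)–(q4) can be verified numerically for a candidate certificate; a sketch, assuming the projection helpers sketched in the notation section and with all helper names hypothetical:

```python
# Numerically checking (q1)-(q4) of Lemma 3.1 for a candidate Q (our
# sketch). P_L, P_Lp, P_C are the projection operators from the
# notation section; U, Vt are the compact SVD factors of L.
import numpy as np

def check_certificate(Q, D, C, I_C, P_L, P_Lp, P_C, U, Vt, lam, tol=1e-8):
    C_tilde = np.zeros_like(C)
    C_tilde[:, I_C] = C[:, I_C] / np.linalg.norm(C[:, I_C], axis=0)
    DtQ = D.T @ Q
    q1 = np.linalg.norm(P_L(Q) - U @ Vt) < tol            # P_L(Q) = U V^T
    q2 = np.linalg.norm(P_Lp(Q), 2) < 0.5                 # spectral norm < 1/2
    q3 = np.linalg.norm(P_C(DtQ) - lam * C_tilde) < tol   # P_C(D^T Q) = lam C~
    q4 = np.max(np.linalg.norm(DtQ - P_C(DtQ), axis=0)) < lam / 2
    return q1, q2, q3, q4
```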

4. NUMERICAL EVALUATION

We provide numerical experiments to study the properties of DOP. We accomplish this by first studying the phase transition in terms of rank and sparsity in Section 4.1, followed by an exploration of an application of the proposed approach to target detection in hyper-spectral images in Section 4.2. The competing methods are OP, proposed by [3], and a naive procedure that multiplies both sides of (1) by the pseudo-inverse of D and then applies OP, called Inv+OP, when the dictionary is thin. Note that DOP and Inv+OP can be considered a type of supervised learning method [20]. However, unlike most supervised learning methods, which need to train implicit model parameters, such as support vector machines (SVMs) [21], we can detect outliers directly from given data together with a basis for the outliers, i.e., a dictionary formed from outliers. Any dictionary learning or clustering method can be used to form the dictionary. We also tested the popular matched filter [22], which performs uniformly worse than DOP, so we omit its results here.

4.1. Phase Transition

For ease of exploration, we only discuss the noiseless case, i.e., N = 0. We demonstrate the phase transition with respect to the rank r and the number of outliers k, comparing DOP with OP (for which we set M = L + C) and with Inv+OP. Specifically, for DOP, we set n1 = 100, n2 = 1000, d = 50 or 150, and choose r ∈ {5, 10, ..., 100} and k ∈ {50, 100, ..., 1000}, with λ = 0.5 for d = 50 and λ = 1.5 for d = 150. For each pair of r and k, we generate L = [UV^T 0_{n1×k}] ∈ R^{n1×n2} and C = [0_{d×(n2−k)} W] ∈ R^{d×n2}, where U ∈ R^{n1×r}, V ∈ R^{(n2−k)×r}, and W ∈ R^{d×k} have i.i.d. N(0, 1) entries. We also generate D ∈ R^{n1×d} with i.i.d. N(0, 1) entries, set M = L + DC, and normalize it by column. For OP, we generate L ∈ R^{n1×n2} and C ∈ R^{n1×n2} in the same way, except that C = DW with d = 50, so that the columns of C span a 50-dimensional subspace of R^100.

[Fig. 1. Phase transitions for (a) OP, (b) Inv+OP, and DOP with (c) d = 50 and (d) d = 150; each panel plots the number of outliers k (horizontal axis, 200–1000) against the rank r (vertical axis, 20–100).]
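A sketch of the data generation for a single (r, k) cell of this experiment might read as follows, reusing the hypothetical dop() solver sketched in Section 1; the tolerance eps_N is an assumption we make for the noiseless setting.

```python
# Synthetic data for one (r, k) cell of the DOP phase transition,
# following the description above (our sketch).
import numpy as np

rng = np.random.default_rng(0)
n1, n2, d, r, k = 100, 1000, 50, 20, 200
U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2 - k, r))
W = rng.standard_normal((d, k))
D = rng.standard_normal((n1, d))
L = np.hstack([U @ V.T, np.zeros((n1, k))])    # inliers, then zero columns
C = np.hstack([np.zeros((d, n2 - k)), W])      # column-sparse coefficients
M = L + D @ C
M /= np.linalg.norm(M, axis=0, keepdims=True)  # column normalization
L_hat, C_hat = dop(M, D, lam=0.5, eps_N=1e-6)  # recovery via (2)
outliers = np.flatnonzero(np.linalg.norm(C_hat, axis=0) > 1e-3)  # estimated I_C
```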

Figure 1 shows the average performance over 50 random trials for each of the methods considered. Here, we deem an experiment a success (shown in white) if it identifies the outlier and inlier locations correctly. We observe that for DOP, even when L has full row rank, we can recover {U, I_C} exactly for a wide range of k. This coincides with both our theory and the intuition that the prior knowledge of D allows recovery for a wider range of r and k when the problem is properly defined. On the other hand, with OP, the recovery of {U, I_C} fails when the rank r is high, even for very small k. This matches the bound k = O(n2/(r·µ_V)).

Inv+OP can also recover I_C for a range of k when L has full row rank in this setting. However, when the column space of D does not contain U, i.e., ‖P_D(L)‖_F < ‖L‖_F, Inv+OP loses information about U. In the extreme case, P_D(L) = 0 if D ⊥ U.

4.2. Target Detection in Hyper-Spectral Imaging Data

We investigate the applicability of our approach to a target detection application in hyper-spectral (HS) imaging. An HS sensor records the response of a scene to different regions of the electromagnetic spectrum. The resulting HS image Y ∈ R^{s×m×w} can therefore be viewed as a data cube, i.e., a tensor, where each length-w voxel corresponds to the spectral response of the associated pixel. The aim here is to identify targets in an HS image given the spectral signatures of the targets of interest. Target detection in hyper-spectral images was the topic of one of our previous works [23], where we consider the case of entry-wise sparsity. Here, we analyze the performance of DOP for this application. For our experiments, we consider two datasets: (i) Indian Pines, collected by the AVIRIS sensor [24], with s = m = 145 and w = 200; and (ii) Pavia University, collected by the ROSIS sensor (1), with s = m = 131 and w = 201.

(1) Data is available at http://www.ehu.eus/ccwintco/

[Fig. 2. Demonstration of (a) a slice of the Indian Pines HS data array (with w = 50) and (f) a slice of the Pavia University HS data array (with w = 100). Also shown: (b, g) the ground truth, and the detection results of (c, h) OP, (d, i) Inv+OP, and (e, j) DOP for Indian Pines and Pavia University.]

Table 1. Comparison of the ROC metrics for different methods.

                       d = 4                      d = 15
    Approach    TPR     FPR     AUC       TPR     FPR     AUC
    DOP         0.989   0.012   0.998     0.989   0.017   0.998
    Inv+OP      0.926   0.033   0.980     0.903   0.005   0.946
    OP          0.097   0.024   0.095     0.097   0.024   0.095

The data matrix M ∈ R^{w×sm} is formed by unfolding the tensor data Y along the third dimension, so that each column of M is a voxel of Y. For our experiments, we form the dictionary D in two ways: by randomly choosing some voxels from the class of interest, and by learning a dictionary on the class of interest [25] (we skip the description here); the latter performs better in our experiments.
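The unfolding step has a one-line realization; a short sketch (the helper name is ours):

```python
# Unfolding the HS cube as described above: Y (s x m x w) becomes
# M (w x s*m), with one length-w voxel per column.
import numpy as np

def unfold_hs_cube(Y):
    s, m, w = Y.shape
    return Y.reshape(s * m, w).T   # M in R^{w x sm}
```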

We show the results of the different approaches, namely DOP, OP, and Inv+OP, in Figure 2, where light colors correspond to detected targets. For each approach, we show the detection result that is visually closest to the ground truth. The corresponding detailed results are given in Table 1. Here, we report performance in terms of the ROC metrics, i.e., true positive rate (TPR), false positive rate (FPR), and area under the curve (AUC), for each approach and for two types of dictionaries: a learned dictionary (d = 4) and a dictionary formed directly from the data voxels (d = 15). The results show that the proposed method performs better than Inv+OP and OP.

5. DISCUSSION

Further improvements in sampling and computational efficiency can be achieved via adaptive sensing and sketching [26, 27]. For example, a two-step procedure [28] can be applied to reduce both the number of rows and the number of columns in the optimization phase for further speedup. We leave this investigation to future work.

Acknowledgment. The authors acknowledge support fromthe DARPA Young Faculty Award, Grant N66001-14-1-4047.


6. REFERENCES

[1] I. Jolliffe, Principal Component Analysis, Wiley Online Library, 2002.

[2] E. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis," J. ACM, vol. 58, no. 3, pp. 11:1–11:37, 2011.

[3] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," IEEE Trans. Inform. Theory, vol. 58, no. 5, pp. 3047–3064, 2012.

[4] Y. Chen, H. Xu, C. Caramanis, and S. Sanghavi, "Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs," IEEE Trans. Inform. Theory, vol. 62, no. 1, pp. 503–526, 2016.

[5] J. Ren, X. Li, and J. Haupt, "Robust PCA via tensor outlier pursuit," in 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 2016, pp. 1744–1749.

[6] B. Mehta and W. Nejdl, "Attack resistant collaborative filtering," in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2008, pp. 75–82.

[7] A. Lakhina, M. Crovella, and C. Diot, "Diagnosing network-wide traffic anomalies," in ACM SIGCOMM Computer Communication Review. ACM, 2004, vol. 34, pp. 219–230.

[8] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.

[9] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Advances in Neural Information Processing Systems, 2006, pp. 545–552.

[10] T. Liu, J. Sun, N. Zheng, X. Tang, and H. Shum, "Learning to detect a salient object," in Proc. CVPR, 2007.

[11] C. Qiu and N. Vaswani, "Recursive sparse recovery in large but correlated noise," in 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2011, pp. 752–759.

[12] A. Waters, A. Sankaranarayanan, and R. Baraniuk, "SpaRCS: Recovering low-rank and sparse matrices from compressive measurements," in Advances in Neural Information Processing Systems, 2011, pp. 1089–1097.

[13] Y. Zhang, Z. Ge, A. Greenberg, and M. Roughan, "Network anomography," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement. USENIX Association, 2005, pp. 30–30.

[14] M. Mardani, G. Mateos, and G. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Trans. Inform. Theory, vol. 59, no. 8, pp. 5186–5205, 2013.

[15] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, vol. 1, Springer Science & Business Media, 2003.

[16] S. Rambhatla, X. Li, and J. Haupt, "A dictionary based generalization of robust PCA," in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016, pp. 1315–1319.

[17] A. Gilbert, J. Park, and M. Wakin, "Sketched SVD: Recovering spectral features from compressive measurements," arXiv preprint arXiv:1211.0361, 2012.

[18] D. Achlioptas, "Database-friendly random projections," in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2001, pp. 274–281.

[19] E. Candes and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.

[20] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, MIT Press, 2012.

[21] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[22] L. Scharf, Statistical Signal Processing, vol. 98, Addison-Wesley, Reading, MA, 1991.

[23] S. Rambhatla, X. Li, and J. Haupt, "Target-based hyperspectral demixing via generalized robust PCA," in 51st Asilomar Conference on Signals, Systems and Computers. IEEE, 2017.

[24] M. Baumgardner, L. Biehl, and D. Landgrebe, "220 band AVIRIS hyperspectral image data set: June 12, 1992 Indian Pine Test Site 3," Sep. 2015.

[25] J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. Bach, "Supervised dictionary learning," in Advances in Neural Information Processing Systems, 2009, pp. 1033–1040.

[26] E. Arias-Castro, E. Candes, and M. Davenport, "On the fundamental limits of adaptive sensing," IEEE Trans. Inform. Theory, vol. 59, no. 1, pp. 472–481, 2013.

[27] D. Woodruff, "Sketching as a tool for numerical linear algebra," Foundations and Trends in Theoretical Computer Science, vol. 10, no. 1–2, pp. 1–157, 2014.

[28] X. Li and J. Haupt, "Identifying outliers in large matrices via randomized adaptive compressive sampling," IEEE Trans. Signal Processing, vol. 63, no. 7, pp. 1792–1807, 2015.
