
Deterministic matrices with the restricted isometry property

Matthew Fickus^a and Dustin G. Mixon^b

^a Department of Mathematics and Statistics, Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio 45433, USA

^b Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA

ABSTRACT

The state of the art in compressed sensing uses sensing matrices which satisfy the restricted isometry property (RIP). Unfortunately, the known deterministic RIP constructions fall short of the random constructions, which are only valid with high probability. In this paper, we consider certain deterministic constructions and compare different proof techniques that demonstrate RIP in the deterministic setting.

Keywords: compressed sensing, restricted isometry property, deterministic constructions

1. INTRODUCTION

In many applications, data is traditionally collected in massive quantities before employing a reasonable compression strategy. The result is a storage bottleneck that can be prevented with a data collection alternative known as compressed sensing. The philosophy behind compressed sensing is that we might as well target the meaningful data features up front instead of spending our storage budget on less-telling measurements. As an example, natural images tend to have a highly compressible wavelet decomposition because many of the wavelet coefficients are typically quite small. In this case, one might consider targeting large wavelet coefficients as desired image features; in fact, removing the contribution of the smallest wavelet coefficients will have little qualitative effect on the image [5], and so using sparsity in this way is reasonable.

Let x be an unknown N-dimensional vector with the property that at most K of its entries are nonzero. The goal of compressed sensing is to construct relatively few non-adaptive linear measurements along with a stable and efficient reconstruction algorithm that exploits this sparsity structure. Expressing each measurement as a row of an M × N matrix F, we have the following system:

$$y = Fx. \tag{1}$$

In the spirit of compressed sensing, we desire M ≪ N. Furthermore, in order for there to exist an inversion process for (1), we require that F maps K-sparse vectors injectively, or equivalently, that every subcollection of 2K columns of F is linearly independent. Certainly, matrices with this independence relation are common. In fact, this condition is satisfied with probability one by selecting the entries of F independently from the Gaussian distribution of zero mean and unit variance. Unfortunately, the natural reconstruction algorithm in this case, i.e., finding the sparsest approximation of y from the dictionary of columns of F, is known to be NP-hard [10]. Moreover, the independence requirement does not impose any sort of dissimilarity between columns of F, meaning distinct identity basis elements could lead to similar measurements, thereby introducing instability into reconstruction.

To get around the NP-hardness of sparse approximation, we need more structure in the matrix F. Instead of considering linear independence of all subcollections of 2K columns, it has become common to impose a much stronger requirement: that every submatrix of 2K columns of F be well-conditioned. Not only does this impose more structure on F, but well-conditioning also enables stability in reconstruction. To be explicit, we have the following definition:

Send correspondence to Dustin G. Mixon: E-mail: [email protected]

Wavelets and Sparsity XIV, edited by Manos Papadakis, Dimitri Van De Ville, Vivek K. Goyal, Proc. of SPIE Vol. 8138, 81380A · © 2011 SPIE · CCC code: 0277-786X/11/$18

doi: 10.1117/12.895080


Definition 1. We say the M × N matrix F satisfies the (K, δ)-restricted isometry property (RIP) if every K-sparse vector x ∈ ℝ^N satisfies

$$(1 - \delta)\|x\|^2 \le \|Fx\|^2 \le (1 + \delta)\|x\|^2.$$

In words, matrices which satisfy RIP act as a near-isometry on sufficiently sparse vectors. Note that a (2K, δ)-RIP matrix with δ < 1 necessarily has the property that all subcollections of 2K columns are linearly independent. As mentioned before, the additional structure of RIP allows for the possibility of getting around the NP-hardness of sparse approximation. Indeed, a significant result in compressed sensing is that RIP sensing matrices enable efficient reconstruction:

Theorem 2 (Candès-Tao [4]). Suppose an M × N matrix F is (2K, δ)-RIP for some δ < √2 − 1. Then for every K-sparse vector x ∈ ℝ^N,

$$x = \arg\min_{\hat{x}} \|\hat{x}\|_1 \quad \text{subject to} \quad F\hat{x} = Fx.$$
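The ℓ1 program in Theorem 2 is a linear program in disguise. As a concrete illustration (our own sketch, not part of the paper), the following Python code, assuming numpy and scipy, recovers a sparse vector from Gaussian measurements using the standard split z = u − v with u, v ≥ 0; all sizes and names are arbitrary choices.

# Basis pursuit as a linear program: minimize ||z||_1 subject to Fz = y.
# Illustrative sketch only; matrix sizes and solver are our own choices.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
M, N, K = 20, 50, 3
F = rng.standard_normal((M, N)) / np.sqrt(M)       # random sensing matrix
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
y = F @ x                                          # the measurements of (1)

# Split z = u - v with u, v >= 0, so that ||z||_1 = sum(u) + sum(v).
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([F, -F]), b_eq=y, bounds=[(0, None)] * (2 * N))
x_hat = res.x[:N] - res.x[N:]
print("recovered:", np.allclose(x_hat, x, atol=1e-6))  # typically True at these sizes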

The fact that RIP sensing matrices convert an NP-hard reconstruction problem into an ℓ1-minimization problem has prompted many in the community to construct RIP matrices. Of these constructions, the most successful have been random matrices, such as matrices with independent Gaussian or Bernoulli entries [2], or matrices whose rows were randomly selected from the discrete Fourier transform matrix [11]. With high probability, these random constructions support sparsity levels K on the order of M/log^α N for some α ≥ 1. Intuitively, this level of sparsity is near-optimal because K cannot exceed M/2 by the linear independence condition, and further, a log factor in N must be paid to actually find the support of the sparse vector. Unfortunately, it is difficult to check whether a particular instance of a random matrix is (K, δ)-RIP, as this requires the calculation of singular values for all $\binom{N}{K}$ submatrices. For this reason, and for the sake of reliable sensing standards, many have become interested in finding deterministic RIP matrix constructions. The purpose of this paper is to discuss certain deterministic RIP matrix constructions and compare different proof techniques that demonstrate RIP in the deterministic setting.

2. HOW TO DEMONSTRATE RIP

Take an M × N matrix F. For a given K, we wish to find some δ for which F is (K, δ)-RIP. To this end, it is useful to note that the smallest δ for which F is (K, δ)-RIP is given by

$$\delta_{\min} := \max_{\substack{\mathcal{K} \subseteq \{1,\dots,N\} \\ |\mathcal{K}| = K}} \| F_{\mathcal{K}}^* F_{\mathcal{K}} - I_K \|_2. \tag{2}$$

Here, F_𝒦 denotes the submatrix of columns of F indexed by 𝒦. Having an expression for the smallest permissible δ value, note that we are not tasked with actually computing δ_min. Rather, we recognize that F is (K, δ)-RIP for every δ ≥ δ_min, and so we seek an upper bound on δ_min.
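For intuition, (2) can be evaluated literally on toy examples. The following brute-force sketch is our own illustration, assuming numpy; its cost grows binomially in N, which is exactly why upper bounds are needed.

# delta_min from (2): the largest spectral norm of F_K* F_K - I_K over all
# K-subsets of columns. Exponential in N; for tiny examples only.
import numpy as np
from itertools import combinations

def delta_min(F, K):
    N = F.shape[1]
    worst = 0.0
    for idx in combinations(range(N), K):
        G = F[:, list(idx)].conj().T @ F[:, list(idx)]   # K x K sub-Gram matrix
        worst = max(worst, np.linalg.norm(G - np.eye(K), 2))
    return worst

To this end, we consider the following classical result: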

Theorem 3 (Gershgorin circle theorem [9]). For each eigenvalue λ of a K × K matrix [a_{ij}], there is an index i ∈ {1, ..., K} such that

$$|\lambda - a_{ii}| \le \sum_{\substack{j=1 \\ j \ne i}}^{K} |a_{ij}|.$$

To use this theorem, we first simplify the analysis by supposing that the columns of F have unit norm. In reality, this affects very little, since we are merely imposing that F be optimally RIP for K = 1. Next, note that F_𝒦* F_𝒦 is the Gram matrix of the columns indexed by 𝒦, and as such, the diagonal entries are 1, and the off-diagonal entries are inner products between columns. Let μ denote the worst-case coherence of F = [f_1 ··· f_N]:

$$\mu := \max_{\substack{i,j \in \{1,\dots,N\} \\ i \ne j}} |\langle f_i, f_j \rangle|.$$


Then the size of each off-diagonal entry of F_𝒦* F_𝒦 is at most μ, regardless of our choice for 𝒦. Therefore, for every eigenvalue λ of F_𝒦* F_𝒦 − I_K, the Gershgorin circle theorem gives

$$|\lambda| = |\lambda - 0| \le \sum_{\substack{j=1 \\ j \ne i}}^{K} |\langle f_i, f_j \rangle| \le (K - 1)\mu. \tag{3}$$

Since (3) holds for every eigenvalue λ of F_𝒦* F_𝒦 − I_K and every choice of 𝒦 ⊆ {1, ..., N}, we conclude from (2) that δ_min ≤ (K − 1)μ, i.e., F is (K, (K − 1)μ)-RIP. This process of using the Gershgorin circle theorem to demonstrate RIP for deterministic constructions has become standard in the community [1, 6, 8].
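In code, this certificate is cheap: one pass over the Gram matrix gives μ, and (K − 1)μ bounds δ_min without computing any eigenvalues. A minimal sketch, assuming numpy and unit-norm columns (function names are our own):

# Gershgorin certificate: F is (K, (K-1)*mu)-RIP, where mu is the worst-case
# coherence. Unlike (2), this costs one N x N Gram matrix, not a subset scan.
import numpy as np

def coherence(F):
    G = F.conj().T @ F            # Gram matrix; diagonal is 1 for unit columns
    np.fill_diagonal(G, 0)        # discard the diagonal before maximizing
    return np.abs(G).max()

def gershgorin_delta(F, K):
    return (K - 1) * coherence(F)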

Recall that random RIP constructions support sparsity levels K on the order of M/log^α N for some α ≥ 1. To see how well the Gershgorin circle theorem demonstrates RIP, we need to somehow express μ in terms of M and N. To this end, we consider the following result, known as the Welch bound:

Theorem 4 (Welch bound [14]). Every M × N matrix with unit-norm columns has worst-case coherence

$$\mu \ge \sqrt{\frac{N - M}{M(N - 1)}}.$$

To use this result, we consider matrices whose worst-case coherence achieves the Welch bound. These are known as equiangular tight frames (ETFs) [13], and they are characterized by the following properties (a numerical check is sketched after the list):

• the columns have unit norm,

• the rows are orthogonal with equal norm, and

• the inner products between pairs of columns are equal in modulus.
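These three properties are easy to test numerically. The sketch below is our own illustration, assuming numpy; it checks a candidate M × N matrix and also verifies that its coherence meets the Welch bound of Theorem 4. Tolerances are arbitrary.

# ETF test: unit-norm columns, orthogonal equal-norm rows (equivalently
# FF* = (N/M) I when the columns have unit norm), and equal-modulus inner
# products, which for an ETF sit exactly at the Welch bound.
import numpy as np

def is_etf(F, tol=1e-8):
    M, N = F.shape
    unit_cols = np.allclose(np.linalg.norm(F, axis=0), 1.0, atol=tol)
    tight_rows = np.allclose(F @ F.conj().T, (N / M) * np.eye(M), atol=tol)
    offdiag = np.abs(F.conj().T @ F)[~np.eye(N, dtype=bool)]
    equiangular = np.ptp(offdiag) < tol          # all |<f_i, f_j>| equal
    welch = np.sqrt((N - M) / (M * (N - 1)))
    return unit_cols and tight_rows and equiangular and np.isclose(offdiag.max(), welch, atol=tol)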

To date, there are two general constructions that build several large families of ETFs [8, 15]. Since ETFs achieve the Welch bound, we can further analyze what it means for an M × N ETF F to be (K, (K − 1)μ)-RIP. In particular, since Theorem 2 requires F to be (2K, δ)-RIP for δ < √2 − 1, it suffices to have 2K/√M < √2 − 1, since this implies

$$\delta = (2K - 1)\mu = (2K - 1)\sqrt{\frac{N - M}{M(N - 1)}} \le \frac{2K}{\sqrt{M}} < \sqrt{2} - 1.$$

That is, ETFs form sensing matrices that support sparsity levels K on the order of √M. Most other deterministic constructions have identical bounds on sparsity levels [1, 6, 8]. In fact, since ETFs minimize coherence, they are necessarily optimal constructions in terms of the Gershgorin demonstration of RIP, but the question remains whether they are actually RIP for larger sparsity levels; the Gershgorin demonstration is too weak to indicate either possibility. This question illustrates the need for more tools to demonstrate whether a matrix is RIP; the following section provides one such tool.

3. HOW TO DEMONSTRATE NOT RIP

Recall that, in order for an inversion process for (1) to exist, we require that F maps K-sparse vectors injectively, or equivalently, that every subcollection of 2K columns of F is linearly independent. This linear independence condition can be expressed in more general terms, as the following definition provides:

Definition 5. The spark of a matrix F is the size of the smallest linearly dependent subset of columns, i.e.,

$$\mathrm{Spark}(F) = \min_{x \ne 0} \|x\|_0 \quad \text{s.t.} \quad Fx = 0.$$


This definition was introduced by Donoho and Elad [7] to help build a theory of sparse representation that later gave birth to modern compressed sensing. The concept of spark is also found in matroid theory, where it goes by the name girth. The condition that every subcollection of 2K columns of F is linearly independent is equivalent to saying that Spark(F) > 2K. Relating spark to RIP, suppose F is (K, δ)-RIP with Spark(F) ≤ K. Then there exists a nonzero K-sparse vector x such that

$$(1 - \delta)\|x\|^2 \le \|Fx\|^2 = 0,$$

and so δ ≥ 1. The reason behind this stems from our necessary linear independence condition: RIP implies linear independence, and so small spark implies linear dependence, which in turn implies not RIP.
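Definition 5 also admits a literal brute-force computation. The sketch below is our own, assuming numpy; it checks the rank of every column subset in order of size, so it is exponential in N and only suited to small examples.

# Spark by brute force: the size of the smallest linearly dependent column
# subset. If every subset is independent, we return N + 1 by convention.
import numpy as np
from itertools import combinations

def spark(F, tol=1e-10):
    N = F.shape[1]
    for size in range(1, N + 1):
        for idx in combinations(range(N), size):
            if np.linalg.matrix_rank(F[:, list(idx)], tol=tol) < size:
                return size          # found a dependent subset
    return N + 1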

As an example of using spark to analyze RIP, we now consider a construction that dates back to Seidel [12] and was recently developed further [8]. Here, a special type of block design is used to build an ETF. Let's start with a definition:

Definition 6. A (t, k, v)-Steiner system is a v-element set V with a collection of k-element subsets of V, called blocks, with the property that any t-element subset of V is contained in exactly one block. The {0, 1}-incidence matrix A of a Steiner system has entries A_{ij}, where A_{ij} = 1 if the ith block contains the jth element, and otherwise A_{ij} = 0.

One example of a Steiner system is a set with all possible two-element blocks. This forms a (2, 2, v)-Steiner system because every pair of elements is contained in exactly one block. The following theorem details how to construct ETFs using Steiner systems.

Theorem 7 (Fickus-Mixon-Tremain [8]). Every (2, k, v)-Steiner system can be used to build a $\frac{v(v-1)}{k(k-1)} \times v(1 + \frac{v-1}{k-1})$ equiangular tight frame F according to the following procedure:

(i) Let A be the $\frac{v(v-1)}{k(k-1)} \times v$ incidence matrix of a (2, k, v)-Steiner system.

(ii) Let H be the $(1 + \frac{v-1}{k-1}) \times (1 + \frac{v-1}{k-1})$ discrete Fourier transform matrix.

(iii) For each j = 1, ..., v, let F_j be a $\frac{v(v-1)}{k(k-1)} \times (1 + \frac{v-1}{k-1})$ matrix obtained from the jth column of A by replacing each of the one-valued entries with a distinct row of H, and every zero-valued entry with a row of zeros.

(iv) Concatenate and rescale the F_j's to form $F = (\frac{k-1}{v-1})^{1/2} [F_1 \cdots F_v]$.

As an example, we build an ETF from a (2, 2, 3)-Steiner system. In this case, the incidence matrix is

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

For this matrix, each row represents a block. Since each block contains two elements, each row of the matrix has two ones. Also, any two elements determine a unique common row, and so any two columns have a single one in common. To form the corresponding 3 × 9 ETF F, we use the 3 × 3 DFT matrix. Letting ω = e^{2πi/3}, we have

$$H = \begin{bmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{bmatrix}.$$

Finally, we replace the two ones in each column of A with the second and third rows of H, respectively. Normalizing the columns gives the 3 × 9 ETF:

$$F = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & \omega & \omega^2 & 1 & \omega & \omega^2 & 0 & 0 & 0 \\ 1 & \omega^2 & \omega & 0 & 0 & 0 & 1 & \omega & \omega^2 \\ 0 & 0 & 0 & 1 & \omega^2 & \omega & 1 & \omega^2 & \omega \end{bmatrix}. \tag{4}$$
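The procedure of Theorem 7 is short to code. The sketch below is our own, assuming numpy; it builds the (2, 2, v)-Steiner ETF whose blocks are all 2-element subsets, reproduces (4) for v = 3 up to roundoff, and verifies that every pair of columns meets the Welch-bound coherence 1/2.

# Steiner ETF via Theorem 7 for the (2, 2, v)-Steiner system of all 2-element
# blocks. As in the worked example, the non-constant rows of the DFT matrix
# replace the ones in each column of A.
import numpy as np
from itertools import combinations

def steiner_etf(v, k=2):
    blocks = list(combinations(range(v), k))
    b, r = len(blocks), (v - 1) // (k - 1)        # r = ones per column of A
    A = np.zeros((b, v))
    for i, blk in enumerate(blocks):
        A[i, list(blk)] = 1
    n = r + 1                                     # H is n x n, n = 1 + (v-1)/(k-1)
    H = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    pieces = []
    for j in range(v):
        Fj = np.zeros((b, n), dtype=complex)
        Fj[np.flatnonzero(A[:, j])] = H[1:]       # rows 2, ..., n of H
        pieces.append(Fj)
    return np.sqrt((k - 1) / (v - 1)) * np.hstack(pieces)

F = steiner_etf(3)                                # the 3 x 9 ETF of (4)
G = np.abs(F.conj().T @ F)
print(np.allclose(G[~np.eye(9, dtype=bool)], 0.5))  # equiangular at the Welch bound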


Several infinite families of (2, k, v)-Steiner systems are already known, and Theorem 7 says that each one can be used to build an ETF. Recall from the previous section that Steiner ETFs, being ETFs, are optimal constructions in terms of the Gershgorin demonstration of RIP. We now use the notion of spark to further analyze Steiner ETFs. Specifically, note that the first three columns in (4) are linearly dependent: they are supported on only two rows. As such, Spark(F) ≤ 3. In general, the columns of each block F_j are supported on only (v−1)/(k−1) rows, and so the spark of a Steiner ETF is at most 1 + (v−1)/(k−1) ≤ 1 + √(2M); having K on the order of √M is therefore necessary for a Steiner ETF to be (K, δ)-RIP for some δ < 1. This answers the closing question of the previous section: in general, ETFs are not RIP for sparsity levels larger than the order of √M. This contrasts with random constructions, which support sparsity levels as large as the order of M/log^α N for some α ≥ 1. That said, are there techniques to demonstrate that a certain class of deterministic constructions is RIP for sparsity levels larger than the order of √M?

4. HOW TO BEAT GERSHGORIN RIP ANALYSIS

Notice that the Gershgorin circle theorem bounds the eigenvalues of a matrix in terms of the sizes of off-diagonal entries. Intuitively, any improvement to Gershgorin RIP analysis will need to account for cancellations in the sub-Gram matrices. In this spirit, Bourgain et al. [3] demonstrated RIP with sparsity levels on the order of M^{1/2+ε} using the following technique:

Theorem 8 (Bourgain et al. [3]). Take an M × N matrix F = [f_1 ··· f_N] with unit-norm columns and N ≥ M ≥ K ≥ 2^{10}, and suppose that, for every disjoint I, J ⊆ {1, ..., N} with |I|, |J| ≤ K, we have

$$\bigg| \sum_{i \in I} \sum_{j \in J} \langle f_i, f_j \rangle \bigg| \le \delta \sqrt{|I| |J|}.$$

Then F is (2SK, 44Sδ log K)-RIP for every positive integer S.

The above theorem is somewhat cumbersome because, like RIP, it requires one to check all sub-Gram matrices of size K or less. However, unlike RIP, it does not require one to calculate eigenvalues, and so Bourgain et al. were able to use this result to demonstrate RIP for a particular construction using additive combinatorics. While the ε improvement over the simpler Gershgorin analysis is modest, this work illustrates that the Gershgorin RIP analysis is not the best one can do with deterministic constructions.
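For intuition, the hypothesis of Theorem 8 can be checked by exhaustive search on very small frames. The sketch below uses our own naming and assumes numpy; the double scan over subsets makes it far more expensive than even (2), so it is purely illustrative.

# Smallest delta in the hypothesis of Theorem 8: maximize
# |sum_{i in I, j in J} <f_i, f_j>| / sqrt(|I| |J|) over disjoint I, J.
import numpy as np
from itertools import chain, combinations

def flat_rip_delta(F, K):
    N = F.shape[1]
    G = F.conj().T @ F
    def subsets():
        return chain.from_iterable(combinations(range(N), s) for s in range(1, K + 1))
    best = 0.0
    for I in subsets():
        for J in subsets():
            if set(I) & set(J):
                continue                          # I and J must be disjoint
            val = abs(G[np.ix_(list(I), list(J))].sum())
            best = max(best, val / np.sqrt(len(I) * len(J)))
    return best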

It is reasonable to suspect that there exist RIP demonstration techniques that produce better-than-ε improvement over the Gershgorin RIP analysis. In pursuit of such techniques, we recall that the smallest δ for which F is (K, δ)-RIP is given by (2). In addition, we notice that, for any self-adjoint matrix A, we have

$$\|A\|_2 = \|\sigma(A)\|_\infty = \lim_{p \to \infty} \|\sigma(A)\|_p,$$

where σ(A) denotes the spectrum of A. Let A = UDU* be the eigenvalue decomposition of A. Then when p is even, we can express ‖σ(A)‖_p in terms of an easy-to-calculate trace:

$$\|\sigma(A)\|_p^p = \mathrm{Tr}[D^p] = \mathrm{Tr}[(UDU^*)^p] = \mathrm{Tr}[A^p].$$

Combining these ideas leads to the following theorem:

Theorem 9. Given an M × N matrix F, define

$$\delta_q := \Big\{ \max_{\substack{\mathcal{K} \subseteq \{1,\dots,N\} \\ |\mathcal{K}| = K}} \mathrm{Tr}\big[(F_{\mathcal{K}}^* F_{\mathcal{K}} - I_K)^{2q}\big] \Big\}^{1/(2q)}.$$

Then F is (K, δ_q)-RIP for every q ≥ 1. Moreover, lim_{q→∞} δ_q = δ_min.
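A brute-force rendering of Theorem 9 (our own sketch, assuming numpy) makes the trace idea explicit: no eigenvalue decompositions are needed, only powers of the sub-Gram deviation, and δ_q decreases toward δ_min as q grows.

# delta_q of Theorem 9: traces of even powers of F_K* F_K - I_K replace the
# spectral-norm computation of (2); delta_q >= delta_min, with equality as q -> inf.
import numpy as np
from itertools import combinations

def delta_q(F, K, q):
    N = F.shape[1]
    worst = 0.0
    for idx in combinations(range(N), K):
        A = F[:, list(idx)].conj().T @ F[:, list(idx)] - np.eye(K)
        worst = max(worst, np.trace(np.linalg.matrix_power(A, 2 * q)).real)
    return worst ** (1 / (2 * q))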


Similar to Bourgain’s technique, this theorem is cumbersome in that it requires one to check every sub-Grammatrix of size K. However, Bourgain’s work gives hope that there may exist constructions to which this techniquecould be naturally applied. Moreover, we note that since δq approaches δmin, a sufficiently large choice of q shoulddeliver better-than-ε improvement over the Gershgorin RIP analysis. How large should q be? If we assume Fhas unit-norm columns, taking q = 1 gives

δ21 = max

K⊆{1,...,N}|K|=K

Tr[(F ∗KFK − IK)2] = max

K⊆{1,...,N}|K|=K

i,j∈Ki�=j

|〈fi, fj〉|2 ≤ K(K − 1)μ2. (5)

This inequality is achieved whenever F is an ETF, in which case (5) demonstrates that F is RIP with sparsitylevels on the order of

√M , as the Gershgorin analysis established. It remains to be shown how δ2 compares.

ACKNOWLEDGMENTS

This work was supported by the A.B. Krongard Fellowship. The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government.

REFERENCES

[1] L. Applebaum, S.D. Howard, S. Searle and R. Calderbank, Chirp sensing codes: Deterministic compressed sensing measurements for fast recovery, Appl. Comp. Harmon. Anal. 26 (2009) 283–290.

[2] R. Baraniuk, M. Davenport, R. DeVore and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx. 28 (2008) 253–263.

[3] J. Bourgain, S. Dilworth, K. Ford, S. Konyagin and D. Kutzarova, Explicit constructions of RIP matrices and related problems, Duke Math. J. 159 (2011) 145–185.

[4] E.J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005) 4203–4215.

[5] M.A. Davenport, M.F. Duarte, Y.C. Eldar and G. Kutyniok, Introduction to compressed sensing, in: Compressed Sensing: Theory and Applications, Y.C. Eldar and G. Kutyniok, eds., Cambridge University Press, 2011.

[6] R.A. DeVore, Deterministic constructions of compressed sensing matrices, J. Complexity 23 (2007) 918–925.

[7] D.L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Natl. Acad. Sci. USA 100 (2003) 2197–2202.

[8] M. Fickus, D.G. Mixon and J.C. Tremain, Steiner equiangular tight frames, submitted, arXiv:1009.5730.

[9] S. Gerschgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk USSR Otd. Fiz.-Mat. 7 (1931) 749–754.

[10] B.K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (1995) 227–234.

[11] M. Rudelson and R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Comm. Pure Appl. Math. 61 (2008) 1025–1045.

[12] J.J. Seidel, A survey of two-graphs, in: Proc. Intern. Coll. Teorie Combinatorie (1973) 481–511.

[13] T. Strohmer and R.W. Heath, Grassmannian frames with applications to coding and communication, Appl. Comp. Harmon. Anal. 14 (2003) 257–275.

[14] L.R. Welch, Lower bounds on the maximum cross correlation of signals, IEEE Trans. Inform. Theory 20 (1974) 397–399.

[15] P. Xia, S. Zhou and G.B. Giannakis, Achieving the Welch bound with difference sets, IEEE Trans. Inform. Theory 51 (2005) 1900–1907.
