model-based compressive sensing - rice universityduarte/images/modelcs082008.pdf · model-based...

Model-Based Compressive Sensing

Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, Chinmay Hegde

Department of Electrical and Computer Engineering

Rice University

Abstract

Compressive sensing (CS) is an alternative to Shannon/Nyquist sampling for acquisition of sparse or

compressible signals that can be well approximated by justK ≪ N elements from anN -dimensional

basis. Instead of taking periodic samples, we measure innerproducts withM < N random vectors

and then recover the signal via a sparsity-seeking optimization or greedy algorithm. The standard CS

theory dictates that robust signal recovery is possible from M = O (K log(N/K)) measurements. The

goal of this paper is to demonstrate that it is possible to substantially decreaseM without sacrificing

robustness by leveraging more realistic signal models thatgo beyond simple sparsity and compressibility

by including dependencies between values and locations of the signal coefficients. We introduce a model-

based CS theory that parallels the conventional theory and provides concrete guidelines on how to create

model-based recovery algorithms with provable performance guarantees. A highlight is the introduction

of a new class of model-compressible signals along with a newsufficient condition for robust model-

compressible signal recovery that we dub the restricted amplification property (RAmP). The RAmP is

the natural counterpart to the restricted isometry property (RIP) of conventional CS. To take practical

advantage of the new theory, we integrate two relevant signal models — wavelet trees and block sparsity

— into two state-of-the-art CS recovery algorithms and prove that they offer robust recovery from just

M = O (K) measurements. Extensive numerical simulations demonstrate the validity and applicability

of our new theory and algorithms.

Index Terms

Compressive sensing, sparsity, signal model, union of subspaces, wavelet tree, block sparsity

The authors are listed alphabetically. Email:richb, volkan, duarte, [email protected]; Web: dsp.rice.edu/cs. This work

was supported by the grants NSF CCF-0431150 and CCF-0728867, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936

and N00014-08-1-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-07-1-0185, and the Texas Instruments Leadership

University Program.

I. INTRODUCTION

We are in the midst of a digital revolution that is enabling the development and deployment

of new sensors and sensing systems with ever increasing fidelity and resolution. The theoretical

foundation is the Shannon/Nyquist sampling theorem, whichstates that a signal’s information is

preserved if it is uniformly sampled at a rate at least two times faster than its Fourier bandwidth.

Unfortunately, in many important and emerging applications, the resulting Nyquist rate can be

so high that we end up with too many samples and must compress in order to store or transmit

them. In other applications the cost of signal acquisition is prohibitive, either because of a high

cost per sample, or because state-of-the-art samplers cannot achieve the high sampling rates

required by Shannon/Nyquist. Examples include radar imaging and exotic imaging modalities

outside visible wavelengths.

Transform compression systems reduce the effective dimensionality of anN-dimensional

signal x by re-representing it in terms of a sparse set of coefficientsα in a basis expansion

x = Ψα, with Ψ an N × N basis matrix. By sparse we mean that onlyK ≪ N of the

coefficientsα are nonzero and need to be stored or transmitted. By compressible we mean that

the coefficientsα, when sorted, decay rapidly enough to zero thatα can be well-approximated as

K-sparse. The sparsity and compressibility properties are pervasive in many signals of interest.

For example, smooth signals and images are compressible in the Fourier basis, while piecewise

smooth signals and images are compressible in a wavelet basis [1]; the JPEG and JPEG2000

standards are examples of practical transform compressionsystems based on these bases.

Compressive sensing(CS) provides an alternative to Shannon/Nyquist sampling when the

signal under acquisition is known to be sparse or compressible [2–4]. In CS, we measure

not periodic signal samples but rather inner products withM ≪ N measurement vectors. In

matrix notation, the measurementsy = Φx = ΦΨα, where the rows of theM × N matrix

Φ contain the measurement vectors. While the matrixΦΨ is rank deficient, and hence loses

information in general, it can be shown to preserve the information in sparse and compressible

signals if it satisfies the so-calledrestricted isometry property(RIP) [3]. Intriguingly, a large

2

class of random matrices have the RIP with high probability.To recover the signal from the

compressive measurementsy, we search for the sparsest coefficient vectorα that agrees with

the measurements. To date, research in CS has focused primarily on reducing both the number

of measurementsM (as a function ofN andK) and on increasing the robustness and reducing

the computational complexity of the recovery algorithm. Today’s state-of-the-art CS systems

can robustly recoverK-sparse and compressible signals from justM = O (K log(N/K)) noisy

measurements using polynomial-time optimization solversor greedy algorithms.

While this represents significant progress from Nyquist-rate sampling, our contention in this

paper is that it is possible to do even better by more fully leveraging concepts from state-of-the-

art signal compression and processing algorithms. In many such algorithms, the key ingredient is

a more realisticsignal modelthat goes beyond simple sparsity by codifying the inter-dependency

structureamong the signal coefficientsα.1 For instance, JPEG2000 and other modern wavelet

image coders exploit not only the fact that most of the wavelet coefficients of a natural image

are small but also the fact that the values and locations of the large coefficients have a particular

structure. Coding the coefficients according to a model for this structure enables these algorithms

to compress images close to the maximum amount possible – significantly better than a naıve

coder that just processes each large coefficient independently.

In this paper, we introduce a model-based CS theory that parallels the conventional theory and

provides concrete guidelines on how to create model-based recovery algorithms with provable

performance guarantees. By reducing the degrees of freedomof a sparse/compressible signal

by permitting only certain configurations of the large and zero/small coefficients, signal models

provide two immediate benefits to CS. First, they enable us toreduce, in some cases significantly,

the number of measurementsM required to stably recover a signal. Second, during signal

recovery, they enable us to better differentiate true signal information from recovery artifacts,

which leads to a more robust recovery.

1Obviously, sparsity and compressibility correspond to simple signal models where each coefficient is treated independently;

for example in a sparse model, the fact that the coefficientαi is large has no bearing on the size of anyαj , j 6= i. We will

reserve the use of the term “model” for situations where we are enforcing dependencies between the values and the locations

of the coefficientsαi.

3

To precisely quantify the benefits of model-based CS, we introduce and study several new

theoretical concepts that could be of more general interest. We begin with signal models forK-

sparse signals and make precise how the structure encoded ina signal model reduces the number

of potential sparse signal supports inα. Then using themodel-based restricted isometry property

(RIP) from [5, 6], we prove that suchmodel-sparse signalscan be robustly recovered from noisy

compressive measurements. Moreover, we quantify the required number of measurementsM

and show that for some modelsM is independent ofN . These results unify and generalize

the limited related work to date on signal models for strictly sparse signals [5–9]. We then

introduce the notion of amodel-compressible signal, whose coefficientsα are no longer strictly

sparse but have a structured power-law decay. To establish that model-compressible signals can

be robustly recovered from compressive measurements, we generalize the CS RIP to a new

restricted amplification property(RAmP). For some compressible signal models, the required

number of measurementsM is independent ofN .

To take practical advantage of this new theory, we demonstrate how to integrate signal

models into two state-of-the-art CS recovery algorithms, CoSaMP [10] and iterative hard thresh-

olding (IHT) [11]. The key modification is surprisingly simple: we merely replace the nonlinear

approximation step in these greedy algorithms with a model-based approximation. Thanks to our

new theory, both new model-based recovery algorithms have provable robustness guarantees for

both model-sparse and model-compressible signals.

To validate our theory and algorithms and demonstrate its general applicability and utility, we

present two specific instances of model-based CS and conducta range of simulation experiments.

The first model accounts for the fact that the large wavelet coefficients of piecewise smooth

signals and images tend to live on a rooted, connectedtree structure[12]. Using the fact that the

number of such trees is much smaller than(

NK

), the number ofK-sparse signal supports inN

dimensions, we prove that a tree-based CoSaMP algorithm needs onlyM = O (K) measurements

to robustly recover tree-sparse and tree-compressible signals. Figure 1 indicates the potential

performance gains on a tree-compressible, piecewise smooth signal.

The second model accounts for the fact that the large coefficients of many sparse signals clus-

4

(a) test signal (b) CoSaMP (RMSE= 1.123)

(c) ℓ1-optimization (RMSE= 0.751) (d) model-based recovery (RMSE= 0.037)

Fig. 1. Example performance of model-based signal recovery. (a) Piecewise-smoothHeaviSinetest signal of length

N = 1024. This signal is compressible under a connected wavelet treemodel. Signal recovered fromM = 80 random

Gaussian measurements using (b) the iterative recovery algorithm CoSaMP, (c) standardℓ1 linear programming, and

(d) the wavelet tree-based CoSaMP algorithm from Section V.In all figures, root mean-squared error (RMSE) values

are normalized with respect to theℓ2 norm of the signal.

ter together [7, 8]. Such a so-calledblock sparsemodel is equivalent to ajoint sparsitymodel for

an ensemble ofJ , length-N signals [9], where the supports of the signals’ large coefficients are

shared across the ensemble. Using the fact that the number ofclustered supports is much smaller

than(

JNK

), we prove that a block-based CoSaMP algorithm needs onlyM = O

(K + K

Jlog(JN

K))

measurements to robustly recover block-sparse and block-compressible signals. Moreover, as the

number of signalsJ grows large, the number of measurements approachesM = O (K).

Our new theory and methods relate to a small body of previous work aimed at integrating

signal models with CS. Several groups have developed model-specific signal recovery algorithms

[5–8, 13–16]; however, their approach has either been ad hocor focused on a single model

5

class. Previous work on unions of subspaces [5, 6, 17] has focused exclusively on strictly sparse

signals and has not considered feasible recovery algorithms. To the best of our knowledge, our

general framework for model-based recovery, the concept ofa model-compressible signal, and

the associated RAmP are new to the literature.

This paper is organized as follows. A review of the CS theory in Section II lays out the

foundational concepts that we extend to the model-based case in subsequent sections. Section III

develops the concept of model-sparse signals and introduces the concept of model-compressible

signals. We also quantify how signal models improve the measurement and recovery process

by exploiting the model-based RIP for model-sparse signalsand by introducing the RAmP for

model-compressible signals. Section IV indicates how to tune CoSaMP to incorporate model

information and establishes its robustness properties formodel-sparse and model-compressible

signals. Sections V and VI then specialize our theory to the special cases of wavelet tree and

block sparse signal models and report on a series of numerical experiments that validate our

theoretical claims. We conclude with a discussion in Section VII. To make the paper more

readable, all proofs are relegated to a series of appendices.

II. BACKGROUND ON COMPRESSIVESENSING

A. Sparse and compressible signals

Given a basisψiNi=1, we can represent every signalx ∈ RN in terms ofN coefficients

αiNi=1 asx =∑N

i=1 αiψi; stacking theψi as columns into theN ×N matrix Ψ, we can write

succinctly thatx = Ψα. In the sequel, we will assume without loss of generality that the signal

x is sparse or compressible in the canonical domain so that thesparsity basisΨ is the identity

andα = x.

A signalx is K-sparseif only K ≪ N entries ofx are nonzero. We call the set of indices

corresponding to the nonzero entries thesupportof x and denote it by supp(x). The set of all

K-sparse signals is the union of the(

NK

), K-dimensional subspaces aligned with the coordinate

axes inRN . We denote this union of subspaces byΣK .

Many natural and manmade signals are not strictly sparse, but can be approximated as such;

6

we call such signalscompressible. Consider a signalx whose coefficients, when sorted in order

of decreasing magnitude, decay according to the power law

∣∣xI(i)

∣∣ ≤ S i−1/r, i = 1, . . . , N, (1)

whereI indexes the sorted coefficients. Thanks to the rapid decay oftheir coefficients, such

signals are well-approximated byK-sparse signals. LetxK ∈ ΣK represent the bestK-term

approximation ofx, which is obtained by keeping just the firstK terms in xI(i) from (1).

Denote the error of this approximation in theℓp norm as

σK(x)p := arg minx∈ΣK

‖x− x‖p = ‖x− xK‖p, (2)

where theℓp norm of the vectorx is defined as‖x‖p =(∑N

i=1 |xi|p)1/p

for 0 < p <∞. Then,

we have that

σK(x)p ≤ (rs)−1/p SK−s, (3)

with s = 1r− 1

p. That is, when measured in theℓp norm, the signal’s best approximation error

has a power-law decay with exponents asK increases. We dub such a signals-compressible.

The approximation of compressible signals by sparse signals is the basis oftransform coding

as is used in algorithms like JPEG and JPEG2000 [1]. In this framework, we acquire the full

N-sample signalx; compute the complete set of transform coefficientsα via α = Ψ−1x; locate

the K largest coefficients and discard the(N − K) smallest coefficients; and encode theK

values and locations of the largest coefficients. While a widely accepted standard, this sample-

then-compress framework suffers from three inherent inefficiencies: First, we must start with

a potentially large number of samplesN even if the ultimate desiredK is small. Second, the

encoder must compute all of theN transform coefficientsα, even though it will discard all

but K of them. Third, the encoder faces the overhead of encoding the locations of the large

coefficients.

B. Compressive measurements and the restricted isometry property (RIP)

Compressive sensing (CS) integrates the signal acquisition and compression steps into a

single process [2–4]. In CS we do not acquirex directly but rather acquireM < N linear

7

measurementsy = Φx using anM×N measurement matrixΦ. We then recoverx by exploiting

its sparsity or compressibility. Our goal is to pushM as close as possible toK in order to

perform as much signal “compression” during acquisition aspossible.

In order to recover a good estimate ofx (the K largestxi’s, for example) from theM

compressive measurements, the measurement matrixΦ should satisfy therestricted isometry

property (RIP) [3].

Definition 1: An M × N matrix Φ has theK-restricted isometry property(K-RIP) with

constantδK if, for all x ∈ ΣK ,

(1− δK)‖x‖22 ≤ ‖Φx‖22 ≤ (1 + δK)‖x‖22. (4)

In words, theK-RIP ensures that all submatrices ofΦ of size M × K are close to an

isometry, and therefore distance (and information) preserving. Practical recovery algorithms

typically require thatΦ have a slightly stronger2K-RIP,3K-RIP, or higher-order RIP in order to

preserve distances betweenK-sparse vectors (which are2K-sparse in general), three-way sums

of K-sparse vectors (which are3K-sparse in general), and other higher-order structures.

While the design of a measurement matrixΦ satisfying theK-RIP is an NP-Complete

problem in general [3], random matrices whose entries are i.i.d. Gaussian, Bernoulli (±1), or

more generally subgaussian2 work with high probability providedM = O (K log(N/K)). These

random matrices also have a so-calleduniversalityproperty in that, for any choice of orthonormal

basis matrixΨ, ΦΨ has theK-RIP with high probability. This is useful when the signal issparse

not in the canonical domain but in basisΨ. A randomΦ corresponds to an intriguing data

acquisition protocol in which each measurementyj is a randomly weighted linear combination

of the entries ofx.

2A random variableX is called subgaussian if there existsc > 0 such thatE`

eXt´

≤ ec2t2/2 for all t ∈ R. Examples include

the Gaussian and Bernoulli random variables, as well as any bounded random variable. [18]

8

C. Recovery algorithms

Since there are infinitely many signal coefficient vectorsx′ that produce the same set of

compressive measurementsy = Φx, to recover the “right” signal we exploit our a priori

knowledge of its sparsity or compressibility. For example,we could seek the sparsestx that

agrees with the measurementsy:

x = arg miny=Φx′

‖x′‖0, (5)

where theℓ0 “norm” of a vector counts its number of nonzero entries. While this optimization

can recover aK-sparse signal from justM = 2K compressive measurements, it is unfortunately

a combinatorial, NP-Complete problem; furthermore, the recovery is not stable in the presence

of noise.

Practical, stable recovery algorithms rely on the RIP (and therefore require at leastM =

O (K log(N/K)) measurements); they can be grouped into two camps. The first approach

convexifies theℓ0 optimization (5) to theℓ1 optimization

x = arg miny=Φx′

‖x′‖1. (6)

This corresponds to a linear program that can be solved in polynomial time [2, 3]. Adaptations to

deal with additive noise iny or x include basis pursuit with denoising (BPDN) [19], complexity-

based regularization [20], and the Dantzig Selector [21].

The second approach finds the sparsestx agreeing with the measurementsy through an

iterative, greedy search. Algorithms such as matching pursuit, orthogonal matching pursuit

[22], StOMP [23], iterative hard thresholding (IHT) [11], CoSaMP [10], and Subspace Pursuit

(SP) [24] all revolve around a bestL-term approximation for the estimated signal, withL varying

for each algorithm.

D. Performance bounds on signal recovery

GivenM = O (K log(N/K)) compressive measurements, a number of different CS signal

recovery algorithms, including all of theℓ1 techniques mentioned above and the CoSaMP, SP,

9

and IHT iterative techniques, offer provably stable signalrecovery with performance close to

optimalK-term approximation (recall (3)). For a randomΦ, all results hold with high probability.

For a noise-free,K-sparse signal, these algorithms offer perfect recovery, meaning that the

signal x recovered from the compressive measurementsy = Φx is exactlyx = x.

For aK-sparse signalx whose measurements are corrupted by noisen of bounded norm

— that is, we measurey = Φx+ n — the mean-squared error of the recovered signalx is

‖x− x‖2 ≤ C‖n‖2, (7)

with C a small constant [2, 3, 10, 11].

For ans-compressible signalx whose measurements are corrupted by noisen of bounded

norm, the mean-squared error of the recovered signalx is

‖x− x‖2 ≤ C1‖x− xK‖2 + C21√K‖x− xK‖1 + C3‖n‖2. (8)

Using (3) we can simplify this expression to

‖x− x‖2 ≤C1SK

−s

√2s

+C2SK

−s

s− 1/2+ C3‖n‖2. (9)

III. B EYOND SPARSE AND COMPRESSIBLESIGNALS

While many natural and manmade signals and images can be described to first-order as sparse

or compressible, the support of their large coefficients often has an underlying inter-dependency

structure. This phenomenon has received only limited attention by the CS community to date [5–

8, 14–16]. In this section, we introduce a model-based theory of CS that captures such structure.

A model reduces the degrees of freedom of a sparse/compressible signal by permitting only

certain configurations of supports for the large coefficient. As we will show, this allows us to

reduce, in some cases significantly, the number of compressive measurementsM required to

stably recover a signal.

10

A. Model-sparse signals

Recall from Section II-A that aK-sparse signal vectorx lives in ΣK ⊂ RN , which is a union

of(

NK

)subspaces of dimensionK. Other than itsK-sparsity, there are no further constraints

on the support or values of its coefficients. Asignal modelendows theK-sparse signalx with

additional structure that allows certainK-dimensional subspaces inΣK and disallows others [5,

6].

To state a formal definition of a signal model, letx|Ω represent the entries ofx corresponding

to the set of indicesΩ ⊆ 1, . . . , N, and letΩC denote the complement of the setΩ.

Definition 2: A signal modelMK is defined as the union ofmK canonicalK-dimensional

subspaces

MK =

mK⋃

m=1

Xm, such thatXm := x : x|Ωm ∈ RK , x|ΩC

m= 0,

where each subspaceXm contains all signalsx with supp(x) ∈ Ωm. Thus, the modelMK is

defined by the set of possible supportsΩ1, . . . ,ΩmK.

Signals fromMK are calledK-model sparse. Clearly,MK ⊆ ΣK and containsmK ≤(

NK

)

subspaces.

In Sections V and VI below we consider two concrete models forsparse signals. The first

model accounts for the fact that the large wavelet coefficients of piecewise smooth signals and

images tend to live on a rooted, connectedtree structure[12]. The second model accounts for

the fact that the large coefficients of sparse signals oftencluster together [7–9].

B. Model-based RIP

If we know that the signalx being acquired isK-model sparse, then we can relax the

RIP constraint on the CS measurement matrixΦ and still achieve stable recovery from the

compressive measurementsy = Φx [5, 6].

Definition 3: [5, 6] An M × N matrix Φ has theMK-restricted isometry property(MK-

RIP) with constantδMKif, for all x ∈MK , we have

(1− δMK)‖x‖22 ≤ ‖Φx‖22 ≤ (1 + δMK

)‖x‖22. (10)

11

To obtain a performance guarantee for model-based recoveryof K-model sparse signals in

additive measurement noise, we must define an enlarged unionof subspaces that includes sums

of elements in the model.

Definition 4: TheB-Minkowski sumfor the setMK , with B > 1 an integer, is defined as

MBK =

x =

B∑

r=1

x(r), with x(r) ∈MK

.

DefineMB(x,K) as the algorithm that obtains the best approximation ofx in the enlarged

union of subspacesMBK :

MB(x,K) = arg minx∈MB

K

‖x− x‖2.

We write M(x,K) := M1(x,K) when B = 1. Note that for many models, we will have

MBK ⊂MBK , and so the algorithmM(x,BK) will provide a strictly better approximation than

MB(x,K).

Our performance guarantee for model-sparse signal recovery will require that the measure-

ment matrixΦ be a near-isometry for all subspaces inMBK for someB > 1. This requirement

is a direct generalization of the2K-RIP, 3K-RIP, and higher-order RIPs from the conventional

CS theory.

Blumensath and Davies [5] have quantified the number of measurementsM necessary for

a random CS matrix to have theMK-RIP with a given probability.

Theorem 1: [5] LetMK be the union ofmK subspaces ofK-dimensions inRN . Then, for

any t > 0 and any

M ≥ 2

cδ2MK

(ln(2mK) +K ln

12

δMK

+ t

),

anM×N i.i.d. subgaussian random matrix has theMK-RIP with constantδMKwith probability

at least1− e−t.

This bound can be used to recover the conventional CS result by substitutingmK =(

NK

)≈

(Ne/K)K . TheMK-RIP property is sufficient for robust recovery of model-sparse signals, as

we show below in Section IV-B.

12

C. Model-compressible signals

Just as compressible signals are “nearlyK-sparse” and thus live close to the union of

subspacesΣK in RN , model-compressible signals are “nearlyK-model sparse” and live close

to the restricted union of subspacesMK . In this section, we make this new concept rigorous.

Recall from (3) that we defined compressible signals in termsof the decay of theirK-term

approximation error.

The ℓ2 error incurred by approximatingx ∈ RN by the best model-based approximation in

MK is given by

σMK(x) := inf

x∈MK

‖x− x‖2 = ‖x−M(x,K)‖2.

The decay of this approximation error defines the model-compressibility of a signal.

Definition 5: The set ofs-model-compressible signalsis defined as

Ms =x ∈ R

N : σMK(x) ≤ SK−1/s, 1 ≤ K ≤ N, S <∞

.

Define |x|Ms as the smallest value ofS for which this condition holds forx ands.

We say thatx ∈Ms is ans-model-compressible signalunder the signal modelMK . These

approximation classes have been characterized for certainsignal models; see Section V for an

example.

D. Nested model approximations and residual subspaces

In conventional CS, the same requirement (RIP) is a sufficient condition for the stable

recovery of both sparse and compressible signals. In model-based recovery, however, the class

of compressible signals is much larger than that of sparse signals, since the set of subspaces

containing model-sparse signals does not span allK-dimensional subspaces. Therefore, we need

to introduce some additional tools to develop a sufficient condition for the stable recovery of

model-compressible signals.

We will pay particular attention to modelsMK that generatenested approximations, since

they are more amenable to analysis.

13

Definition 6: A modelM = M1,M2, . . . has thenested approximation property(NAP)

if supp(M(x,K)) ⊂ supp(M(x,K ′)) for all K < K ′ and for allx ∈ RN .

In words, a model generates nested approximations if the support of the bestK ′-term model-

based approximation contains the support of the bestK-term model-based approximation for all

K < K ′. An important example of a NAP model is the standard compressible signal model of

(3).

When a model obeys the NAP, the support of the difference between the bestjK-term

model-based approximation and the best(j + 1)K-term model-based approximation of a signal

can be shown to lie in a small union of subspaces, thanks to thestructure enforced by the

model. This structure is captured by the set of subspaces that are included in each subsequent

approximation, as defined below.

Definition 7: The jth set of residual subspacesof sizeK is defined as

Rj,K(M) =u ∈ R

N such that u = M(x, jK)−M(x, (j − 1)K) for somex ∈ RN,

for j = 1, . . . , ⌈N/K⌉.

Under the NAP, each signalx in a model can be partitioned into its bestK-term

approximationxT1 , the additional components present in the best2K-term approximationxT2 ,

and so on, withx =∑⌈N/K⌉

j=1 xTjandxTj

∈ Rj,K(M) for eachj. Each signal partitionxTjis a

K-sparse signal, and thusRj,K(M) is a union of subspaces of dimensionK. We will denote

by Rj the number of subspaces that composeRj,K(M) and omit the dependence onM in the

sequel for brevity.

Intuitively, the norms of the partitions‖xTj‖2 decay asj increase for signals that are

compressible under the model. As the next subsection shows,this observation is instrumental in

relaxing the isometry restrictions on the measurement matrix Φ and bounding the recovery error

for s-model-compressible signals when the model obeys the NAP.

14

E. The restricted amplification property (RAmP)

For exactlyK-model-sparse signals, we discussed in Section III-B that the number of

compressive measurementsM required for a random matrix to have theMK-RIP is determined

by the number of canonical subspacesmK via (11). Unfortunately, such model-sparse concepts

and results do not immediately extend to model-compressible signals. Thus, we develop a

generalization of theMK-RIP that we will use to quantify the stability of recovery for model-

compressible signals.

One way to analyze the robustness of compressible signal recovery in conventional CS is

to consider the tail of the signal outside itsK-term approximation as contributing additional

“noise” to the measurements of size‖Φ(x− xK)‖2 [10, 11, 25]. Consequently, the conventional

K-sparse recovery performance result can be applied with theaugmented noisen+ Φ(x−xK).

This technique can also be used to quantify the robustness ofmodel-compressible signal

recovery. The key quantity we must control is the amplification of the model-based approximation

residual throughΦ. The following property is a new generalization of the RIP and model-based

RIP.

Definition 8: A matrix Φ has the(ǫK , r)-restricted amplification property(RAmP) for the

residual subspacesRj,K of modelM if

‖Φu‖22 ≤ (1 + ǫK)j2r‖u‖22 (11)

for any u ∈ Rj,K for each1 ≤ j ≤ ⌈N/K⌉.

The regularity parameterr > 0 caps the growth rate of the amplification ofu ∈ Rj,K as a

function of j. Its value can be chosen so that the growth in amplification with j balances the

decay of the norm in each residual subspaceRj,K with j.

We can quantify the number of compressive measurementsM required for a random

measurement matrixΦ to have the RAmP with high probability; we prove the following in

Appendix I.

Theorem 2:Let Φ be anM × N matrix with i.i.d. subgaussian entries and let the set of

residual subspacesRj,K of modelM containRj subspaces of dimensionK for each1 ≤ j ≤

15

⌈N/K⌉. If

M ≥ max1≤j≤⌈N/K⌉

1(jr√

1 + ǫK − 1)2(

2K + 4 lnRjN

K+ 2t

), (12)

then the matrixΦ has the(ǫK , r)-RAmP with probability1− e−t.

The order of the bound of Theorem 2 is lower thanO (K log(N/K)) as long as the number

of subspacesRj grows slower thanNK .

Armed with the RaMP, we can state the following result, whichwill provide robustness for

the recovery of model-compressible signals; see Appendix II for the proof.

Theorem 3:Let x ∈ Ms be ans-model compressible signal under a modelM that obeys

the NAP. If Φ has the(ǫK , r)-RAmP andr = s− 1, then we have

‖Φ(x−M(x,K))‖2 ≤√

1 + ǫKK−s ln

⌈N

K

⌉|x|Ms .

IV. M ODEL-BASED SIGNAL RECOVERY ALGORITHMS

To take practical advantage of our new theory for model-based CS, we demonstrate how to

integrate signal models into two state-of-the-art CS recovery algorithms, CoSaMP [10] (in this

section) and iterative hard thresholding (IHT) [11] (in Appendix III). The key modification is

simple: we merely replace the bestK-term approximation step in these greedy algorithms with a

bestK-term model-based approximation. Since at each iteration we need only search over themK

subspaces ofMK rather than(

NK

)subspaces ofΣK , fewer measurements will be required for the

same degree of robust signal recovery. Or, alternatively, using the same number of measurements,

more accurate recovery can be achieved. After presenting the modified CoSaMP algorithm, we

prove robustness guarantees for both model-sparse and model-compressible signals.

A. Model-based CoSaMP

We choose to modify the CoSaMP algorithm [10] for two reasons. First, it has robust

recovery guarantees that are on par with the best convex optimization-based approaches. Second,

16

Algorithm 1 Model-based CoSaMPInputs: CS matrixΦ, measurementsy, modelMK

Output:K-sparse approximationx to true signalx

x0 = 0, r = y, i = 0 initializewhile halting criterion falsedo

i← i+ 1

e← ΦT r form signal residual estimateΩ← supp(M2(e,K)) prune signal residual estimate according to signal modelT ← Ω ∪ supp(xi−1) merge supportsb|T ← Φ†

T y, b|T C ← 0 form signal estimatexi ←M(b,K) prune signal estimate according to signal modelr ← y − Φxi update measurement residual

end while

return x← xi

it has a simple iterative, greedy structure based on a bestBK-term approximation (withB a

small integer) that is easily modified to incorporate a bestBK-term model-based approximation

MB(K, x). Pseudocode for the modified algorithm is given in Algorithm1.

We now study the performance of model-based CoSaMP signal recovery on model-sparse

signals and model-compressible signals.

B. Performance of model-sparse signal recovery

A robustness guarantee for noisy measurements of model-sparse signals can be obtained

using the model-based RIP (10). The following theorem is proven in Appendix IV.

Theorem 4:Let x ∈MK and lety = Φx+ n be a set of noisy CS measurements. IfΦ has

anM4K-RIP constant ofδM4

K≤ 0.1, then the signal estimatexi obtained from iterationi of the

model-based CoSaMP algorithm satisfies

‖x− xi‖2 ≤ 2−i‖x‖2 + 15‖n‖2. (13)

17

C. Performance of model-compressible signal recovery

Using the new tools introduced in Section III, we can providea robustness guarantee for

noisy measurements of model-compressible signals, using the RAmP as a condition on the

measurement matrixΦ.

Theorem 5:Let x ∈ Ms be ans-model-compressible signal from a modelM that obeys

the NAP, and lety = Φx + n be a set of noisy CS measurements. IfΦ has theM4K-RIP with

δM4K≤ 0.1 and the(ǫK , r)-RAmP with ǫK ≤ 0.1 and r = s − 1, then the signal estimatexi

obtained from iterationi of the model-based CoSaMP algorithm satisfies

‖x− xi‖2 ≤ 2−i‖x‖2 + 35(‖n‖2 + |x|MsK

−s(1 + ln⌈N/K⌉)). (14)

To prove the theorem, we first bound the recovery error for ans-model-compressible signal

x ∈ Ms when the matrixΦ has the(ǫK , r)-RAmP with r ≤ s − 1. Then, using Theorems 3

and 4, we can easily prove the result by following the analogous proof in [10].

D. Robustness to model mismatch

We now analyze the robustness of model-based CS recovery tomodel mismatch, which occurs

when the signal being recovered from compressive measurements does not conform exactly to

the model used in the recovery algorithm.

We begin with optimistic results for signals that are “close” to matching the recovery model.

First consider a signalx that is notK-model sparse as the recovery algorithm assumes but rather

(K + κ)-model sparse for some small integerκ. This signal can be decomposed intoxK , the

signal’sK-term model-based approximation, andx − xK , the error of this approximation. For

κ ≤ K, we have thatx−xK ∈ R2,K . If the matrixΦ has the(ǫK , r)-RAmP, then it follows than

‖Φ(x− xK)‖2 ≤ 2r√

1 + ǫK‖x− xK‖2. (15)

18

Using equations (13) and (15), we obtain the following guarantee for theith iteration of model-

based CoSaMP:

‖x− xi‖2 ≤ 2−i‖x‖2 + 16 · 2r√

1 + ǫK‖x− xK‖2 + 15‖n‖2.

By noting that‖x− xK‖2 is small, we obtain a guarantee that is close to (13).

Second, consider a signalx that is nots-model compressible as the recovery algorithm

assumes but rather(s− ǫ)-model compressible. The following bound can be obtained under the

conditions of Theorem 5 by modifying the argument in Appendix II:

‖x− xi‖2 ≤ 2−i‖x‖2 + 35

(‖n‖2 + |x|MsK

−s

(1 +⌈N/K⌉ǫ − 1

ǫ

)).

As ǫ becomes smaller, the factor⌈N/K⌉ǫ−1ǫ

approacheslog⌈N/K⌉, matching (14). In summary,

as long as the deviations from the model-sparse and model-compressible models are small, our

model-based recovery guarantees still apply within a smallbounded constant factor.

We end with a more pessimistic, worst-case result for signals that are arbitrarily far away

from model-sparse or model-compressible. Consider such anarbitraryx ∈ RN and compute its

nested model-based approximationsxjK = M(x, jK), j = 1, . . . , ⌈N/K⌉. If x is not model-

compressible, then the model-based approximation errorσjK(x) is not guaranteed to decay asj

decreases. Additionally, the number of residual subspacesRj,K could be as large as(

NK

); that is,

the jth difference between subsequent model-based approximations xTj= xjK − x(j−1)K might

lie in any arbitraryK-dimensional subspace. This worst case is equivalent to setting r = 0 and

Rj =(

NK

)in Theorem 2. It is easy to see that this condition on the number of measurements

M is nothing but the standard RIP for CS. Hence, if inflate the number of measurements to

M = O (K log(N/K)) (the usual number for conventional CS), the performance of model-based

CoSaMP recovery on an arbitrary signalx follows theK-term model-basedapproximation ofx

within a bounded constant factor.

E. Computational complexity of model-based recovery

The computational complexity of a model-based signal recovery algorithm differs from

that of a standard algorithm by two factors. The first factor is the reduction in the number

19

of measurementsM necessary for recovery: since most current recovery algorithms have a

computational complexity that is linear in the number of measurements, any reduction inM

reduces the total complexity. The second factor is the cost of the model-based approximation.

TheK-term approximation used in most current recovery algorithms can be implemented with a

simple sorting operation (O (N logN) complexity, in general). Ideally, the signal model should

support a similarly efficient approximation algorithm.

To validate our theory and algorithms and demonstrate theirgeneral applicability and utility,

we now present two specific instances of model-based CS and conduct a range of simulation

experiments.

V. EXAMPLE : WAVELET TREE MODEL

Wavelet decompositions have found wide application in the analysis, processing, and

compression of smooth and piecewise smooth signals becausethese signals areK-sparse and

compressible, respectively [1]. Moreover, the wavelet coefficients can be naturally organized

into a tree structure, and for many kinds of natural and manmade signals the largest coefficients

cluster along the branches of this tree. This motivates a connected tree model for the wavelet

coefficients [26–28].

While CS recovery for wavelet-sparse signals has been considered previously [14–16],

the resulting algorithms integrated the tree constraint inan ad-hoc fashion. Furthermore, the

algorithms provide no recovery guarantees or bounds on the necessary number of compressive

measurements.

A. Tree-sparse signals

We first describe tree sparsity in the context of sparse wavelet decompositions. We focus

on one-dimensional signals and binary wavelet trees, but all of our results extend directly to

d-dimensional signals and2d-ary wavelet trees.

Consider a signalx of lengthN = 2I , for an integer value ofI. The wavelet representation

20

...Fig. 2. Binary wavelet tree for a one-dimensional signal. The squares denote the large wavelet coefficients that arise

from the discontinuities in the piecewise smooth signal drawn below; the support of the large coefficients forms a

rooted, connected tree.

of x is given by

x = v0ν +

I−1∑

i=0

2i−1∑

j=0

wi,jψi,j ,

where ν is the scaling function andψi,j is the wavelet function at scalei and offsetj. The

wavelet transform consists of the scaling coefficientv0 and wavelet coefficientswi,j at scalei,

0 ≤ i ≤ I − 1, and positionj, 0 ≤ j ≤ 2i − 1. In terms of our earlier matrix notation,x has

the representationx = Ψα, whereΨ is a matrix containing the scaling and wavelet functions as

columns, andα = [v0 w0,0 w1,0 w1,1 w2,0 . . .]T is the vector of scaling and wavelet coefficients.

We are, of course, interested in sparse and compressibleα.

The nested supports of the wavelets at different scales create a parent/child relationship

between wavelet coefficients at different scales. We say that wi−1,⌊j/2⌋ is the parent of wi,j

and thatwi+1,2j and wi+1,2j+1 are thechildren of wi,j. These relationships can be expressed

graphically by the wavelet coefficient tree in Figure 2.

Wavelet functions act as local discontinuity detectors, and using the nested support property

of wavelets at different scales, it is straightforward to see that a signal discontinuity will give

rise to a chain of large wavelet coefficients along a branch ofthe wavelet tree from a leaf to

the root. Moreover, smooth signal regions will give rise to regions of small wavelet coefficients.

This “connected tree” property has been well-exploited in anumber of wavelet-based processing

21

[12, 29, 30] and compression [31, 32] algorithms. In this section, we will specialize the theory

developed in Sections III and IV to a connected tree modelT .

A set of wavelet coefficientsΩ forms aconnected subtreeif, whenever a coefficientwi,j ∈ Ω,

then its parentwi−1,⌊j/2⌋ ∈ Ω as well. Each such setΩ defines a subspace of signals whose support

is contained inΩ; that is, all wavelet coefficients outsideΩ are zero. In this way, we define the

modelTK as the union of allK-dimensional subspaces corresponding to supportsΩ that form

connected subtrees.

Definition 9: Define the set ofK-tree sparse signalsas

TK =

x = v0ν +

I−1∑

i=0

2i∑

j=1

wi,jψi,j : w|ΩC = 0, |Ω| = K,Ω forms a connected subtree

.

To quantify the number of subspaces inTK , it suffices to count the number of distinct

connected subtrees of sizeK in a binary tree of sizeN . We prove the following result in

Appendix V.

Proposition 1: The number of subspaces inTK obeysTK ≤ 4K+4

Ke2 for K ≥ log2N and

TK ≤ (2e)K

K+1for K < log2N .

B. Tree-based approximation

To implement tree-based signal recovery, we seek an efficient algorithm T(x,K) to solve

the optimal approximation

xTK = arg minx∈TK

‖x− x‖2. (16)

Fortuitously, an efficient solver exists, called thecondensing sort and select algorithm(CSSA)

[26–28]. Recall that subtree approximation coincides withstandardK-term approximation (and

hence can be solved by simply sorting the wavelet coefficients) when the wavelet coefficients

are monotonically nonincreasing along the tree branches out from the root. The CSSA solves

(16) in the case of general wavelet coefficient values bycondensingthe nonmonotonic segments

of the tree branches using an iterative sort-and-average routine. The condensed nodes are called

“supernodes”. Condensing a large coefficient far down the tree accounts for the potentially large

cost (in terms of the total budget of tree nodesK) of growing the tree to that point.

22

The CSSA can also be interpreted as a greedy search among the nodes. For each node in the

tree, the algorithm calculates the average wavelet coefficient magnitude for each subtree rooted

at that node, and records the largest average among all the subtrees as the energy for that node.

The CSSA then searches for the unselected node with the largest energy and adds the subtree

corresponding to the node’s energy to the estimated supportas a supernode [28].

Since the first step of the CSSA involves sorting all of the wavelet coefficients, overall it

requiresO (N logN) computations. However, once the CSSA grows the optimal treeof sizeK,

it is trivial to determine the optimal trees of size< K and computationally efficient to grow the

optimal trees of size> K [26].

The constrained optimization (16) can be rewritten as an unconstrained problem by

introducing the Lagrange multiplierλ [33]:

minx∈T‖x− x‖22 + λ(‖α‖0 −K),

where T = ∪Nn=1Tn and α are the wavelet coefficients ofx. Except for the inconsequential

λK term, this optimization coincides with Donoho’scomplexity penalized sum of squares[33],

which can be solved in onlyO (N) computations using coarse-to-fine dynamic programming on

the tree. Its primary shortcoming is the nonobvious relationship between the tuning parameter

λ and and the resulting sizeK of the optimal connected subtree.

C. Tree-compressible signals

Specializing Definition 2 from Section III-C toT , we make the following definition.

Definition 10: Define the set ofs-tree compressible signalsas

Ts =x ∈ R

N : ‖x− T(x,K)‖2 ≤ SK−s, 1 ≤ K ≤ N, S <∞.

Furthermore, define|x|Ts as the smallest value ofS for which this condition holds forx ands.

Tree approximation classes contain signals whose wavelet coefficients have a loose (and

possibly interrupted) decay from coarse to fine scales. These classes have been well-characterized

for wavelet-sparse signals [27, 28, 32] and are intrinsically linked with the Besov spaces

23

Bsq(Lp([0, 1])). Besov spaces contain functions of one or more continuous variables that have

(roughly speaking)s derivatives inLp([0, 1]); the parameterq provides finer distinctions of

smoothness. When a Besov space signalxa with s > 1/p − 1/2 is sampled uniformly and

converted to a length-N vectorx, its wavelet coefficients belong to the tree approximation space

Ts, with

|xN |Ts ≍ ‖xa‖Lp([0,1]) + ‖xa‖Bsq (Lp([0,1])),

where “≍” denotes an equivalent norm. The same result holds ifs = 1/p− 1/2 andq ≤ p.

D. Stable tree-based recovery from compressive measurements

For tree-sparse signals, by applying Theorem 1 and Proposition 1, we find that a subgaussian

random matrix has theTK-RIP property with constantδTKand probability1−e−t if the number

of measurements obeys

M ≥

2cδ2

TK

(K ln 48

δTK+ ln 512

Ke2 + t)

if K < log2N,

2cδ2

TK

(K ln 24e

δTK

+ ln 2K+1

+ t)

if K ≥ log2N,

Thus, the number of measurements necessary for stable recovery of tree-sparse signals is linear

in K, without the dependence onN present in conventional non-model-based CS recovery.

For tree-compressible signals, we must quantify the numberof subspacesRj in each residual

set Rj,K for the approximation class. We can then apply the theory of Section IV-C with

Proposition 1 to calculate smallest allowableM via Theorem 5.

Proposition 2: The number ofK-dimensional subspaces that compriseRj,K obeys

Rj ≤

(2e)K(2j+1)

(Kj+K+1)(Kj+1)if 1 ≤ j <

⌊log2 N

K

⌋,

2(3j+2)K+8ejK

(Kj+1)K(j+1)e2 if j =⌊

log2 NK

⌋,

4(2j+1)K+8

K2j(j+1)e4 if j >⌊

log2 NK

⌋.

(17)

Using Proposition 2 and Theorem 5, we obtain the following condition for the matrixΦ to

have the RAmP, which is proved in Appendix VI.

24

Proposition 3: Let Φ be anM ×N matrix with i.i.d. subgaussian entries.. If

M ≥

2

(√

1+ǫK−1)2

(10K + 2 ln N

K(K+1)(2K+1)+ t)

if K ≤ log2N,

2

(√

1+ǫK−1)2

(10K + 2 ln 601N

K3 + t)

if K > log2N,

then the matrixΦ has the(ǫK , s)-RAmP for modelT and alls > 0.5 with probability1− e−t.

Both cases give a simplified bound on the number of measurements required asM = O (K),

which is a substantial improvement over theM = O (K log(N/K)) required by conventional

CS recovery methods. Thus, whenΦ satisfies Proposition 3, we have the guarantee (14) for

sampled Besov space signals fromBsq(Lp([0, 1])).

E. Experiments

We now present the results of a number of numerical experiments that illustrate the

effectiveness of a tree-based recovery algorithm. Our consistent observation is that explicit

incorporation of the model in the recovery process significantly improves the quality of recovery

for a given number of measurements. In addition, model-based recovery remains stable when the

inputs are no longer tree-sparse, but rather are tree-compressible and/or corrupted with differing

levels of noise. We employ the model-based CoSaMP recovery of Algorithm 1 with a CSSA-

based approximation step in all experiments.

We first study one-dimensional signals that match the connected wavelet-tree model described

above. Among such signals is the class of piecewise smooth functions, which are commonly

encountered in analysis and practice.

Figure 1 illustrates the results of recovering the tree-compressibleHeaviSinesignal of length

N = 1024 from M = 80 noise-free random Gaussian measurements using CoSaMP,ℓ1-norm

minimization using thel1 eq solver from theℓ1-Magic toolbox,3 and our tree-based recovery

algorithm. It is clear that the number of measurements (M = 80) is far fewer than the minimum

number required by CoSaMP andℓ1-norm minimization to accurately recover the signal. In

contrast, tree-based recovery usingK = 26 is accurate and uses fewer iterations to converge

3http://www.acm.caltech.edu/l1magic.

25

2 2.5 3 3.5 4 4.5 50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

M/KA

vera

ge n

orm

aliz

ed e

rror

mag

nitu

de Model−based recoveryCoSaMP

Fig. 3. Performance of CoSaMP vs. wavelet tree-based recovery on a class of piecewise-cubic signals as a function

of M/K.

than conventional CoSaMP. Moreover, the normalized magnitude of the squared error for tree-

based recovery is equal to 0.037, which is remarkably close to the error between the noise-free

signal and itsbestK-term tree-approximation (0.036).

Figure 3 illustrates the results of a Monte Carlo simulationstudy on the impact of the number

of measurementsM on the performance of model-based and conventional recovery for a class

of tree-sparse piecewise-polynomial signals. Each data point was obtained by measuring the

normalized recovery error of 500 sample trials. Each sampletrial was conducted by generating

a new piecewise-polynomial signal with five polynomial pieces of cubic degree and randomly

placed discontinuities, computing its bestK-term tree-approximation using the CSSA, and then

measuring the resulting signal using a matrix with i.i.d. Gaussian entries. Model-based recovery

attains near-perfect recovery atM = 3K measurements, while CoSaMP only matches this

performance atM = 5K. We defer a full Monte Carlo comparison of our method with the

much more computationally demandingℓ1-norm minimization to future work. In practice, we

have noticed that CoSaMP andℓ1-norm minimization offer similar recovery trends; consequently,

we can expect that model-based recovery will offer a similardegree of improvement overℓ1-norm

minimization.

Further, we demonstrate that model-based recovery performs stably in the presence of

26

0 0.1 0.2 0.3 0.4 0.50

0.2

0.4

0.6

0.8

1CoSaMP (M = 5K)

))

Max

imum

nor

mal

ized

rec

over

y er

ror

Fig. 4. Robustness to measurement noise for standard and wavelet tree-based CS recovery algorithms. We plot the

maximum normalized recovery error over 200 sample trials asa function of the expected signal-to-noise ratio. The

linear growth demonstrates that model-based recovery possesses the same robustness to noise as CoSaMP andℓ1-norm

minimization.

measurement noise. We generated sample piecewise-polynomial signals as above, computed

their bestK-term tree-approximations, computedM measurements of each approximation, and

finally added Gaussian noise of expected norm‖n‖2 to each measurement. Then, we recovered

the signal using CoSaMP and model-based recovery and measured the recovery error in each case.

For comparison purposes, we also tested the recovery performance of aℓ1-norm minimization

algorithm that accounts for the presence of noise, which hasbeen implemented as thel1 qc

solver in theℓ1-Magic toolbox. First, we determined the lowest value ofM for which the

respective algorithms provided near-perfect recovery in the absence of noise in the measurements.

This corresponds toM = 3.5K for model-based recovery,M = 5K for CoSaMP, andM = 4.5K

for ℓ1 minimization. Next, we generated 200 sample tree-modeled signals, computedM noisy

measurements, recovered the signal using the given algorithm and recorded the recovery error.

Figure 4 illustrates the growth in maximum normalized recovery error (over the 200 sample

trials) as a function of the expected measurement signal-to-noise ratio for the tree algorithms. We

observe similar stability curves for all three algorithms,while noting that model-based recovery

offers this kind of stability using significantly fewer measurements.

27

(a) Peppers (b) CoSaMP (c) model-based recovery

(RMSE = 22.8) (RMSE = 11.1)

Fig. 5. Example performance of standard and model-based recovery on images. (a)N = 128× 128 = 16384-pixel

Pepperstest image. Image recovery fromM = 5000 compressive measurements using (b) conventional CoSaMP and

(c) our wavelet tree-based algorithm.

Finally, we turn to two-dimensional images and a wavelet quadtree model. The connected

wavelet-tree model has proven useful for compressing natural images [27]; thus, our algorithm

provides a simple and provably efficient method for recovering a wide variety of natural images

from compressive measurements. An example of recovery performance is given in Figure 5. The

test image (Peppers) is of sizeN = 128 × 128 = 16384 pixels, and we computedM = 5000

random Gaussian measurements. Model-based recovery againoffers higher performance than

standard signal recovery algorithms like CoSaMP, both in terms of recovery mean-squared error

and visual quality.

VI. EXAMPLE : BLOCK-SPARSE SIGNALS AND SIGNAL ENSEMBLES

In a block-sparsesignal, the locations of the significant coefficients cluster in blocks under

a specific sorting order. Block-sparse signals have been previously studied in CS applications,

including DNA microarrays and magnetoencephalography [7,8]. An equivalent problem arises

in CS for signal ensembles, such as sensor networks and MIMO communication [8, 9, 34]. In this

case, several signals share a common coefficient support set. For example, when a frequency-

sparse acoustic signal is recorded by an array of microphones, then all of the recorded signals

28

contain the same Fourier frequencies but with different amplitudes and delays. Such a signal

ensemble can be re-shaped as a single vector by concatenation, and then the coefficients can be

rearranged so that the concatenated vector exhibits block sparsity.

It has been shown that the block-sparse structure enables signal recovery from a reduced

number of CS measurements, both for the single signal case [7, 8] and the signal ensemble

case [9], through the use of specially tailored recovery algorithm [7, 8, 35]. However, the

robustness guarantees for such algorithms either are restricted to exactly sparse signals and

noiseless measurements, do not have explicit bounds on the number of necessary measurements,

or are asymptotic in nature.

In this section, we formulate the block sparsity signal model as a union of subspaces and

pose an approximation algorithm on this union of subspaces.The approximation algorithm is

used to implement block-based signal recovery. We also define the corresponding class of block-

compressible signals and quantify the number of measurements necessary for robust recovery.

A. Block-sparse signals

Consider a classS of signal vectorsx ∈ RJN , with J andN integers. This signal can be

reshapped into aJ × N matrix X, and we use both notations interchangeably in the sequel.

We will restrict entire columns ofX to be part of the support of the signal as a group. That

is, signalsX in a block-sparse model have entire columns as zeros or nonzeros. The measure

of sparsity forX is its number of nonzero columns. More formally, we make the following

definition.

Definition 11: [7, 8] Define the set ofK-block sparse signalsas

SK = X = [x1 . . . xN ] ∈ RJ×N such thatxn = 0 for n /∈ Ω,Ω ⊆ 1, . . . , N, |Ω| = K.

It is important to note that aK-block sparse signal has sparsityKJ , which is dependent

on the size of the blockJ . We can extend this formulation to ensembles ofJ , length-N signals

with common support. Denote this signal ensemble byx1, . . . , xJ, with xj ∈ RN , 1 ≤ j ≤ J .

We formulate a matrix representationX of the ensemble that features the signalxj in its jth

row: X = [x1 . . . xN ]T . The matrixX features the same structure as the matrixX obtained

29

from a block-sparse signal; thus, the matrixX can be converted into a block-sparse vectorx

that represents the signal ensemble.

B. Block-based approximation

To pose a the block-based approximation algorithm, we need to define the mixed norm of

a matrix.

Definition 12: The (p, q) mixed normof the matrixX = [x1 x2 . . . xN ] is defined as

‖X‖(p,q) =

(N∑

n=1

‖xn‖qp

)1/q

.

Whenq = 0, ‖X‖(p,0) simply counts the number of nonzero columns inX.

We immediately find that‖X‖(p,p) = ‖x‖p, with x the vectorization ofX. Intuitively, we

pose the algorithmS(X,K) to obtain the best block-based approximation of the signalX as

follows:

XSK = arg min

X∈RJ×N‖X − X‖(2,2) subject to‖X‖(2,0) ≤ K. (18)

It is easy to show that to obtain the approximation, it suffices to perform column-wise hard

thresholding: letρ be theK th largestℓ2-norm among the columns ofX. Then our approximation

algorithm isS(X,K) = XSK = [xSK,1 . . . xSK,N ], where

xSK,n =

xn ‖xn‖2 ≥ ρ,

0 ‖xn‖2 < ρ,

for each1 ≤ j ≤ J and1 ≤ n ≤ N . Alternatively, a recursive approximation algorithm can be

obtained by sorting the columns ofX by their ℓ2 norms, and then selecting the largest columns.

The complexity of this sorting process isO (NJ +N logN).

C. Block-compressible signals

The approximation class under the block-compressible model corresponds to signals with

blocks whoseℓ2 norm has a power-law decay rate.

30

Definition 13: We define the set ofs-block compressible signals as

Ss = X = [x1 . . . xN ] ∈ RJ×N s.t. ‖xI(i)‖2 ≤ Si−s−1/2, 1 ≤ i ≤ N, S <∞,

whereI indexes the sorted column norms.

We say thatX is an s-block compressible signal ifX ∈ Ss. For such signals, we have‖X −XK‖(2,2) = σSK

(x) ≤ S1K−s, and‖X−XK‖(2,1) ≤ S2K

1/2−s. Note that the block-compressible

signal model does not impart a structure to the decay of the signal coefficients, so that the sets

Rj,K are equal for all values ofj; due to this property, the(δSK, s)-RAmP is implied by the

SK-RIP. Taking this into account, we can derive the following result from [10], which is proven

similarly to Theorem 4.

Theorem 6:Let x be a signal from modelS, and lety = Φx + n be a set of noisy CS

measurements. IfΦ has theS4K-RIP with δS4

K≤ 0.1, then the estimate obtained from iterationi

of block-based CoSaMP, using the approximation algorithm (18), satisfies

‖x− xi‖2 ≤ 2−i‖x‖2 + 20

(‖X −XS

K‖(2,2) +1√K‖X −XS

K‖(2,1) + ‖n‖2).

Thus, the algorithm provides a recovered signal of similar quality to approximations ofX

by a small number of nonzero columns. When the signalx is K-block sparse, we have that

||X − XSK‖(2,2) = ||X − XS

K‖(2,1) = 0, obtaining the same result as Theorem 4, save for a

constant factor.

D. Stable block-based recovery from compressive measurements

Since Theorem 6 poses the same requirement on the measurement matrix Φ for sparse and

compressible signals, the same number of measurementsM is required to provide performance

guarantees for block-sparse and block-compressible signals. The classSK containsS =(

NK

)

subspaces of dimensionJK. Thus, a subgaussian random matrix has theSK-RIP property with

constantδSKand probability1− e−t if the number of measurements obeys

M ≥ 2

cδ2SK

(K

(ln

2N

K+ J ln

12

δSK

)+ t

). (19)

31

The first term in this bound matches the order of the bound for conventional CS, while the

second term introduces a linear dependence on the size of theblock J . This shows that the

number of measurements required for robust recovery scalesasM = O (KJ +K log(N/K)),

which is a substantial improvement over theM = O (JK log(N/K)) that would be required by

conventional CS recovery methods. When the size of the blockJ is larger thanlog(N/K), then

this term becomesO (KJ); that is, it is linear on the total sparsity of the block-sparse signal.

We note in passing that the bound on the number of measurements (19) assumes a dense

subgaussian measurement matrix, while the measurement matrices used in [9] have a block-

diagonal. structure. To obtain measurements from anM × JN dense matrix in a distributed

setting, it suffices to partition the matrix intoJ pieces of sizeM × N and calculate the CS

measurements at each sensor with a corresponding matrix; these individual measurements are

then summed to obtain the complete measurement vector. For large J , (19) implies that the

total number of measurements required for recovery of the signal ensemble is lower than the

bound for the case where each signal recovery is performed independently for each signal (M

= O (JK log(N/K))).

E. Experiments

We conducted several numerical experiments comparing model-based recovery to CoSaMP

in the context of block-sparse signals. We employ the model-based CoSaMP recovery of

Algorithm 1 with the block-based approximation algorithm (18) in all cases. For brevity, we

exclude a thorough comparison of our model-based algorithmwith ℓ1-based optimization and

defer it to future work. In practice, we observed that our algorithm performs several times faster

than convex optimization-based procedures.

Figure 6 illustrates anN = 4096 signal that exhibits block sparsity, and its recovered version

using CoSaMP and model-based recovery. The block sizeJ = 64 and there wereK = 6 active

blocks in the signal. We observe the clear advantage of usingthe block-sparsity model in signal

recovery.

We now consider block-compressible signals. An example recovery is illustrated in Figure 7.

32

(a) original block-sparse signal (b) CoSaMP (c) model-based recovery

(RMSE = 0.723) (RMSE = 0.015)

Fig. 6. Example performance of model-based signal recovery for a block-sparse signal. (a) Example block-

compressible signal of lengthN = 4096 with K = 6 nonzero blocks of sizeJ = 64. Recovered signal from

M = 960 measurements using (b) conventional CoSaMP recovery and (c) block-based recovery.

In this case, theℓ2-norms of the blocks decay according to a power law, as described

above. Again, the number of measurements is far below the minimum number required to

guarantee stable recovery through conventional CS recovery. However, enforcing the model in the

approximation process results in a solution that is very close to the best 5-block approximation

of the signal.

Figure 8 indicates the decay in recovery error as a function of the numbers of measurements

for CoSaMP and model-based recovery. We generated sample block-sparse signals as follows:

we randomly selected a set ofK blocks, each of sizeJ , and endow them with coefficients that

follow an i.i.d. Gaussian distribution. Each sample point in the curves is generated by performing

200 trials of the corresponding algorithm. As in the connected wavelet-tree case, we observe

clear gains using model-based recovery, particularly for low-measurement regimes; CoSaMP

matches model-based recovery only forM ≥ 5K.

VII. CONCLUSIONS

In this paper, we have aimed to demonstrate that there are significant performance gains

to be made by exploiting more realistic and richer signal models beyond the simplistic sparse

and compressible models that dominate the CS literature. Building on the unions of subspaces

results of [5] and the proof machinery of [10], we have taken some of the first steps towards

33

(a) signal (b) best 5-block approximation

(RMSE = 0.116)

(c) CoSaMP (d) model-based recovery

(RMSE = 0.711) (RMSE = 0.195)

Fig. 7. Example performance of model-based signal recovery for block-compressible signals. (a) Example block-

compressible signal, lengthN = 1024. (b) Best block-based approximation withK = 5 blocks. Recovered signal

from M = 200 measurements using both (c) conventional CoSaMP recovery and (d) block-based recovery.

what promises to be a general theory for model-based CS by introducing the notion of a model-

compressible signal and the associated restricted amplification property (RAmP) condition it

imposes on the measurement matrixΦ.

For the volumes of natural and manmade signals and images that are wavelet-sparse

or compressible, our tree-based CoSaMP and IHT algorithms offer performance that signif-

icantly exceeds today’s state-of-the-art while requiringonly M = O (K) rather thanM =

O (K log(N/K)) random measurements. For block-sparse signals and signal ensembles, our

block-based CoSaMP and IHT algorithms offer not only excellent performance but also require

just M = O (JK) measurements, whereJK is the signal sparsity. Furthermore, block-based

recovery can recovery signal ensembles using fewer measurements than the number required

34

1 2 3 4 50

2

4

6

8

10

12

14

16

M/KA

vera

ge n

orm

aliz

ed e

rror

mag

nitu

de Model−based recoveryCoSaMP

Fig. 8. Performance of CoSaMP and block-based recovery on a class ofblock-sparse signals as a function ofM/K.

Standard CS recovery does not match the performance of block-based recovery untilM = 5K.

when each signal is recovered independently.

There are many avenues for future work on model-based CS. We have only considered the

recovery of signals from models that can be geometrically described as a union of subspaces;

possible extensions include other, more complex geometries (for example, high-dimensional

polytopes, nonlinear manifolds.) We also expect that the core of our proposed algorithms — a

model-enforcing approximation step — can be integrated into other iterative algorithms, such

as relaxedℓ1-norm minimization methods. Furthermore, our framework will benefit from the

formulation of new signal models that are endowed with efficient model-based approximation

algorithms.

APPENDIX I

PROOF OFTHEOREM 2

To prove this theorem, we will study the distribution of the maximum singular value of a

submatrixΦT of a matrix with i.i.d. Gaussian entriesΦ corresponding to the columns indexed

by T . From this we obtain the probability that RAmP does not hold for a fixed supportT . We

will then evaluate the same probability for all supportsT of elements ofRj,K , where the desired

bound on the amplification is dependent on the value ofj. This gives us the probability that the

35

RAmP does not hold for a given residual subspace setRj,K . We fix the probability of failure

on each of these sets; we then obtain probability that the matrix Φ does not have the RAmP

using a union bound. We end by obtaining conditions on the number of rowsM of Φ to obtain

a desired probability of failure.

We begin from the following concentration of measure for thelargest singular value of a

M ×K submatrixΦT , |T | = K, of anM ×N matrix Φ with i.i.d. subgaussian entries that are

properly normalized [18, 36, 37]:

P

(σmax(ΦT ) > 1 +

√K

M+ τ + β

)≤ e−Mτ2/2.

For large enoughM , β ≪ 1; thus we ignore this small constant in the sequel. By letting

τ = jr√

1 + ǫK − 1−√

KM

(with the appropriate value ofj for T ), we obtain

P(σmax(ΦT ) > jr

√1 + ǫK

)≤ e

−M2

“

jr√

1+ǫK−1−√

KM

”2

.

We use a union bound over all possibleRj supports foru ∈ Rj,K to obtain the probability that

Φ does not amplify the norm ofu by more thanjr√

1 + ǫK :

P(‖Φu‖2 >

(jr√

1 + ǫK)‖u‖2 ∀ u ∈ Rj,K

)≤ Rje

− 12(

√M(jr

√1+ǫK−1)−

√K)

2

.

Bound the right hand side by a constantµ; this requires

Rj ≤ e12(

√M(jr

√1+ǫK−1)−

√K)

2

µ (20)

for eachj. We use another union bound among the residual subspacesRj.K to measure the

probability that the RAmP does not hold:

P(‖Φu‖2 >

(jr√

1 + ǫK)‖u‖2 ∀ u ∈ Rj,K , ∀ j, 1 ≤ j ≤ ⌈N/K⌉

)≤⌈N

K

⌉µ.

To bound this probability bye−t, we needµ = KNe−t; plugging this into (20), we obtain

Rj ≤ e12(

√M(jr

√1+ǫK−1)−

√K)

2K

Ne−t

for eachj. Simplifying, we obtain that forΦ to posess the RAmP with probability1− e−t, the

following must hold for allj:

M ≥ 1(jr√

1 + ǫK − 1)2

(√

2

(lnRjN

K+ t

)+√K

)2

. (21)

36

Since (√a +√b)2 ≤ 2a + 2b for a, b > 0, then the hypothesis (12) implies (21), proving the

theorem.

APPENDIX II

PROOF OFTHEOREM 5

In this proof, we denoteM(x,K) = xK for brevity. To bound‖Φ(x−xK )‖2, we writex as

x = xK +

⌈N/K⌉∑

j=2

xTj,

where

xTj= xjK − x(j−1)K , j = 2, . . . , ⌈N/K⌉

is the difference between the bestjK model approximation and the best(j − 1)K model

approximation. Additionally, each piecexTj∈ Rj,K . Therefore, sinceΦ satisifes the(ǫK , s− 1)

RAmP, we obtain

‖Φ(x− xK)‖2 =

∥∥∥∥∥∥Φ

⌈N/K⌉∑

j=2

xTj

∥∥∥∥∥∥2

≤⌈N/K⌉∑

j=2

‖ΦxTj‖2 ≤

⌈N/K⌉∑

j=2

√1 + ǫKj

s−1‖xTj‖2. (22)

Sincex ∈Ms, the norm of each piece can be bounded as

‖xTj‖2 = ‖xjK − x(j−1)K‖2 ≤ ‖x− x(j−1)K‖2 + ‖x− xjK‖2 ≤ |x|MsK

−s((j − 1)−s + j−s

).

Applying this bound in (22), we obtain

‖Φ(x− xK)‖2 ≤√

1 + ǫK

⌈N/K⌉∑

j=2

js−1‖xTj‖2,

≤√

1 + ǫK |x|MsK−s

⌈N/K⌉∑

j=2

js−1((j − 1)−s + j−s),

≤√

1 + ǫK |x|MsK−s

⌈N/K⌉∑

j=2

j−1.

It is easy to show, using Euler-Maclaurin summations, that∑⌈N/K⌉

j=2 j−1 ≤ ln⌈N/K⌉; we then

obtain

‖Φ(x− xK)‖2 ≤√

1 + ǫKK−s ln

⌈N

K

⌉|x|Ms ,

which proves the theorem.

37

Algorithm 2 Model-Based Iterative Hard ThresholdingInputs: CS MatrixΦ, measurementsy, modelMK

Outpurs:K-sparse approximationx

initialize: x0 = 0, r = y, i = 0.

while halting criterion falsedo

i← i+ 1

b← xi−1 + ΦT r form signal estimatexi ←M(b,K) prune signal estimate according to signal modelr ← y − Φxi update measurement residual

end while

return x← xi

APPENDIX III

MODEL-BASED ITERATIVE HARD THRESHOLDING

Our proposed model-based iterative hard thresholding (IHT) is given in Algorithm 2. For

this algorithm, Theorems 4, 5, and 6 can be proven with only a few modifications:Φ must have

theM3K-RIP with δM3

K≤ 0.1, and the constant factor in the bound changes from 15 to 4 in

Theorem 4, from 35 to 10 in Theorem 5, and from 20 to 5 in Theorem6.

To illustrate the performance of the algorithm, we repeat the HeaviSineexperiment from

Figure 1. Recall thatN = 1024, and M = 80 for this example. The advantages of using

our tree-model-based approximation step (instead of mere hard thresholding) are evident from

Figure 9. In practice, we have observed that our model-basedalgorithm converges in fewer steps

than IHT and yields much more accurate results in terms of recovery error.

APPENDIX IV

PROOF OFTHEOREM 4

The proof of this theorem is identical to that of the CoSaMP algorithm in [10, Section 4.6],

and requires a set of six lemmas. The sequence of Lemmas 1–6 below are modifications of

38

(a) original (b) IHT (c) model-based IHT

(RMSE = 0.627) (RMSE = 0.080)

Fig. 9. Example performance of model-based IHT. (a) Piecewise-smooth HeaviSinetest signal, lengthN = 1024.

Signal recovered fromM = 80 measurements using both (b) standard and (c) model-based IHT recovery. Root mean-

squared error (RMSE) values are normalized with respect to theℓ2 norm of the signal.

the lemmas in [10] that are restricted to the signal model. Lemma 4 does not need any changes

from [10], so we state it without proof. The proof of Lemmas 3–6 use the properties in Lemmas 1

and 2, which are simple to prove.

Lemma 1:SupposeΦ hasM-RIP with constantδM. Let Ω be a support corresponding to

a subspace inM. Then we have the following handy bounds.

‖ΦTΩu‖2 ≤

√1 + δM‖u‖2,

‖Φ†Ωu‖2 ≤

1√1− δM

‖u‖2,

‖ΦTΩΦΩu‖2 ≤ (1 + δM)‖u‖2,

‖ΦTΩΦΩu‖2 ≥ (1− δM)‖u‖2,

‖(ΦTΩΦΩ)−1u‖2 ≤

1

1 + δM‖u‖2,

‖(ΦTΩΦΩ)−1u‖2 ≥

1

1− δM‖u‖2.

Lemma 2:SupposeΦ hasM2K-RIP with constantδM2

K. Let Ω be a support corresponding

to a subspace inMK , and letx ∈ MK . Then‖ΦTΩΦx|ΩC‖2 ≤ δM2

K‖x|ΩC‖2.

39

We begin the proof of Theorem 4 by fixing an iterationi ≥ 1 of model-based CoSaMP. We

write x = xi−1 for the signal estimate at the beginning of theith iteration. Define the signal

residuals = x − x, which implies thats ∈ M2K . We note that we can writer = y − Φx =

Φ(x− x) + n = Φs+ n.

Lemma 3: (Identification)The setΩ = supp(M2(e,K)), where e = ΦT r, identifies a

subspace inM2K , and obeys

‖s|ΩC‖2 ≤ 0.2223‖s‖2 + 2.34‖n‖2.

Proof of Lemma 3:Define the setΠ = supp(s). Let eΩ = M2(e,K) be the model-based

approximation toe with supportΩ, and similarly leteΠ be the approximation toe with support

Π. Each approximation is equal toe for the coefficients in the support, and zero elsewhere. Since

Ω is the support of the best approximation inM2K , we must have:

‖e− eΩ‖22 ≤ ‖e− eΠ‖22,N∑

n=1

(e[n]− eΩ[n])2 ≤N∑

n=1

(e[n]− eΠ[n])2,

∑

n/∈Ω

e[n]2 ≤∑

n/∈Π

e[n]2,

N∑

n=1

e[n]2 −∑

n/∈Ω

e[n]2 ≥N∑

n=1

e[n]2 −∑

n/∈Π

e[n]2,

∑

n∈Ω

e[n]2 ≥∑

n∈Π

e[n]2,

∑

n∈Ω

e[n]2 ≥∑

n∈Π

e[n]2,

∑

n∈Ω\Πe[n]2 ≥

∑

n∈Π\Ωe[n]2,

‖e|Ω\Π‖22 ≥ ‖e|Π\Ω‖22,

whereΩ \ Π denotes the set difference ofΩ andΠ. These signals are inM4K (since they arise

as the difference of two elements fromM2K); therefore, we can apply theM4

K-RIP constants

40

and Lemmas 1 and 2 to provide the following bounds on both sides (see [10] for details):

‖e|Ω\Π‖2 ≤ δM4K‖s‖2 +

√1 + δM2

K‖e‖2, (23)

‖e|Π\Ω‖2 ≥ (1− δM2K)‖s|ΩC‖2 − δM2

K‖s‖2 −

√1 + δM2

K‖e‖2. (24)

Combining (23) and (24), we obtain

‖s|ΩC‖2 ≤(δM2

K+ δM4

K)‖s‖2 + 2

√1 + δM2

K‖e‖2

1− δM2K

.

The argument is completed by noting thatδM2K≤ δM4

K≤ 0.1.

Lemma 4: (Support Merger)Let Ω be a set of at most2K indices. Then the setΛ =

Ω ∪ supp(x) contains at most3K indices, and‖x|ΛC‖2 ≤ ‖s|ΩC‖2.

Lemma 5: (Estimation)Let Λ be a support corresponding to a subspace inM3K , and define

the least squares signal estimateb by b|T = Φ†Ty, b|T C = 0. Then

‖x− b‖2 ≤ 1.112‖x|ΛC‖2 + 1.06‖n‖2.

Proof of Lemma 5:It can be shown [10] that

‖x− b‖2 ≤ ‖x|ΛC‖2 + ‖(ΦTΛΦΛ)−1ΦT

ΛΦx|ΠC‖2 + ‖Φ†Πn‖2.

SinceΛ is a support corresponding to a subspace inM3K and x ∈ MK , we use Lemmas 1

and 2 to obtain

‖x− b‖2 ≤ ‖x|ΛC‖2 +1

1− δM3K

‖ΦTΛΦx|ΠC‖2 +

1√1− δM3

K

‖n‖2,

≤(

1 +δM4

K

1− δM3K

)‖x|ΠC‖2 +

1√1− δM3

K

‖n‖2.

Finally, note thatδM3K≤ δM4

K≤ 0.1.

Lemma 6: (Pruning)The pruned approximationxi = M(b,K) is such that

‖x− xi‖2 ≤ 2‖x− b‖2.

41

Proof of Lemma 6: Sincexi is the best approximation inMK to b, andx ∈MK , we obtain

‖x− xi‖2 ≤ ‖x− b‖2 + ‖b− xi‖2 ≤ 2‖x− b‖2.

We use these lemmas in reverse sequence for the inequalitiesbelow:

‖x− xi‖2 ≤ 2‖x− b‖2,

≤ 2(1.112‖x|ΛC‖2 + 1.06‖n‖2),

≤ 2.224‖s|ΩC‖2 + 2.12‖n‖2,

≤ 2.224(0.2223‖s‖2 + 2.34‖n‖2) + 2.12‖n‖2,

≤ 0.5‖s‖2 + 7.5‖n‖2,

≤ 0.5‖x− xi−1‖2 + 7.5‖n‖2.

From the recursion onxi, we obtain‖x− xi‖2 ≤ 2−i‖x‖2 + 15‖n‖2. This completes the proof

of Theorem 4.

APPENDIX V

PROOF OFPROPOSITION1

WhenK < log2N , the number of subtrees of sizeK of a binary tree of sizeN is the

Catalan number [38]

TK,N =1

K + 1

(2K

K

)≤ (2e)K

K + 1,

using Stirling’s approximation. WhenK > log2N , we partition this count of subtrees into the

numbers of subtreestK,h of sizeK and heighth, to obtain

TK,N =

log2 N∑

h=⌊log2 K⌋+1

tK,h

42

We obtain the following asymptotic identity from [38, page 51]:

tK,h =4K+1.5

h4

∑

m≥1

[2K

h2(2πm)4 − 3(2πm)2

]e−

K(2πm)2

h2 + 4KO(e− ln2 h

)

+4KO(

ln8 h

h5

)+ 4KO

(ln8 h

h4

),

≤ 4K+2

h4

∑

m≥1

[2K

h2(2πm)4 − 3(2πm)2

]e−

K(2πm)2

h2 . (25)

We now simplify the formula slightly: we seek a bound for the sum term (which we denote

by βh for brevity):

βh =∑

m≥1

[2K

h2(2πm)4 − 3(2πm)2

]e−

K(2πm)2

h2 ≤∑

m≥1

2K

h2(2πm)4e−

K(2πm)2

h2 . (26)

Let mmax = hπ√

2K, the value ofm for which the term inside the sum (26) is maximum; this is

not necessarily an integer. Then,

βh ≤⌊mmax⌋−1∑

m=1

2K

h2(2πm)4e−

K(2πm)2

h2 +

⌈mmax⌉∑

m=⌊mmax⌋

2K

h2(2πm)4e−

K(2πm)2

h2

+∑

m≥⌈mmax⌉+1

2K

h2(2πm)4e−

K(2πm)2

h2 ,

≤∫ ⌊mmax⌋

1

2K

h2(2πx)4e−

K(2πx)2

h2 dx+

⌈mmax⌉∑

m=⌊mmax⌋

2K

h2(2πm)4e−

K(2πm)2

h2

+

∫ ∞

⌈mmax⌉

2K

h2(2πx)4e−

K(2πx)2

h2 dx,

where the second inequality comes from the fact that the series in the sum is strictly increasing

for m ≤ ⌊mmax⌋ and strictly decreasing form > ⌈mmax⌉. One of the terms in the sum can be

added to one of the integrals. If we have that

(2π ⌊mmax⌋)4e−K(2π⌊mmax⌋)2

h2 < (2π ⌈mmax⌉)4e−K(2π(⌈mmax⌉))2

h2 , (27)

then we can obtain

βh ≤∫ ⌈mmax⌉

1

2K

h2(2πx)4e−

K(2πx)2

h2 dx+2K

h2(2π ⌈mmax⌉)4e−

K(2π⌈mmax⌉)2

h2

+

∫ ∞

⌈mmax⌉

2K

h2(2πx)4e−

K(2πx)2

h2 dx.

43

When the opposite of (27) is true, we have that

βh ≤∫ ⌊mmax⌋

1

2K

h2(2πx)4e−

K(2πx)2

h2 dx+2K

h2(2π ⌊mmax⌋)4e−

K(2π⌊mmax⌋)2

h2

+

∫ ∞

⌊mmax⌋

2K

h2(2πx)4e−

K(2πx)2

h2 dx.

Since the term in the sum reaches its maximum formmax, we will have in all three cases that

βh ≤∫ ∞

1

2K

h2(2πx)4e−

K(2πx)2

h2 dx+8h2

Ke2.

We perform a change of variablesu = 2πx and defineσ = h/√

2K to obtain

βh ≤1

2π

∫ ∞

0

1

σ2u4e−u2/2σ2

dx+8h2

Ke2≤ 1

2σ√

2π

∫ ∞

−∞

1√2πσ

u4e−u2/2σ2

dx+8h2

Ke2.

Using the formula for the fourth central moment of a Gaussiandistribution:∫ ∞

−∞

1√2πσ

u4e−u2/2σ2

dx = 3σ4,

we obtain

βh ≤3σ3

2√

2π+

8h2

Ke2=

3h3

8√πK3

+8h2

Ke2.

Thus, (25) simplifies to

tK,h ≤4K

K

(6

h√πK

+128

h2e2

).

Correspondingly,TK,N becomes

TK,N ≤log2 N∑

h=⌊log2 K⌋+1

4K

K

(6

h√πK

+128

h2e2

),

≤ 4K

K

6√πK

log2 N∑

h=⌊log2 K⌋+1

1

h+

128

e2

log2 N∑

h=⌊log2 K⌋+1

128

h2e2

.

It is easy to show, using Euler-Maclaurin summations, that

b∑

j=a

j−1 ≤ lnb

a− 1and

b∑

j=a

j−1 ≤ 1

a− 1;

we then obtain

TK,N ≤ 4K

K

(6√πK

lnlog2N

⌊log2K⌋+

128

e2⌊log2K⌋

)≤ 4K+4

Ke2⌊log2K⌋≤ 4K+4

Ke2.

This proves the proposition.

44

APPENDIX VI

PROOF OFPROPOSITION3

We wish to find the value of the bound (12) for the subspace count given in (17). We obtain

M ≥ max1≤j≤⌈N/K⌉Mj , whereMj follows one of these three regimes:

Mj =

1

(jr√

1+ǫK−1)2

(2K + 4 ln (2e)K(2j+1)N

K(Kj+1)(Kj+K+1)+ 2t

)if j <

⌊log2 N

K

⌋,

1

(jr√

1+ǫK−1)2

(2K + 4 ln 2(3j+2)K+8ejKN

K(Kj+1)(Kj+K+1)e2 + 2t)

if j =⌊

log2 NK

⌋,

1

(jr√

1+ǫK−1)2

(2K + 4 ln 4(2j+1)K+8N

K3j(j+1)e4 + 2t)

if j >⌊

log2 NK

⌋.

We separate the terms that are linear onK and j, and obtain

Mj =

1

(jr√

1+ǫK−1)2

(K(3 + 4 ln 2) + 8Kj(1 + ln 2) + 4 ln N

K(Kj+1)(Kj+K+1) + 2t)

if j <⌊

log2

N

K

⌋,

1

(jr√

1+ǫK−1)2

(2K(1 + 4 ln 2) + 4Kj(1 + ln 8) + 4 ln 256N

K(Kj+1)(Kj+K+1)e2 + 2t)

if j =⌊

log2

N

K

⌋,

1

(jr√

1+ǫK−1)2

(2K(1 + 4 ln 2) + 16Kj ln 2 + 4 ln 65536N

K3j(j+1)e4 + 2t)

if j >⌊

log2

N

K

⌋,

=

1

(js−0.5√

1+ǫK−j−0.5)2

(8K(1 + ln 2) + K(3+4 ln 2)

j+ 4

jln N

K(Kj+1)(Kj+K+1) + 2tj

)if j <

⌊log

2N

K

⌋,

1

(js−0.5√

1+ǫK−j−0.5)2

(4K(1 + ln 8) + 2K(1+4 ln 2)

j+ 4

jln 256N

K(Kj+1)(Kj+K+1)e2 + 2tj

)if j =

⌊log

2N

K

⌋,

1

(js−0.5√

1+ǫK−j−0.5)2

(16K ln 2 + 2K(1+4 ln 2)

j+ 4

jln 65536N

K3j(j+1)e4 + 2tj

)if j >

⌊log

2N

K

⌋.

The sequencesMj⌊log2 N

K ⌋−1

j=1 and Mj⌈NK ⌉

j=⌊ log2 NK ⌋+1

are decreasing sequences, since the

numerators are decreasing sequences and the denominator isan increasing sequence whenever

s > 0.5. WhenK ≤ log2N , we have

M ≥ max

(1

(√1 + ǫK − 1

)2(K(11 + 12 ln 2) + 4 ln

N

K(K + 1)(2K + 1)+ 2t

),

4K(1 + ln 8) +2K(1+4 ln 2)+4 ln 256N

K(log2 N+1)(log2 N+K+1)e2+2t

log2 NK(

log2 NK

s−0.5√1 + ǫK − log2 N

K

−0.5)2 ,

16K ln 2 +2K(1+4 ln 2)+4 ln 65536N

K(log2 N+K)(log2 N+2K)e4+2t

log2 NK

+1((

log2 NK

+ 1)s−0.5√

1 + ǫK −(

log2 NK

+ 1)−0.5

)2

.

45

These three terms have sequentially smaller numerators andsequentially larger denominators,

resulting in

M ≥ 1(√

1 + ǫK − 1)2(K(11 + 12 ln 2) + 4 ln

N

K(K + 1)(2K + 1)+ 2t

).

WhenK > log2N , the first two regimes ofMj are nonexistent, and so we have

M ≥ 1(√

1 + ǫK − 1)2(

2K(1 + 12 ln 2) + 4 ln32768N

K3e4+ 2t

).

This completes the proof of Proposition 3.

ACKNOWLEDGEMENTS

We thank Petros Boufounos and Mark Davenport for helpful discussions.

REFERENCES

[1] S. Mallat, A Wavelet Tour of Signal Processing. San Diego: Academic Press, 1999.

[2] D. L. Donoho, “Compressed sensing,”IEEE Trans. Info. Theory, vol. 52, pp. 1289–1306, Sept. 2006.

[3] E. J. Candes, “Compressive sampling,” inProc. International Congress of Mathematicians, vol. 3, (Madrid, Spain),

pp. 1433–1452, 2006.

[4] R. G. Baraniuk, “Compressive sensing,”IEEE Signal Processing Mag., vol. 24, no. 4, pp. 118–120, 124, July 2007.

[5] T. Blumensath and M. E. Davies, “Sampling theorems for signals from the union of linear subspaces,”IEEE Trans. Info.

Theory, July 2007. Submitted.

[6] Y. M. Lu and M. N. Do, “Sampling signals from a union of subspaces,”IEEE Signal Processing Mag., vol. 25, pp. 41–47,

Mar. 2008.

[7] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of

measurements,” Mar. 2008. Preprint.

[8] Y. Eldar and M. Mishali, “Robust recovery of signals froma union of subspaces,” 2008. Preprint.

[9] D. Baron, M. F. Duarte, S. Sarvotham, M. B. Wakin, and R. G.Baraniuk, “Distributed compressed sensing,” 2005. Preprint.

[10] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,”Applied and

Computational Harmonic Analysis, June 2008. To be published.

[11] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” July 2008. Preprint.

[12] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based statistical signal processing using Hidden Markov

Models,” IEEE Trans. Signal Processing, vol. 46, pp. 886–902, Apr. 1998.

[13] R. G. Baraniuk, “Fast reconstruction from incoherent projections.” Workshop on Sparse Representations in Redundant

Systems, May 2005.

[14] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, “Fast reconstruction of piecewise smooth signals from random projections,”

in Proc. SPARS05, (Rennes, France), Nov. 2005.

46

[15] C. La and M. N. Do, “Tree-based orthogonal matching pursuit algorithm for signal reconstruction,” inIEEE International

Conference on Image Processing (ICIP), (Atlanta, GA), pp. 1277–1280, Oct. 2006.

[16] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, “Wavelet-domain compressive signal reconstruction using a hidden Markov

tree model,” inIEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), (Las Vegas, NV), pp. 5137–5140,

April 2008.

[17] K. Lee and Y. Bresler, “Selecting good Fourier measurements for compressed sensing.” SIAM Conference on Imaging

Science, July 2008.

[18] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann, “Uniform uncertainty principle for Bernoulli and subgaussian

ensembles,”Constructive Approximation, Feb. 2008. To be published.

[19] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic Decomposition by Basis Pursuit,”SIAM Journal on Scientific

Computing, vol. 20, p. 33, 1998.

[20] J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,”IEEE Trans. Info. Theory, vol. 52,

pp. 4036–4048, Sept. 2006.

[21] E. J. Candes and T. Tao, “The Dantzig selector: Statistical estimation whenp is much larger thann,” Annals of Statistics,

vol. 35, pp. 2313–2351, Dec. 2007.

[22] J. Tropp and A. C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,”IEEE Trans. Info.

Theory, vol. 53, pp. 4655–4666, Dec. 2007.

[23] D. L. Donoho, I. Drori, Y. Tsaig, and J. L. Starck, “Sparse solution of underdetermined linear equations by stagewise

orthogonal matching pursuit,” 2006. Preprint.

[24] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and

complexity,” Mar. 2008. Preprint.

[25] E. J. Candes, “The restricted isometry property and its implications for compressed sensing,”Compte Rendus de l’Academie

des Sciences, Series I, vol. 346, pp. 589–592, May 2008.

[26] R. G. Baraniuk and D. L. Jones, “A signal-dependent time-frequency representation: Fast algorithm for optimal kernel

design,” IEEE Trans. Signal Processing, vol. 42, pp. 134–146, Jan. 1994.

[27] R. Baraniuk, “Optimal tree approximation with wavelets,” in Wavelet Applications in Signal and Image Processing VII,

vol. 3813 ofProc. SPIE, (Denver, CO), pp. 196–207, July 1999.

[28] R. G. Baraniuk, R. A. DeVore, G. Kyriazis, and X. M. Yu, “Near best tree approximation,”Advances in Computational

Mathematics, vol. 16, pp. 357–373, May 2002.

[29] J. K. Romberg, H. Choi, and R. G. Baraniuk, “Bayesian tree-structured image modeling using wavelet-domain Hidden

Markov Models,” IEEE Trans. Image Processing, vol. 10, pp. 1056–1068, July 2001.

[30] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using a scale mixture of Gaussians in the

wavelet domain,”IEEE Trans. Image Processing, vol. 12, pp. 1338–1351, Nov. 2003.

[31] J. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,”IEEE Trans. Signal Processing, vol. 41,

pp. 3445–3462, Dec. 1993.

[32] A. Cohen, W. Dahmen, I. Daubechies, and R. A. DeVore, “Tree approximation and optimal encoding,”Applied and

Computational Harmonic Analysis, vol. 11, pp. 192–226, Sept. 2001.

[33] D. Donoho, “CART and best ortho-basis: A connection,”Annals of Statistics, vol. 25, pp. 1870–1911, Oct. 1997.

47

[34] M. B. Wakin, S. Sarvotham, M. F. Duarte, D. Baron, and R. G. Baraniuk, “Recovery of jointly sparse signals from few

random projections,” inProc. Workshop on Neural Info. Proc. Sys. (NIPS), (Vancouver), Nov. 2005.

[35] J. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,”

Signal Processing, vol. 86, pp. 572–588, Apr. 2006.

[36] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Info. Theory, vol. 51, pp. 4203–4215, Dec.

2005.

[37] M. Ledoux,The Concentration of Measure Phenomenon. American Mathematical Society, 2001.

[38] G. G. Brown and B. O. Shubert, “On random binary trees,”Mathematics of Operations Research, vol. 9, pp. 43–65, Feb.

1984.

48

model-based compressive sensing - rice universityduarte/images/modelcs082008.pdf · model-based...

Documents