
Page 1: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

[Figure: binary tree diagram with nodes labeled a_ℓ(1) through a_ℓ(7)]

EE-8500 Seminar

Akshay Soni, University of Minnesota, [email protected]

(joint work with J. Haupt)

Supported by [sponsor logos]

Page 2: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Data Everywhere

Page 3: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Integral to Science, Engineering, Discovery

Page 4: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Inevitable Data Deluge!

The Economist, February 2010

Page 5: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Novel Sensing Architectures

[Figure: single-pixel camera reconstructions: original, 20% sample, 40% sample]

Single Pixel Images -- http://dsp.rice.edu/cscamera

Page 6: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Key Idea – Sparsity

Many signals exhibit sparsity in the canonical or ‘pixel’ basis

Communication signals often have sparse frequency content (DFT)

Natural images often have sparse wavelet representations (DWT)

[Figure: a time-domain signal and its sparse frequency-domain (DFT) representation]
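To make the sparsity idea concrete, here is a small illustration (my own, not from the slides): a two-tone signal is dense in the time domain but has nearly all of its energy in a handful of DFT coefficients.

```python
import numpy as np

n = 1024
t = np.arange(n)
# Two-tone signal: dense in the time domain
x = np.sin(2 * np.pi * 50 * t / n) + 0.5 * np.sin(2 * np.pi * 120 * t / n)

# Sparse in the frequency domain: only 4 DFT bins (+/-50 and +/-120)
# carry essentially all of the energy
X = np.fft.fft(x) / np.sqrt(n)              # orthonormal scaling
mags = np.sort(np.abs(X))[::-1]             # coefficient magnitudes, descending
top4 = np.sum(mags[:4] ** 2) / np.sum(mags ** 2)
print(f"energy fraction in 4 largest DFT coefficients: {top4:.4f}")  # ~1.0
```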

Page 7: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Background --

Sparsity and Structured Sparsity

Page 8: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

A Model for Sparse Signals

Signals of interest are vectors $x \in \mathbb{R}^n$.

A Sparse Signal Model (a union-of-subspaces model):

$S(x)$ -- the signal support set (the locations of the nonzeros)

$k$ -- the number of nonzero signal components
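Reading these labels together (a reconstruction; the slide's displayed formula did not survive extraction), the $k$-sparse model is the union-of-subspaces set

```latex
S(x) = \{\, i : x_i \neq 0 \,\}, \qquad
\Sigma_k = \{\, x \in \mathbb{R}^n : |S(x)| \le k \,\}
         = \bigcup_{|S| \le k} \operatorname{span}\{\, e_i : i \in S \,\},
```

so the set of $k$-sparse vectors is a union of coordinate subspaces of dimension at most $k$.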

Page 9: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Structured Sparsity

Examples: Tree Sparsity in Wavelets; Grid Sparsity in Networks; Graph Sparsity – background subtraction

[Embedded figure from prior literature, caption: “Structured sparsity. (a) The brain image has tree sparsity after wavelet transformation; (b) the background-subtracted image has graph sparsity.”]


• locations of nonzeros are inter-dependent

• structure knowledge can be used during sensing, inference or both

Page 10: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Structured Sparsity

Our focus – Tree Structured Sparsity!

(examples, figure, and bullets as on the previous slide)

Page 11: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Sparsity

[Figure: binary tree with root 1, children 2 and 5, leaves 3, 4, 6, 7, shown beside the length-7 coefficient vector indexed 1–7]

Characteristics of tree structure: the support forms a rooted, connected subtree of the underlying tree (a node can be in the support only if its parent is).

Page 12: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Sparsity – Why?

Wavelets!

• Tree sparsity naturally arises in the wavelet coefficients of many signals, e.g., natural images

• Several prior efforts examined wavelet-tree-structured specialized sensing techniques, e.g., in dynamic MRI [*] and compressive imaging [**]

• Previous work was either experimental or analyzed only in noise-free settings

[*] L. P. Panych and F. A. Jolesz, “A dynamically adaptive imaging algorithm for wavelet-encoded MRI,” Magnetic Resonance in Medicine, vol. 32, no. 6, pp. 738–748, 1994.

[**] M. W. Seeger and H. Nickisch, “Compressed sensing and Bayesian experimental design,” in Proc. ICML, 2008, pp. 912–919.

[**] S. Deutsch, A. Averbuch, and S. Dekel, “Adaptive compressed image sensing based on wavelet modeling and direct sampling,” in Proc. Intl. Conf. on Sampling Theory and Applications, 2009.

Page 13: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Sensing Sparse Signals --

Noisy Linear Measurement Model

Page 14: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Sensing Strategies

Page 15: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Sensing Strategies

Non-Adaptive Sensing vs. Adaptive Sensing

• In adaptive sensing, the $j$-th measurement vector $a_j$ is a function of $\{a_l, y_l\}_{l=1}^{j-1}$ for each $j = 2, 3, \ldots, m$.

[Figure: measurement vectors producing observations $y_1, y_2, \ldots, y_j, \ldots, y_m$]

Page 16: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Exact Support Recovery (ESR)

Task of Interest: exactly recover the signal support $S$, where the nonzeros satisfy $|x_i| \ge \mu > 0$ for $i \in S$.

[Figure: length-7 signal, indices 1–7, with its support highlighted]

Page 17: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Exact Support Recovery (ESR)

Primary question: how small can the signal amplitude $\mu$ be while ESR remains possible?

(task and figure as on the previous slide)

Page 18: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Adaptive Sensing of Tree-Sparse Signals --

A Simple Algorithm with Guarantees

Page 19: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Few Tree Specifics

• Signal components are coefficients in an orthonormal representation (canonical basis, without loss of generality)

• We consider binary trees (all results may be extended to trees of any degree)

[Figure: binary tree with root 1, children 2 and 5, leaves 3, 4, 6, 7]

Page 20: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Adaptive Support Recovery

[Figure: binary tree with root 1, children 2 and 5, leaves 3, 4, 6, 7]

Page 23: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Adaptive Support Recovery

[Figure: binary tree with root 1, children 2 and 5, leaves 3, 4, 6, 7]

Example trace (the support is $\{1, 5\}$; the trace picks up after the root 1 was measured and kept, and node 2 was measured and discarded):

$Q[1] = \{5\}$: measure $y_5 = e_5^T x + w$; suppose $|y_5| > \tau$, so $S \leftarrow S \cup \{5\}$ and $Q \leftarrow \{6, 7\} \cup Q \setminus \{5\}$

$Q[1] = \{6\}$, $S = \{1, 5\}$: measure $y_6 = e_6^T x + w$; suppose $|y_6| < \tau$, so $Q \leftarrow Q \setminus \{6\}$

$Q[1] = \{7\}$, $S = \{1, 5\}$: measure $y_7 = e_7^T x + w$; suppose $|y_7| < \tau$, so $Q \leftarrow Q \setminus \{7\}$

$Q = \emptyset$: terminate with $S = \{1, 5\}$
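The trace follows a simple breadth-first rule: measure the node at the head of the queue; if the noisy measurement clears the threshold, add the node to the support estimate and enqueue its children, otherwise prune it (and, implicitly, its whole subtree). A minimal Python sketch of this rule (my reconstruction from the trace, not the authors' code; the repeated-measurement variant mentioned on the next slide appears as the parameter r):

```python
import numpy as np
from collections import deque

def tree_adaptive_support_recovery(measure, children, root, tau, r=1):
    """Breadth-first adaptive support recovery on a tree.

    measure(i)  -- one noisy direct measurement y_i = e_i^T x + w
    children(i) -- child indices of node i in the underlying tree
    tau         -- detection threshold
    r           -- measurements averaged per node (reduces effective noise)
    """
    S = set()                    # support estimate
    Q = deque([root])            # nodes awaiting measurement
    while Q:
        i = Q.popleft()          # Q[1] in the slides' notation
        y = np.mean([measure(i) for _ in range(r)])
        if abs(y) > tau:         # significant: keep node, explore children
            S.add(i)
            Q.extend(children(i))
        # otherwise node i (and its unexplored subtree) is discarded
    return S

# The 7-node tree from the slides (node 1's children are 2 and 5), support {1, 5}:
tree = {1: [2, 5], 2: [3, 4], 5: [6, 7]}
x = np.zeros(8)                  # index 0 unused, for 1-based node labels
x[[1, 5]] = 2.0
sigma = 0.1
S_hat = tree_adaptive_support_recovery(
    measure=lambda i: x[i] + sigma * np.random.randn(),
    children=lambda i: tree.get(i, []),
    root=1, tau=0.5, r=2)
print(S_hat)                     # {1, 5} with high probability
```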

Page 24: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Adaptive Support Recovery

(figure and example trace as on the previous slide)

(can also measure each location $r \ge 1$ times and average to reduce the effective noise)

Page 25: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Tree Structured Adaptive Support Recovery

(figure and algorithm trace as on the previous slides)

Theorem (2011 & 2013): AS & J. Haupt

Choose any $\delta \in (0, 1)$ and set $\tau = \sqrt{2 \sigma^2 \log(4k/\delta)}$. If the signal $x$ being acquired by our procedure is $k$-tree sparse, and the nonzero components of $x$ satisfy

$$|x_i| \;\ge\; \sqrt{24 \left[ 1 + \log\!\left( \frac{4}{\delta} \right) \right]} \, \sqrt{ \sigma^2 \left( \frac{k}{m} \right) \log k }$$

for every $i \in S(x)$, then with probability at least $1 - \delta$, a “repeated measurement” variant of the algorithm above that acquires $r$ measurements at each observed location terminates after collecting $m \le r(2k + 1)$ measurements, and produces a support estimate $\hat{S}$ satisfying $\hat{S} = S(x)$.

Page 26: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Question: Can any other “smart” scheme recover the support of a tree-sparse signal having “significantly” smaller magnitude? I.e., is this the best one can hope for?

(theorem and algorithm trace as on the previous slide)

Page 27: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Our Investigation in Context --

Fundamental Limits for ESR

Page 28: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture: Minimum Signal Amplitudes for ESR

Let’s identify necessary conditions for ESR in each case…

[Table skeleton: rows Unstructured / Tree Sparse, columns Non-Adaptive / Adaptive; the four cells are filled in over the following slides]

Page 29: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table: the unstructured + non-adaptive cell is filled in, citing:]

[*] S. Aeron, V. Saligrama, and M. Zhao, “Information theoretic bounds for compressed sensing,” IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5111–5130, 2010.

[*] M. J. Wainwright, “Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso),” IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2183–2202, 2009.

[*] M. J. Wainwright, “Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting,” IEEE Transactions on Information Theory, vol. 55, no. 12, 2009.

[*] W. Wang, M. J. Wainwright, and K. Ramchandran, “Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2967–2979, 2010.

Page 30: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

Non-Adaptive   Adaptive  

Non-Adaptive   Adaptive  

Unstructured  

Unstructured  

Tree Sparse  

Tree Sparse  

[*]  S.  Aeron,  V.  Saligrama,  and  M.  Zhao,  "Informa9on  Theore9c  Bounds  for  Compressed  Sensing,"  IEEE  Transac9ons  on  Informa9on  Theory,  vol.56,  no.10,  pp.5111-­‐5130,  2010  

[*]  M.  J.  Wainwright,  ”Sharp  thresholds  for  high-­‐dimensional  and  noisy  sparsity  recovery  using  l1-­‐constrained  quadra9c  programming  (lasso),  "  IEEE  Transac9ons  on  Informa9on  Theory,  vol.55,  no.5,  pp.2183-­‐2202,  2009    

[*]  M.  J.  Wainwright,  ”Informa9on-­‐theore9c  limita9ons  on  sparsity  recovery  in  the  high-­‐dimensional  and  noisy  sehng,  "  IEEE  Transac9ons  on  Informa9on  Theory,  vol.55,  no.12,    2009    [*]   W.   Wang,   M.   J.   Wainwright   and   K.   Ramchandran,   ”Informa9on-­‐theore9c   limits   on   sparse   signal   recovery:   Dense   versus   sparse  measurement  matrices,  "  IEEE  Transac9ons  on  Informa9on  Theory,  vol.56,  no.6,  pp.2967-­‐2979,  2010    

uncompressed or

compressed

Page 31: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table: the unstructured + adaptive cell is filled in, citing:]

[*] M. Malloy and R. Nowak, “Sequential analysis in high-dimensional multiple testing and sparse recovery,” in Proc. IEEE Intl. Symp. on Information Theory, 2011, pp. 2661–2665.

Adaptivity may at best improve log(n) to log(k)!

Page 32: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Problem Formulation --

Tree-Sparse Model

Page 33: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Signal Model:

[Figure: binary tree with root 1, children 2 and 5, leaves 3, 4, 6, 7]

$\mathcal{X}_{\mu,k}$: $k$-tree-sparse signals with amplitude parameter $\mu$ ($\ge 0$), where $\mathcal{T}_k$ denotes the set of all $k$-node rooted sub-trees (in the underlying tree)

Sensing Strategies:

Non-Adaptive: here Gaussian; each row $a_j$ of $A$ is independent and $a_j \sim \mathcal{N}(0, I/n)$

Adaptive: $a_j$ depends on $\{a_l, y_l\}_{l=1}^{j-1}$, subject to the constraint $\|a_j\|_2^2 = 1$ for all $j$

$\mathcal{M}_m$: class of all adaptive (or non-adaptive) sensing strategies based on $m$ measurements

Observations: $\{A_m, y_m\}$, shorthand for $\{a_j, y_j\}_{j=1}^m$

Support estimate: a mapping from observations $\to$ a subset of $\{1, 2, \ldots, n\}$
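Assembling the annotations above (a reconstruction consistent with the rest of the deck; the slide's displayed formulas did not survive extraction), the signal class and observation model read:

```latex
\mathcal{X}_{\mu,k}
  = \left\{\, x \in \mathbb{R}^n \;:\; S(x) \in \mathcal{T}_k,\;
      |x_i| \ge \mu \ \forall\, i \in S(x) \,\right\},
\qquad
y_j = a_j^T x + w_j, \quad w_j \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
```

where $\mathcal{T}_k$ is the set of all $k$-node rooted subtrees of the underlying tree and $\mu \ge 0$ is the amplitude parameter.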

Page 34: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Preliminaries:

(Maximum) risk of a support estimator $\hat{S}$ and sensing strategy $M \in \mathcal{M}$: in words, the worst-case performance of $\hat{S}$ when estimating the support of the “most difficult” element of the class.

Minimax risk: in words, the error of the best estimator when estimating the support of the element whose support is most difficult to estimate.

Note: if $R^{*}_{\mathcal{X}_{\mu,k},\mathcal{M}} \ge \gamma > 0$, then regardless of the estimator $\hat{S}$ and strategy $M \in \mathcal{M}$, there is at least one signal $x \in \mathcal{X}_{\mu,k}$ for which the error probability is at least $\gamma$.

Our aim – quantify errors corresponding to these hard cases!
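In symbols (a reconstruction; these match the sup-probability expression that survives on the proof-idea slide below):

```latex
R(\hat{S}, M) \;=\; \sup_{x \in \mathcal{X}_{\mu,k}}
   \Pr_x\!\left( \hat{S}(A_m, y_m; M) \neq S(x) \right),
\qquad
R^{*}_{\mathcal{X}_{\mu,k},\mathcal{M}}
   \;=\; \inf_{\hat{S},\; M \in \mathcal{M}} R(\hat{S}, M).
```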

Page 35: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Non-Adaptive Tree-Structured Sensing --

Fundamental Limits

Page 36: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Non-Adaptive Tree-Structured Sensing – fundamental limits

Theorem (2013): AS & J. Haupt

For ESR with non-adaptive sensing, a necessary condition is: [amplitude bound shown on slide; per the following slides, it coincides with the adaptive + unstructured condition]

Implications: no uniform guarantees can be made, for any estimation procedure, for recovering the support of tree-sparse signals when the signal amplitude is “too small”.

Page 37: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table: the tree sparse + non-adaptive cell is now filled in]

[*] AS and J. Haupt, “On the fundamental limits of recovering tree sparse vectors from noisy linear measurements,” IEEE Transactions on Information Theory, 2013 (accepted for publication).

Page 38: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table as above]

Same necessary condition as for adaptive + unstructured!

Structure or adaptivity in isolation may at best improve log(n) to log(k).

[*] AS and J. Haupt, “On the fundamental limits of recovering tree sparse vectors from noisy linear measurements,” IEEE Transactions on Information Theory, 2013 (accepted for publication).

Page 39: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Proof Idea – Non-Adaptive + Tree-Sparse

Restrict to a “smaller set”: for any $\mathcal{X}'_{\mu,k} \subseteq \mathcal{X}_{\mu,k}$,

$$\sup_{x \in \mathcal{X}_{\mu,k}} \Pr_x\!\left( \hat{S}(A_m, y_m; M) \neq S(x) \right)
\;\ge\; \sup_{x \in \mathcal{X}'_{\mu,k}} \Pr_x\!\left( \hat{S}(A_m, y_m; M) \neq S(x) \right)$$

$\Longrightarrow$ we can get a lower bound on the minimax risk over a smaller subset of signals!

Convert to a multiple-hypothesis testing problem:

• the restricted risk is bounded below by $p_{e,L}$, the minimax probability of error for a multiple hypothesis testing problem over $L$ hypotheses

• get a lower bound on $p_{e,L}$ using Fano’s inequality (or similar ideas)

Introduction to Nonparametric Estimation – A. B. Tsybakov
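For reference, the standard form of Fano's inequality used in such arguments (from Tsybakov's book, not spelled out on the slide): if $J$ is uniform over the $L$ hypotheses and $Y$ denotes the observations, then

```latex
p_{e,L} \;\ge\; 1 - \frac{I(J; Y) + \log 2}{\log L},
\qquad
I(J; Y) \;\le\; \frac{1}{L^2} \sum_{j,l} \mathrm{KL}\!\left( P_j \,\Vert\, P_l \right),
```

so controlling the pairwise KL divergences between the restricted hypotheses yields the amplitude lower bound.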

Page 40: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Adaptive Tree-Structured Sensing --

Fundamental Limits

Page 41: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Adaptive Tree-Structured Sensing – fundamental limits

Theorem (2013): AS & J. Haupt

For ESR with adaptive sensing, a necessary condition is: [amplitude bound shown on slide; per the later slides it is of order $\sqrt{\sigma^2 (k/m)}$, a log(k) factor below the sufficient condition above]

Proof Idea: this problem is as hard as recovering the location of one nonzero given all other $k-1$ nonzero locations.

Page 42: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table: the tree sparse + adaptive cell is now filled in]

[*] AS and J. Haupt, “On the fundamental limits of recovering tree sparse vectors from noisy linear measurements,” IEEE Transactions on Information Theory, 2013 (accepted for publication).

Page 43: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table as above, now complete]

Recall, for our simple tree-structured adaptive algorithm, the sufficient condition for ESR was

$$\mu \;\ge\; \sqrt{ \sigma^2 \left( \frac{k}{m} \right) \log k },$$

which is only a log(k) factor away from the lower bound. We cannot do much better than the simple proposed algorithm!

Page 44: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table as above; the tabulated adaptive + unstructured condition applies when m > n]

Note: for adaptive + unstructured, our proof ideas can show that in the case m < n, a necessary condition for ESR is

$$\mu \;\ge\; \sqrt{ \sigma^2 \left( \frac{n - k + 1}{m} \right) }.$$

Page 45: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

The Big Picture:

[Table as above]

Related Works: [*] A. Krishnamurthy, J. Sharpnack, and A. Singh, “Recovering block-structured activations using compressive measurements,” submitted 2012.

Page 46: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Question: Can any other “smart” scheme recover the support of a tree-sparse signal having “significantly” smaller magnitude?

(theorem and algorithm trace as on the earlier slides)

Page 47: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Answer: No! We’re within log(k) of minimax optimal.

(question, theorem, and algorithm trace as on the earlier slides)

Page 48: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Experimental Evaluation --

Page 49: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Simulation Setup: four regimes compared

• Non-adaptive + unstructured

• Non-adaptive + tree sparsity

• Adaptive + unstructured [*]

• Adaptive + tree sparsity

[Figure: Prob. Error vs. amplitude parameter µ (log scale, 10^-1 to 10^2) for n = 2^8 − 1, n = 2^10 − 1, and n = 2^12 − 1; annotation: “4 orders of magnitude”]

[*] M. Malloy and R. Nowak, “Near-optimal adaptive compressive sensing,” in Proc. Asilomar Conf. on Signals, Systems, and Computers, 2012.
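For concreteness, here is how one curve of such a plot could be generated (my own sketch, not the authors' code; it reuses the tree_adaptive_support_recovery sketch from earlier, with a heap-indexed tree and the theorem's threshold rescaled for r-fold averaging):

```python
import numpy as np

def esr_error_rate(mu, n=2**8 - 1, k=15, sigma=1.0, r=2, trials=200):
    """Monte Carlo estimate of Pr(exact support recovery fails) at amplitude mu."""
    # heap-indexed binary tree on nodes 1..n; nodes 1..k form a rooted subtree
    tree = {i: [c for c in (2 * i, 2 * i + 1) if c <= n] for i in range(1, n + 1)}
    support = set(range(1, k + 1))
    delta = 0.1
    # threshold from the theorem, rescaled since averaging r samples
    # shrinks the effective noise standard deviation by sqrt(r)
    tau = np.sqrt(2 * sigma**2 * np.log(4 * k / delta) / r)
    x = np.zeros(n + 1)
    x[list(support)] = mu
    failures = 0
    for _ in range(trials):
        S_hat = tree_adaptive_support_recovery(
            measure=lambda i: x[i] + sigma * np.random.randn(),
            children=lambda i: tree[i], root=1, tau=tau, r=r)
        failures += (S_hat != support)
    return failures / trials

for mu in [0.5, 1.0, 2.0, 4.0, 8.0]:   # sweep the amplitude parameter
    print(mu, esr_error_rate(mu))
```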

Page 50: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Next Step --

1) MSE estimation implications?

Page 51: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

MSE estimation implications

Unstructured + Non-Adaptive: if the measurement matrix $A_m$ satisfies the norm constraint $\|A_m\|_F^2 \le m$, then we have the minimax MSE bound

$$\inf_{\hat{x},\, M \in \mathcal{M}_{na}} \sup_{x : |S(x)| = k} \mathbb{E}\left[ \| \hat{x}(A_m, y_m; M) - x \|_2^2 \right] \;\ge\; c\, \sigma^2 \left( \frac{n}{m} \right) k \log n,$$

where $c > 0$ is a constant. [ * ]

Unstructured + Adaptive: an analogous bound holds (per [ ** ], without the log n factor), where $c' > 0$ is another constant.

[ * ] E. J. Candès and M. A. Davenport, “How well can we estimate a sparse vector?,” Applied and Computational Harmonic Analysis, vol. 34, no. 2, pp. 317–323, 2013.

[ ** ] E. Arias-Castro, E. J. Candès, and M. Davenport, “On the fundamental limits of adaptive sensing,” submitted, 2011, online at arxiv.org/abs/1111.4646.

Page 52: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

MSE estimation implications

(unstructured + non-adaptive and unstructured + adaptive bounds as on the previous slide)

Tree Structured + Non-Adaptive: [bound shown on slide]

Tree Structured + Adaptive: [bound shown on slide]

Page 53: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

MSE estimation implications

(unstructured + non-adaptive and unstructured + adaptive bounds as on the previous slides)

Tree-sparse + our adaptive procedure: there exists a two-stage (support recovery followed by direct measurements) adaptive compressive sensing procedure for $k$-tree-sparse signals that produces, from $O(k)$ measurements, an estimate $\hat{x}$ satisfying

$$\|\hat{x} - x\|_2^2 = O\!\left( \sigma^2 \left( \frac{k}{m} \right) k \right)$$

with high probability, provided the nonzero signal component amplitudes exceed a constant times $\sqrt{ \sigma^2 \left( \frac{k}{m} \right) \log k }$.
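A rough sanity check on the $\sigma^2 (k/m)\, k$ rate (my own back-of-the-envelope reasoning, not from the slides): once the support is recovered, spending the remaining budget on direct measurements of the $k$ support coordinates, roughly $r \approx m/(2k)$ samples each, gives

```latex
\hat{x}_i = \frac{1}{r} \sum_{t=1}^{r} \left( x_i + w_{i,t} \right)
\;\Longrightarrow\;
\mathbb{E}\!\left[ \|\hat{x} - x\|_2^2 \right]
   = k \cdot \frac{\sigma^2}{r}
   \approx \sigma^2 \left( \frac{2k}{m} \right) k,
```

matching the stated rate up to constants.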

Page 54: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

-- Next Step --

2) Learning Adaptive Sensing Representations (LASeR)

Page 55: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

LASeR: Learning Adaptive Sensing Representations

Use dictionary learning and training data to learn tree-sparse representations.

[Flowchart: Training Data → Structured Sparsity → Adaptive Sensing → LASeR]

Learn representations for 163 example images (128 × 128) from the Psychological Image Collection at Stirling (PICS), http://pics.psych.stir.ac.uk/

Qualitative results: original image vs. reconstructions by Wavelet Tree Sensing, PCA, CS LASSO, CS Tree LASSO, and LASeR, at m = 20, m = 50, and m = 80 measurements; “sensing energy” R = (128 × 128)/32; the tree elements present in the sparse representation are also shown.

Details & examples of LASeR in action: AS and J. Haupt, “Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries,” in Proc. Asilomar Conf. on Signals, Systems and Computers, 2011, pp. 1250–1254.

Page 56: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Overall Taxonomy

[Table: complete 2×2 taxonomy]

Sufficient condition for ESR for our algorithm:

$$\mu \;\ge\; \sqrt{ \sigma^2 \left( \frac{k}{m} \right) \log k }$$

$\Longrightarrow$ nearly optimal!!

Page 57: Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

Overall Taxonomy

[Table: complete 2×2 taxonomy, with the nearly-optimal sufficient condition as on the previous slide]

Thank You! Akshay Soni, University of Minnesota, [email protected]