Signal Processing Course: Theory for Sparse Recovery

Gabriel Peyré — www.numerical-tours.com

Slides for a course on signal and image processing.

Page 2: Signal Processing Course : Theory for Sparse Recovery

Inverse problem: $K : \mathbb{R}^{N_0} \to \mathbb{R}^P$, $P \leq N_0$.

Measurements: $y = K f_0 + w$.

Example: regularization.

[Figure: an image $f_0$ and its degraded measurements $K f_0$.]

Page 3: Signal Processing Course : Theory for Sparse Recovery

Inverse problem: $K : \mathbb{R}^{N_0} \to \mathbb{R}^P$, $P \leq N_0$.

Measurements: $y = K f_0 + w$.

Model: $f_0 = \Psi x_0$ is sparse in a dictionary $\Psi \in \mathbb{R}^{N_0 \times N}$, $N \geq N_0$.

Coefficients $x_0 \in \mathbb{R}^N$ --> ($\Psi$) --> image $f_0 = \Psi x_0 \in \mathbb{R}^{N_0}$ --> ($K$) --> measurements $y = K f_0 + w \in \mathbb{R}^P$.

Observations: $y = \Phi x_0 + w$ with $\Phi = K \Psi \in \mathbb{R}^{P \times N}$.

Example: regularization.

[Figure: an image $f_0$ and its degraded measurements $K f_0$.]

Page 4: Signal Processing Course : Theory for Sparse Recovery

Inverse problem: $K : \mathbb{R}^{N_0} \to \mathbb{R}^P$, $P \leq N_0$.

Measurements: $y = K f_0 + w$.

Model: $f_0 = \Psi x_0$ is sparse in a dictionary $\Psi \in \mathbb{R}^{N_0 \times N}$, $N \geq N_0$.

Coefficients $x_0 \in \mathbb{R}^N$ --> ($\Psi$) --> image $f_0 = \Psi x_0 \in \mathbb{R}^{N_0}$ --> ($K$) --> measurements $y = K f_0 + w \in \mathbb{R}^P$.

Observations: $y = \Phi x_0 + w$ with $\Phi = K \Psi \in \mathbb{R}^{P \times N}$.

Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves

    $\min_{x \in \mathbb{R}^N} \; \underbrace{\tfrac{1}{2}\|y - \Phi x\|^2}_{\text{fidelity}} + \underbrace{\lambda \|x\|_1}_{\text{regularization}}$

Example: regularization.

[Figure: an image $f_0$ and its degraded measurements $K f_0$.]
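The slides do not specify a solver for this variational problem; as a point of reference, here is a minimal NumPy sketch (ours, not from the course) that minimizes the above energy by iterative soft-thresholding (ISTA). The function name and parameters are illustrative.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Minimize 1/2 ||Phi x - y||^2 + lam ||x||_1 by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(Phi, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)                 # gradient of the quadratic fidelity
        u = x - grad / L                             # forward (gradient) step
        x = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)   # backward step: soft-thresholding
    return x
```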

Page 5: Signal Processing Course : Theory for Sparse Recovery

Variations and Stability

Data: $f_0 = \Psi x_0$.

Observations: $y = \Phi x_0 + w$.

Recovery: $x^\star \in \underset{x \in \mathbb{R}^N}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1$    $(P_\lambda(y))$

Page 6: Signal Processing Course : Theory for Sparse Recovery

Variations and Stability

Data: $f_0 = \Psi x_0$.

Observations: $y = \Phi x_0 + w$.

Recovery: $x^\star \in \underset{x \in \mathbb{R}^N}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1$    $(P_\lambda(y))$

As $\lambda \to 0^+$ (no noise): $x^\star \in \underset{\Phi x = y}{\mathrm{argmin}} \; \|x\|_1$    $(P_0(y))$

Page 7: Signal Processing Course : Theory for Sparse Recovery

Variations and Stability

Data: $f_0 = \Psi x_0$.

Observations: $y = \Phi x_0 + w$.

Recovery: $x^\star \in \underset{x \in \mathbb{R}^N}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1$    $(P_\lambda(y))$

As $\lambda \to 0^+$ (no noise): $x^\star \in \underset{\Phi x = y}{\mathrm{argmin}} \; \|x\|_1$    $(P_0(y))$

Questions:

– Behavior of $x^\star$ with respect to $y$ and $\lambda$.

– Criterion to ensure $\|x^\star - x_0\| = O(\|w\|)$.

– Criterion to ensure $x^\star = x_0$ when $w = 0$ and $\lambda \to 0^+$.

Page 8: Signal Processing Course : Theory for Sparse Recovery

Numerical Illustration

$y = \Phi x_0 + w$,   $\|x_0\|_0 = s$,   $\Phi \in \mathbb{R}^{50 \times 200}$ Gaussian.

[Figure: recovered coefficients and regularization paths for sparsity levels $s = 3$, $s = 6$, $s = 13$, $s = 25$.]

--> The mapping $\lambda \mapsto x^\star$ looks polygonal.

--> If $x_0$ is sparse and $\lambda$ is well chosen, $\mathrm{sign}(x^\star) = \mathrm{sign}(x_0)$.
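A hypothetical way to reproduce this experiment with the `ista` sketch above; the matrix dimensions come from the slide, while the noise level, seed and lambda grid are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, s, sigma = 50, 200, 6, 0.02

Phi = rng.standard_normal((P, N)) / np.sqrt(P)       # Gaussian sensing matrix
x0 = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x0[support] = rng.standard_normal(s)                 # s-sparse signal
y = Phi @ x0 + sigma * rng.standard_normal(P)

for lam in [0.5, 0.1, 0.02]:                         # decreasing regularization
    x = ista(Phi, y, lam, n_iter=2000)
    ok = np.array_equal(np.sign(np.round(x, 6)), np.sign(x0))
    print(f"lambda = {lam:4.2f}, sign(x) == sign(x0): {ok}")
```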

Page 9: Signal Processing Course : Theory for Sparse Recovery

Overview

• Polytope Noiseless Recovery

• Local Behavior of Sparse Regularization

• Robustness to Small Noise

• Robustness to Bounded Noise

• Compressed Sensing RIP Theory

Page 10: Signal Processing Course : Theory for Sparse Recovery

Polytopes Approach

$\Phi = (\varphi_i)_i \in \mathbb{R}^{2 \times 3}$,   $B_\lambda = \{ x \;\backslash\; \|x\|_1 \leq \lambda \}$,   $\lambda = \|x_0\|_1$.

$\underset{\Phi x = y}{\min} \; \|x\|_1$

$x_0$ solution of $P_0(\Phi x_0)$   $\Longleftrightarrow$   $\Phi x_0 \in \partial\, \Phi(B_\lambda)$

[Figure: the polytope $\Phi(B_\lambda)$ with vertices $\pm\varphi_1, \pm\varphi_2, \pm\varphi_3$, the points $x_0$ and $\Phi x_0$, and the mapping $y \mapsto x^\star(y)$.]

Page 11: Signal Processing Course : Theory for Sparse Recovery

Polytopes Approach

$\Phi = (\varphi_i)_i \in \mathbb{R}^{2 \times 3}$,   $B_\lambda = \{ x \;\backslash\; \|x\|_1 \leq \lambda \}$,   $\lambda = \|x_0\|_1$.

$\underset{\Phi x = y}{\min} \; \|x\|_1$    $(P_0(y))$

$x_0$ solution of $P_0(\Phi x_0)$   $\Longleftrightarrow$   $\Phi x_0 \in \partial\, \Phi(B_\lambda)$

[Figure: the polytope $\Phi(B_\lambda)$ with vertices $\pm\varphi_1, \pm\varphi_2, \pm\varphi_3$, the points $x_0$ and $\Phi x_0$, and the mapping $y \mapsto x^\star(y)$.]

Page 12: Signal Processing Course : Theory for Sparse Recovery

Proof

$x_0$ solution of $P_0(\Phi x_0)$   $\Longleftrightarrow$   $\Phi x_0 \in \partial\, \Phi(B_\lambda)$

($\Leftarrow$) Suppose $x_0$ is not a solution; we show that $\Phi x_0 \in \mathrm{int}(\Phi B_\lambda)$.

$\exists\, z$ such that $\Phi x_0 = \Phi z$ and $\|z\|_1 = (1 - \varepsilon)\|x_0\|_1$.

For any $h = \Phi\delta \in \mathrm{Im}(\Phi)$ such that $\|h\|_1 < \dfrac{\varepsilon \|x_0\|_1}{\|\Phi^+\|_{1,1}}$, taking $\delta = \Phi^+ h$:

    $\|z + \delta\|_1 \leq \|z\|_1 + \|\Phi^+ h\|_1 \leq (1-\varepsilon)\|x_0\|_1 + \|\Phi^+\|_{1,1}\|h\|_1 < \|x_0\|_1$

$\Longrightarrow$  $\Phi x_0 + h = \Phi(z + \delta)$   $\Longrightarrow$   $\Phi x_0 + h \in \Phi(B_\lambda)$.

Page 13: Signal Processing Course : Theory for Sparse Recovery

Proof

$x_0$ solution of $P_0(\Phi x_0)$   $\Longleftrightarrow$   $\Phi x_0 \in \partial\, \Phi(B_\lambda)$

($\Leftarrow$) Suppose $x_0$ is not a solution; we show that $\Phi x_0 \in \mathrm{int}(\Phi B_\lambda)$.

$\exists\, z$ such that $\Phi x_0 = \Phi z$ and $\|z\|_1 = (1 - \varepsilon)\|x_0\|_1$.

For any $h = \Phi\delta \in \mathrm{Im}(\Phi)$ such that $\|h\|_1 < \dfrac{\varepsilon \|x_0\|_1}{\|\Phi^+\|_{1,1}}$, taking $\delta = \Phi^+ h$:

    $\|z + \delta\|_1 \leq \|z\|_1 + \|\Phi^+ h\|_1 \leq (1-\varepsilon)\|x_0\|_1 + \|\Phi^+\|_{1,1}\|h\|_1 < \|x_0\|_1$

$\Longrightarrow$  $\Phi x_0 + h = \Phi(z + \delta)$   $\Longrightarrow$   $\Phi x_0 + h \in \Phi(B_\lambda)$.

($\Rightarrow$) Suppose $\Phi x_0 \in \mathrm{int}(\Phi B_\lambda)$.

Then $\exists\, z$ with $\Phi x_0 = (1-\varepsilon)\Phi z$ and $\|z\|_1 \leq \|x_0\|_1$.

Since $\|(1-\varepsilon) z\|_1 < \|x_0\|_1$, $x_0$ is not a solution.

[Figure: $x_0$, $z$, $0$ and the polytope $\Phi(B_\lambda)$.]

Page 14: Signal Processing Course : Theory for Sparse Recovery

Basis-Pursuit Mapping in 2-D

$\Phi = (\varphi_i)_i \in \mathbb{R}^{2 \times 3}$,   mapping $y \mapsto x^\star(y)$.

For a sign pattern $s$ (e.g. $s = (0,1,1)$), $K_s = \{ (\lambda_i s_i)_i \in \mathbb{R}^3 \;\backslash\; \lambda_i \geq 0 \}$: 2-D quadrant.

$C_s = \Phi K_s$: 2-D cones.

[Figure: the quadrant $K_{(0,1,1)}$, the cone $C_{(0,1,1)}$ and the atoms $\varphi_1, \varphi_2, \varphi_3$.]

Page 15: Signal Processing Course : Theory for Sparse Recovery

Basis-Pursuit Mapping in 3-D

$\Phi = (\varphi_i)_i \in \mathbb{R}^{3 \times N}$,   mapping $y \mapsto x^\star(y)$.

Delaunay paving of the sphere with spherical triangles $C_s$.

$\Longleftrightarrow$  Empty spherical caps property.

[Figure: atoms $\varphi_i, \varphi_j, \varphi_k$ on the sphere and a spherical triangle $C_s$.]

Page 16: Signal Processing Course : Theory for Sparse Recovery

Polytope Noiseless Recovery

Counting faces of random polytopes: [Donoho]

All $x_0$ such that $\|x_0\|_0 \leq C_{\mathrm{all}}(P/N)\, P$ are identifiable.

Most $x_0$ such that $\|x_0\|_0 \leq C_{\mathrm{most}}(P/N)\, P$ are identifiable.

$C_{\mathrm{all}}(1/4) \approx 0.065$,   $C_{\mathrm{most}}(1/4) \approx 0.25$.

--> Sharp constants.
--> No noise robustness.

[Figure: phase-transition curves "All", "Most" and "RIP".]

Page 17: Signal Processing Course : Theory for Sparse Recovery

Overview

• Polytope Noiseless Recovery

• Local Behavior of Sparse Regularization

• Robustness to Small Noise

• Robustness to Bounded Noise

• Compressed Sensing RIP Theory

Page 18: Signal Processing Course : Theory for Sparse Recovery

First Order CNS Condition (necessary and sufficient)

$x^\star \in \underset{x \in \mathbb{R}^N}{\mathrm{argmin}} \; E(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Support of the solution: $I = \{ i \in \{0, \ldots, N-1\} \;\backslash\; x^\star_i \neq 0 \}$.

First order condition: $x^\star$ solution of $P_\lambda(y)$   $\Longleftrightarrow$   $0 \in \partial E(x^\star)$
$\Longleftrightarrow$   $\Phi^*(\Phi x^\star - y) + \lambda s = 0$   where   $s_I = \mathrm{sign}(x^\star_I)$,  $\|s_{I^c}\|_\infty \leq 1$.

Page 19: Signal Processing Course : Theory for Sparse Recovery

First Order CNS Condition (necessary and sufficient)

$x^\star \in \underset{x \in \mathbb{R}^N}{\mathrm{argmin}} \; E(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Support of the solution: $I = \{ i \in \{0, \ldots, N-1\} \;\backslash\; x^\star_i \neq 0 \}$.

First order condition: $x^\star$ solution of $P_\lambda(y)$   $\Longleftrightarrow$   $0 \in \partial E(x^\star)$
$\Longleftrightarrow$   $\Phi^*(\Phi x^\star - y) + \lambda s = 0$   where   $s_I = \mathrm{sign}(x^\star_I)$,  $\|s_{I^c}\|_\infty \leq 1$.

Note: $s_{I^c} = \tfrac{1}{\lambda}\, \Phi_{I^c}^*(y - \Phi x^\star)$.

Theorem: $x^\star$ solution of $P_\lambda(y)$   $\Longleftrightarrow$   $\Phi_I^*(\Phi x^\star - y) + \lambda\,\mathrm{sign}(x^\star_I) = 0$   and   $\|\Phi_{I^c}^*(\Phi x^\star - y)\|_\infty \leq \lambda$.
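A small NumPy check of the condition above for a candidate minimizer (our sketch; the tolerances are illustrative).

```python
import numpy as np

def check_first_order(Phi, y, x, lam, tol=1e-6):
    """Check the l1 first-order conditions for a candidate minimizer x of P_lambda(y)."""
    r = Phi @ x - y
    I = np.abs(x) > tol                                       # estimated support
    on_sup = np.max(np.abs(Phi[:, I].T @ r + lam * np.sign(x[I]))) if I.any() else 0.0
    off_sup = np.max(np.abs(Phi[:, ~I].T @ r)) if (~I).any() else 0.0
    return on_sup < 1e-4 and off_sup <= lam * (1 + 1e-4)
```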

Page 20: Signal Processing Course : Theory for Sparse Recovery

Local Parameterization

Implicit equation: $\Phi^*(\Phi x^\star - y) + \lambda s = 0$

$\Longrightarrow$  if $\Phi_I$ has full rank:   $x^\star_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   where   $\Phi_I^+ = (\Phi_I^* \Phi_I)^{-1}\Phi_I^*$.

Page 21: Signal Processing Course : Theory for Sparse Recovery

Local Parameterization

Implicit equation: $\Phi^*(\Phi x^\star - y) + \lambda s = 0$

$\Longrightarrow$  if $\Phi_I$ has full rank:   $x^\star_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   where   $\Phi_I^+ = (\Phi_I^* \Phi_I)^{-1}\Phi_I^*$.

Given $y$: compute $x^\star$, then compute $(s, I)$.

Define   $\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$   and   $\hat x_{\bar\lambda}(\bar y)_{I^c} = 0$.

By construction $\hat x_\lambda(y) = x^\star$.

Page 22: Signal Processing Course : Theory for Sparse Recovery

Local Parameterization

Implicit equation: $\Phi^*(\Phi x^\star - y) + \lambda s = 0$

$\Longrightarrow$  if $\Phi_I$ has full rank:   $x^\star_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   where   $\Phi_I^+ = (\Phi_I^* \Phi_I)^{-1}\Phi_I^*$.

Given $y$: compute $x^\star$, then compute $(s, I)$.

Define   $\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$   and   $\hat x_{\bar\lambda}(\bar y)_{I^c} = 0$.

By construction $\hat x_\lambda(y) = x^\star$.

Theorem: For $(y, \lambda) \notin \mathcal{H}$, let $x^\star$ be a solution of $P_\lambda(y)$ such that $\Phi_I$ is full rank, $I = \mathrm{supp}(x^\star)$.
Then for $(\bar y, \bar\lambda)$ close to $(y, \lambda)$, $\hat x_{\bar\lambda}(\bar y)$ is a solution of $P_{\bar\lambda}(\bar y)$.

Remark: the theorem holds outside a union of hyperplanes $\mathcal{H}$.
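An illustrative NumPy implementation of this closed-form parameterization from a sign vector $s$ (the function name is ours; it assumes $\Phi_I$ has full rank).

```python
import numpy as np

def local_solution(Phi, y, s, lam):
    """x_hat with x_hat_I = Phi_I^+ y - lam (Phi_I^* Phi_I)^{-1} s_I and x_hat = 0 off supp(s)."""
    I = np.flatnonzero(s)
    Phi_I = Phi[:, I]
    G = Phi_I.T @ Phi_I                         # Gram matrix Phi_I^* Phi_I, assumed invertible
    x = np.zeros(Phi.shape[1])
    x[I] = np.linalg.solve(G, Phi_I.T @ y - lam * s[I])
    return x
```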

Page 23: Signal Processing Course : Theory for Sparse Recovery

Full Rank Condition

Lemma: There exists a solution $x^\star$ such that $\ker(\Phi_I) = \{0\}$.

--> If $\ker(\Phi_I) \neq \{0\}$, $x^\star$ is not unique.

Page 24: Signal Processing Course : Theory for Sparse Recovery

Full Rank Condition

Lemma: There exists a solution $x^\star$ such that $\ker(\Phi_I) = \{0\}$.

--> If $\ker(\Phi_I) \neq \{0\}$, $x^\star$ is not unique.

Proof:

If $\ker(\Phi_I) \neq \{0\}$, let $\eta_I \in \ker(\Phi_I)$, $\eta \neq 0$.

Define, for all $t \in \mathbb{R}$, $x_t = x^\star + t\eta$.

Page 25: Signal Processing Course : Theory for Sparse Recovery

Full Rank Condition

Lemma: There exists a solution $x^\star$ such that $\ker(\Phi_I) = \{0\}$.

--> If $\ker(\Phi_I) \neq \{0\}$, $x^\star$ is not unique.

Proof:

If $\ker(\Phi_I) \neq \{0\}$, let $\eta_I \in \ker(\Phi_I)$, $\eta \neq 0$.

Define, for all $t \in \mathbb{R}$, $x_t = x^\star + t\eta$.

Let $t_0$ be the smallest $|t|$ such that $\mathrm{sign}(x_t) \neq \mathrm{sign}(x^\star)$.

[Figure: the path $t \mapsto x_t$ and the breakpoint $t_0$.]

Page 26: Signal Processing Course : Theory for Sparse Recovery

Full Rank Condition

Lemma: There exists a solution $x^\star$ such that $\ker(\Phi_I) = \{0\}$.

--> If $\ker(\Phi_I) \neq \{0\}$, $x^\star$ is not unique.

Proof:

If $\ker(\Phi_I) \neq \{0\}$, let $\eta_I \in \ker(\Phi_I)$, $\eta \neq 0$.

Define, for all $t \in \mathbb{R}$, $x_t = x^\star + t\eta$.

Let $t_0$ be the smallest $|t|$ such that $\mathrm{sign}(x_t) \neq \mathrm{sign}(x^\star)$.

For all $|t| < t_0$, $x_t$ is a solution: $\Phi x_t = \Phi x^\star$ and same sign.

[Figure: the path $t \mapsto x_t$ and the breakpoint $t_0$.]

Page 27: Signal Processing Course : Theory for Sparse Recovery

Full Rank Condition

Lemma: There exists a solution $x^\star$ such that $\ker(\Phi_I) = \{0\}$.

--> If $\ker(\Phi_I) \neq \{0\}$, $x^\star$ is not unique.

Proof:

If $\ker(\Phi_I) \neq \{0\}$, let $\eta_I \in \ker(\Phi_I)$, $\eta \neq 0$.

Define, for all $t \in \mathbb{R}$, $x_t = x^\star + t\eta$.

Let $t_0$ be the smallest $|t|$ such that $\mathrm{sign}(x_t) \neq \mathrm{sign}(x^\star)$.

For all $|t| < t_0$, $x_t$ is a solution: $\Phi x_t = \Phi x^\star$ and same sign.

By continuity, $x_{t_0}$ is a solution, and $|\mathrm{supp}(x_{t_0})| < |\mathrm{supp}(x^\star)|$.

[Figure: the path $t \mapsto x_t$ and the breakpoint $t_0$.]

Page 28: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Page 29: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Page 30: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Case 2: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \in \mathrm{Im}(\Phi_I)$: then $d_{s,j}(\bar y, \bar\lambda) = \bar\lambda$   -->  ok.

Page 31: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Case 2: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \in \mathrm{Im}(\Phi_I)$: then $d_{s,j}(\bar y, \bar\lambda) = \bar\lambda$   -->  ok.

Case 3: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \notin \mathrm{Im}(\Phi_I)$   -->  exclude this case.

Page 32: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Case 2: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \in \mathrm{Im}(\Phi_I)$: then $d_{s,j}(\bar y, \bar\lambda) = \bar\lambda$   -->  ok.

Case 3: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \notin \mathrm{Im}(\Phi_I)$   -->  exclude this case.

Exclude hyperplanes:   $H_{s,j} = \{ (y, \lambda) \;\backslash\; d_{s,j}(y, \lambda) = \lambda \}$,    $\mathcal{H} = \bigcup \{ H_{s,j} \;\backslash\; \varphi_j \notin \mathrm{Im}(\Phi_I) \}$.

Page 33: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Case 2: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \in \mathrm{Im}(\Phi_I)$: then $d_{s,j}(\bar y, \bar\lambda) = \bar\lambda$   -->  ok.

Case 3: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \notin \mathrm{Im}(\Phi_I)$   -->  exclude this case.

Exclude hyperplanes:   $H_{s,j} = \{ (y, \lambda) \;\backslash\; d_{s,j}(y, \lambda) = \lambda \}$,    $\mathcal{H} = \bigcup \{ H_{s,j} \;\backslash\; \varphi_j \notin \mathrm{Im}(\Phi_I) \}$.

[Figure: the hyperplanes $H_{\emptyset,j}$ (where $x^\star = 0$) in the $(y, \lambda)$ domain.]

Page 34: Signal Processing Course : Theory for Sparse Recovery

Proof

$\hat x_{\bar\lambda}(\bar y)_I = \Phi_I^+ \bar y - \bar\lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

To show: for all $j \notin I$,   $d_{s,j}(\bar y, \bar\lambda) = |\langle \varphi_j,\, \bar y - \Phi_I \hat x_{\bar\lambda}(\bar y)_I \rangle| \leq \bar\lambda$.

Case 1: $d_{s,j}(y, \lambda) < \lambda$   -->  ok, by continuity.

Case 2: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \in \mathrm{Im}(\Phi_I)$: then $d_{s,j}(\bar y, \bar\lambda) = \bar\lambda$   -->  ok.

Case 3: $d_{s,j}(y, \lambda) = \lambda$ and $\varphi_j \notin \mathrm{Im}(\Phi_I)$   -->  exclude this case.

Exclude hyperplanes:   $H_{s,j} = \{ (y, \lambda) \;\backslash\; d_{s,j}(y, \lambda) = \lambda \}$,    $\mathcal{H} = \bigcup \{ H_{s,j} \;\backslash\; \varphi_j \notin \mathrm{Im}(\Phi_I) \}$.

[Figure: the hyperplanes $H_{\emptyset,j}$ (where $x^\star = 0$) and $H_{I,j}$ in the $(y, \lambda)$ domain.]

Page 35: Signal Processing Course : Theory for Sparse Recovery

Local Affine Maps

Local parameterization:   $\hat x_\lambda(y)_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$.

Under a uniqueness assumption:   $y \mapsto x^\star$   and   $\lambda \mapsto x^\star$   are piecewise affine functions.

[Figure: the path $\lambda \mapsto x^\star_\lambda$ in the $(x_1, x_2)$ plane, from the BP solution $x_{\lambda_0}$ at $\lambda_0 = 0$ to $x_{\lambda_k} = 0$; breaking points = change of support of $x^\star_\lambda$.]

Page 36: Signal Processing Course : Theory for Sparse Recovery

Projector

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Proposition: If $x_1$ and $x_2$ minimize $E_\lambda$, then $\Phi x_1 = \Phi x_2$.

Corollary: $\mu(y) = \Phi x_1 = \Phi x_2$ is uniquely defined.

Page 37: Signal Processing Course : Theory for Sparse Recovery

Projector

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Proposition: If $x_1$ and $x_2$ minimize $E_\lambda$, then $\Phi x_1 = \Phi x_2$.

Corollary: $\mu(y) = \Phi x_1 = \Phi x_2$ is uniquely defined.

Proof: $x_3 = (x_1 + x_2)/2$ is a solution, and if $\Phi x_1 \neq \Phi x_2$:

    $2\|x_3\|_1 \leq \|x_1\|_1 + \|x_2\|_1$   and   $2\|\Phi x_3 - y\|^2 < \|\Phi x_1 - y\|^2 + \|\Phi x_2 - y\|^2$,

so $E_\lambda(x_3) < E_\lambda(x_1) = E_\lambda(x_2)$   $\Longrightarrow$   contradiction.

Page 38: Signal Processing Course : Theory for Sparse Recovery

Projector

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Proposition: If $x_1$ and $x_2$ minimize $E_\lambda$, then $\Phi x_1 = \Phi x_2$.

Corollary: $\mu(y) = \Phi x_1 = \Phi x_2$ is uniquely defined.

Proof: $x_3 = (x_1 + x_2)/2$ is a solution, and if $\Phi x_1 \neq \Phi x_2$:

    $2\|x_3\|_1 \leq \|x_1\|_1 + \|x_2\|_1$   and   $2\|\Phi x_3 - y\|^2 < \|\Phi x_1 - y\|^2 + \|\Phi x_2 - y\|^2$,

so $E_\lambda(x_3) < E_\lambda(x_1) = E_\lambda(x_2)$   $\Longrightarrow$   contradiction.

For $(\bar y, \bar\lambda)$ close to $(y, \lambda) \notin \mathcal{H}$:

    $\mu(\bar y) = P_I(\bar y) - \bar\lambda\, d_I$,   with   $d_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I = \Phi_I^{+,*} s_I$,

where $P_I$ is the orthogonal projector on $\{\Phi x \;\backslash\; \mathrm{supp}(x) = I\}$.

Page 39: Signal Processing Course : Theory for Sparse Recovery

Overview

• Polytope Noiseless Recovery

• Local Behavior of Sparse Regularization

• Robustness to Small Noise

• Robustness to Bounded Noise

• Compressed Sensing RIP Theory

Page 40: Signal Processing Course : Theory for Sparse Recovery

Uniqueness Sufficient Condition

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Page 41: Signal Processing Course : Theory for Sparse Recovery

Uniqueness Sufficient Condition

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Theorem: If $\Phi_I$ has full rank and $\|\Phi_{I^c}^*(\Phi x^\star - y)\|_\infty < \lambda$, then $x^\star$ is the unique minimizer of $E_\lambda$.

Page 42: Signal Processing Course : Theory for Sparse Recovery

Uniqueness Sufficient Condition

$E_\lambda(x) = \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

Theorem: If $\Phi_I$ has full rank and $\|\Phi_{I^c}^*(\Phi x^\star - y)\|_\infty < \lambda$, then $x^\star$ is the unique minimizer of $E_\lambda$.

Proof: Let $\tilde x$ be a minimizer. Then $\Phi \tilde x = \Phi x^\star$

$\Longrightarrow$   $\|\Phi_{I^c}^*(\Phi \tilde x - y)\|_\infty = \|\Phi_{I^c}^*(\Phi x^\star - y)\|_\infty < \lambda$

$\Longrightarrow$   $\mathrm{supp}(\tilde x) \subset I$

$\Longrightarrow$   $\tilde x_I - x^\star_I \in \ker(\Phi_I) = \{0\}$

$\Longrightarrow$   $\tilde x = x^\star$.

Page 43: Signal Processing Course : Theory for Sparse Recovery

Robustness to Small Noise

Identifiability criterion: [Fuchs]

For $s \in \{-1, 0, +1\}^N$, let $I = \mathrm{supp}(s)$ and

    $F(s) = \| \Phi_{I^c}^*\, \Phi_I^{+,*} s_I \|_\infty$

($\Phi_I$ is assumed to have full rank).

$\Phi_I^+ = (\Phi_I^* \Phi_I)^{-1}\Phi_I^*$ satisfies $\Phi_I^+ \Phi_I = \mathrm{Id}_I$.

Page 44: Signal Processing Course : Theory for Sparse Recovery

Robustness to Small Noise

Identifiability criterion: [Fuchs]

For $s \in \{-1, 0, +1\}^N$, let $I = \mathrm{supp}(s)$ and

    $F(s) = \| \Phi_{I^c}^*\, \Phi_I^{+,*} s_I \|_\infty$

($\Phi_I$ is assumed to have full rank).

$\Phi_I^+ = (\Phi_I^* \Phi_I)^{-1}\Phi_I^*$ satisfies $\Phi_I^+ \Phi_I = \mathrm{Id}_I$.

Theorem: If $F(\mathrm{sign}(x_0)) < 1$, $T = \min_{i \in I} |x_{0,i}|$, $\|w\|/T$ is small enough and $\lambda \sim \|w\|$, then

    $x_0 + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} \mathrm{sign}(x_{0,I})$

is the unique solution of $P_\lambda(y)$.

--> If $\|w\|$ is small enough, $\|x^\star - x_0\| = O(\|w\|)$.
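An illustrative NumPy routine (ours) that evaluates the Fuchs criterion $F(\mathrm{sign}(x_0))$ for a given $\Phi$ and sparse $x_0$, assuming $\Phi_I$ has full rank.

```python
import numpy as np

def fuchs_criterion(Phi, x0):
    """F(sign(x0)) = || Phi_{I^c}^* Phi_I^{+,*} sign(x0)_I ||_inf."""
    I = np.flatnonzero(x0)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    Phi_I = Phi[:, I]
    # d_I = Phi_I^{+,*} sign(x0)_I = Phi_I (Phi_I^* Phi_I)^{-1} sign(x0)_I
    d = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, np.sign(x0[I]))
    return np.max(np.abs(Phi[:, Ic].T @ d))
```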

Page 45: Signal Processing Course : Theory for Sparse Recovery

Geometric Interpretation

$F(s) = \|\Phi_{I^c}^* d_I\|_\infty = \max_{j \notin I} |\langle d_I, \varphi_j \rangle|$

where $d_I$ is defined by:   $\forall\, i \in I$, $\langle d_I, \varphi_i \rangle = s_i$;    $d_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I = \Phi_I^{+,*} s_I$.

[Figure: the dual vector $d_I$ and the atoms $\varphi_i$, $\varphi_j$.]

Page 46: Signal Processing Course : Theory for Sparse Recovery

Geometric Interpretation

$F(s) = \|\Phi_{I^c}^* d_I\|_\infty = \max_{j \notin I} |\langle d_I, \varphi_j \rangle|$

where $d_I$ is defined by:   $\forall\, i \in I$, $\langle d_I, \varphi_i \rangle = s_i$;    $d_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I = \Phi_I^{+,*} s_I$.

Condition $F(s) < 1$: no vector $\varphi_j$ inside the cap $C_s$ defined by $|\langle d_I, \varphi \rangle| < 1$.

[Figure: the dual vector $d_I$, the cap $C_s$, and the atoms $\varphi_i$, $\varphi_j$.]

Page 47: Signal Processing Course : Theory for Sparse Recovery

Geometric Interpretation

$F(s) = \|\Phi_{I^c}^* d_I\|_\infty = \max_{j \notin I} |\langle d_I, \varphi_j \rangle|$

where $d_I$ is defined by:   $\forall\, i \in I$, $\langle d_I, \varphi_i \rangle = s_i$;    $d_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I = \Phi_I^{+,*} s_I$.

Condition $F(s) < 1$: no vector $\varphi_j$ inside the cap $C_s$ defined by $|\langle d_I, \varphi \rangle| < 1$.

[Figure: two configurations of the dual vector $d_I$ and cap $C_s$ with atoms $\varphi_i$, $\varphi_j$, $\varphi_k$, one satisfying and one violating the condition.]

Page 48: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Local candidate: $\hat x = \hat x(\mathrm{sign}(x_0))$,   where (implicit equation)

    $\hat x(s)_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

--> To prove: $\hat x = \hat x(\mathrm{sign}(x_0))$ is the unique solution of $P_\lambda(y)$.

Page 49: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Local candidate: $\hat x = \hat x(\mathrm{sign}(x_0))$,   where (implicit equation)

    $\hat x(s)_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

--> To prove: $\hat x = \hat x(\mathrm{sign}(x_0))$ is the unique solution of $P_\lambda(y)$.

Sign consistency:   $\mathrm{sign}(\hat x) = \mathrm{sign}(x_0)$   (C1)

$y = \Phi x_0 + w$   $\Longrightarrow$   $\hat x = x_0 + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$

$\|\Phi_I^+\|_{\infty,2}\|w\| + \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,\infty}\,\lambda < T$   $\Longrightarrow$   (C1)

Page 50: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Local candidate: $\hat x = \hat x(\mathrm{sign}(x_0))$,   where (implicit equation)

    $\hat x(s)_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$,   $I = \mathrm{supp}(s)$.

--> To prove: $\hat x = \hat x(\mathrm{sign}(x_0))$ is the unique solution of $P_\lambda(y)$.

Sign consistency:   $\mathrm{sign}(\hat x) = \mathrm{sign}(x_0)$   (C1)

$y = \Phi x_0 + w$   $\Longrightarrow$   $\hat x = x_0 + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$

$\|\Phi_I^+\|_{\infty,2}\|w\| + \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,\infty}\,\lambda < T$   $\Longrightarrow$   (C1)

First order conditions:   $\|\Phi_{I^c}^*(\Phi \hat x - y)\|_\infty < \lambda$   (C2)

$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(s))\,\lambda < 0$   $\Longrightarrow$   (C2)

Page 51: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof (continued)

$\|\Phi_I^+\|_{\infty,2}\|w\| + \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,\infty}\,\lambda < T$
and
$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(s))\,\lambda < 0$
$\Longrightarrow$   $\hat x$ is the solution.

Page 52: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof (continued)

$\|\Phi_I^+\|_{\infty,2}\|w\| + \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,\infty}\,\lambda < T$
and
$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(s))\,\lambda < 0$
$\Longrightarrow$   $\hat x$ is the solution.

For $\|w\|/T < \lambda_{\max}$, one can choose $\lambda \sim \|w\|/T$ such that $\hat x$ is the solution of $P_\lambda(y)$.

[Figure: admissible region for $(\|w\|, \lambda)$ delimited by the two linear constraints above.]

Page 53: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof (continued)

$\|\Phi_I^+\|_{\infty,2}\|w\| + \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,\infty}\,\lambda < T$
and
$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(s))\,\lambda < 0$
$\Longrightarrow$   $\hat x$ is the solution.

For $\|w\|/T < \lambda_{\max}$, one can choose $\lambda \sim \|w\|/T$ such that $\hat x$ is the solution of $P_\lambda(y)$.

[Figure: admissible region for $(\|w\|, \lambda)$ delimited by the two linear constraints above.]

$\|\hat x - x_0\| \leq \|\Phi_I^+ w\| + \lambda\, \|(\Phi_I^* \Phi_I)^{-1}\|_{\infty,2} = O(\|w\|)$

$\Longrightarrow$   $\|\hat x - x_0\| = O(\|w\|)$.

Page 54: Signal Processing Course : Theory for Sparse Recovery

Overview

• Polytope Noiseless Recovery

• Local Behavior of Sparse Regularization

• Robustness to Small Noise

• Robustness to Bounded Noise

• Compressed Sensing RIP Theory

Page 55: Signal Processing Course : Theory for Sparse Recovery

Robustness to Bounded Noise

Exact Recovery Criterion (ERC): [Tropp]

For a support $I \subset \{0, \ldots, N-1\}$ with $\Phi_I$ full rank,

    $\mathrm{ERC}(I) = \|\Phi_{I^c}^*\, \Phi_I^{+,*}\|_{\infty,\infty} = \|\Phi_I^+ \Phi_{I^c}\|_{1,1} = \max_{j \in I^c} \|\Phi_I^+ \varphi_j\|_1$

(use $\|(a_j)_j\|_{1,1} = \max_j \|a_j\|_1$).

Relation with the $F$ criterion:   $\mathrm{ERC}(I) = \max_{s,\, \mathrm{supp}(s) \subset I} F(s)$.

Page 56: Signal Processing Course : Theory for Sparse Recovery

Robustness to Bounded Noise

Exact Recovery Criterion (ERC): [Tropp]

For a support $I \subset \{0, \ldots, N-1\}$ with $\Phi_I$ full rank,

    $\mathrm{ERC}(I) = \|\Phi_{I^c}^*\, \Phi_I^{+,*}\|_{\infty,\infty} = \|\Phi_I^+ \Phi_{I^c}\|_{1,1} = \max_{j \in I^c} \|\Phi_I^+ \varphi_j\|_1$

(use $\|(a_j)_j\|_{1,1} = \max_j \|a_j\|_1$).

Relation with the $F$ criterion:   $\mathrm{ERC}(I) = \max_{s,\, \mathrm{supp}(s) \subset I} F(s)$.

Theorem: If $\mathrm{ERC}(\mathrm{supp}(x_0)) < 1$ and $\lambda \sim \|w\|$, then $x^\star$ is unique, satisfies $\mathrm{supp}(x^\star) \subset \mathrm{supp}(x_0)$, and $\|x_0 - x^\star\| = O(\|w\|)$.
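A short NumPy sketch (ours) computing $\mathrm{ERC}(I)$ from its column-wise characterization above; it assumes $\Phi_I$ has full rank.

```python
import numpy as np

def erc(Phi, I):
    """Exact Recovery Criterion: ERC(I) = max_{j not in I} || Phi_I^+ phi_j ||_1."""
    I = np.asarray(I)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    pinv_Phi_I = np.linalg.pinv(Phi[:, I])          # Phi_I^+ (full-rank assumption)
    return np.max(np.sum(np.abs(pinv_Phi_I @ Phi[:, Ic]), axis=0))
```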

Page 57: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Restricted recovery:   $\tilde x \in \underset{\mathrm{supp}(x) \subset I}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

--> To prove: $\tilde x$ is the unique solution of $P_\lambda(y)$.

Page 58: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Restricted recovery:   $\tilde x \in \underset{\mathrm{supp}(x) \subset I}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

--> To prove: $\tilde x$ is the unique solution of $P_\lambda(y)$.

Implicit equation:   $\tilde x_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} \tilde s_I$.

Important: $\tilde s = \mathrm{sign}(\tilde x)$ is not necessarily equal to $\mathrm{sign}(x_0)$.

Page 59: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Restricted recovery:   $\tilde x \in \underset{\mathrm{supp}(x) \subset I}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

--> To prove: $\tilde x$ is the unique solution of $P_\lambda(y)$.

Implicit equation:   $\tilde x_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} \tilde s_I$.

Important: $\tilde s = \mathrm{sign}(\tilde x)$ is not necessarily equal to $\mathrm{sign}(x_0)$.

First order conditions:   $\|\Phi_{I^c}^*(\Phi \tilde x - y)\|_\infty < \lambda$   (C2)

$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(\tilde s))\,\lambda < 0$   $\Longrightarrow$   (C2)

Page 60: Signal Processing Course : Theory for Sparse Recovery

Sketch of Proof

Restricted recovery:   $\tilde x \in \underset{\mathrm{supp}(x) \subset I}{\mathrm{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$

--> To prove: $\tilde x$ is the unique solution of $P_\lambda(y)$.

Implicit equation:   $\tilde x_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} \tilde s_I$.

Important: $\tilde s = \mathrm{sign}(\tilde x)$ is not necessarily equal to $\mathrm{sign}(x_0)$.

First order conditions:   $\|\Phi_{I^c}^*(\Phi \tilde x - y)\|_\infty < \lambda$   (C2)

$\|\Phi_{I^c}^*(\Phi_I \Phi_I^+ - \mathrm{Id})\|_{2,\infty}\|w\| - (1 - F(\tilde s))\,\lambda < 0$   $\Longrightarrow$   (C2)

Since $\tilde s$ is arbitrary:   $\mathrm{ERC}(I) < 1$   $\Longrightarrow$   $F(\tilde s) < 1$.

Hence, choosing $\lambda \sim \|w\|$ implies (C2).

Page 61: Signal Processing Course : Theory for Sparse Recovery

Weak ERC

Weak Exact Recovery Criterion: [Gribonval, Dossal]

For $A = (a_i)_i$, $B = (b_j)_j$, where $a_i, b_j \in \mathbb{R}^P$:

    $\mu(A, B) = \max_j \sum_i |\langle a_i, b_j \rangle|$,    $\mu(A) = \max_j \sum_{i \neq j} |\langle a_i, a_j \rangle|$.

Denoting $\Phi = (\varphi_i)_{i=0}^{N-1}$ with $\varphi_i \in \mathbb{R}^P$:

    $\mathrm{w\text{-}ERC}(I) = \dfrac{\mu(\Phi_I, \Phi_{I^c})}{1 - \mu(\Phi_I)}$  if $\mu(\Phi_I) < 1$,   $+\infty$ otherwise.

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I)$   (for $I = \mathrm{supp}(s)$).
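An illustrative NumPy sketch (ours) of the weak ERC; as in the definition above, the coherence quantities implicitly assume normalized atoms.

```python
import numpy as np

def werc(Phi, I):
    """Weak ERC: mu(Phi_I, Phi_Ic) / (1 - mu(Phi_I)) when mu(Phi_I) < 1, else +inf."""
    I = np.asarray(I)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    C_in = np.abs(Phi[:, I].T @ Phi[:, I])
    mu_I = np.max(np.sum(C_in - np.diag(np.diag(C_in)), axis=0))     # max_j sum_{i != j} |<phi_i, phi_j>|
    mu_I_Ic = np.max(np.sum(np.abs(Phi[:, I].T @ Phi[:, Ic]), axis=0))
    return mu_I_Ic / (1.0 - mu_I) if mu_I < 1 else np.inf
```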

Page 62: Signal Processing Course : Theory for Sparse Recovery

Proof

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I)$   (for $I = \mathrm{supp}(s)$).

$\mathrm{ERC}(I) = \max_{j \notin I} \|\Phi_I^+ \varphi_j\|_1 \leq \|(\Phi_I^* \Phi_I)^{-1}\|_{1,1} \, \max_{j \notin I} \|\Phi_I^* \varphi_j\|_1$

$\max_{j \notin I} \|\Phi_I^* \varphi_j\|_1 = \max_{j \notin I} \sum_{i \in I} |\langle \varphi_i, \varphi_j \rangle| = \mu(\Phi_I, \Phi_{I^c})$

Page 63: Signal Processing Course : Theory for Sparse Recovery

Proof

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I)$   (for $I = \mathrm{supp}(s)$).

$\mathrm{ERC}(I) = \max_{j \notin I} \|\Phi_I^+ \varphi_j\|_1 \leq \|(\Phi_I^* \Phi_I)^{-1}\|_{1,1} \, \max_{j \notin I} \|\Phi_I^* \varphi_j\|_1$

$\max_{j \notin I} \|\Phi_I^* \varphi_j\|_1 = \max_{j \notin I} \sum_{i \in I} |\langle \varphi_i, \varphi_j \rangle| = \mu(\Phi_I, \Phi_{I^c})$

One has $\Phi_I^* \Phi_I = \mathrm{Id} - H$. If $\|H\|_{1,1} < 1$:

    $(\Phi_I^* \Phi_I)^{-1} = (\mathrm{Id} - H)^{-1} = \sum_{k \geq 0} H^k$

    $\|(\Phi_I^* \Phi_I)^{-1}\|_{1,1} \leq \sum_{k \geq 0} \|H\|_{1,1}^k = \dfrac{1}{1 - \|H\|_{1,1}}$

    $\|H\|_{1,1} = \max_{i \in I} \sum_{j \neq i} |\langle \varphi_i, \varphi_j \rangle| = \mu(\Phi_I)$

Page 64: Signal Processing Course : Theory for Sparse Recovery

Example: Random Matrix

$P = 200$, $N = 1000$.

[Figure: as a function of the sparsity $\|x_0\|_0$, empirical curves for $\mathrm{w\text{-}ERC} < 1$, $\mathrm{ERC} < 1$, $F < 1$ and exact recovery $x^\star = x_0$.]

Page 65: Signal Processing Course : Theory for Sparse Recovery

Example: Deconvolution

$\Phi x = \sum_i x_i\, \varphi(\cdot - \Delta i)$

Increasing $\Delta$:
--> reduces correlation.
--> reduces resolution.

[Figure: a sparse spike train $x_0$, its observation $\Phi x_0$, and the criteria $F(s)$, $\mathrm{ERC}(I)$, $\mathrm{w\text{-}ERC}(I)$ as $\Delta$ varies.]

Page 66: Signal Processing Course : Theory for Sparse Recovery

Coherence Bounds

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I) \leq \dfrac{|I|\,\mu(\Phi)}{1 - (|I|-1)\,\mu(\Phi)}$
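A small NumPy helper (ours) computing the mutual coherence and the resulting bound above for a support $I$; atoms are assumed normalized.

```python
import numpy as np

def coherence_bound(Phi, I):
    """Mutual coherence mu(Phi) and the bound |I| mu / (1 - (|I|-1) mu) on w-ERC(I)."""
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    mu = G.max()                                    # mutual coherence
    k = len(I)
    bound = k * mu / (1 - (k - 1) * mu) if (k - 1) * mu < 1 else np.inf
    return mu, bound
```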

Page 67: Signal Processing Course : Theory for Sparse Recovery

Coherence Bounds

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I) \leq \dfrac{|I|\,\mu(\Phi)}{1 - (|I|-1)\,\mu(\Phi)}$

Theorem: If   $\|x_0\|_0 < \tfrac{1}{2}\left(1 + \tfrac{1}{\mu(\Phi)}\right)$   and $\lambda \sim \|w\|$,
one has $\mathrm{supp}(x^\star) \subset I$ and $\|x_0 - x^\star\| = O(\|w\|)$.

Page 68: Signal Processing Course : Theory for Sparse Recovery

Coherence Bounds

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Theorem:   $F(s) \leq \mathrm{ERC}(I) \leq \mathrm{w\text{-}ERC}(I) \leq \dfrac{|I|\,\mu(\Phi)}{1 - (|I|-1)\,\mu(\Phi)}$

Theorem: If   $\|x_0\|_0 < \tfrac{1}{2}\left(1 + \tfrac{1}{\mu(\Phi)}\right)$   and $\lambda \sim \|w\|$,
one has $\mathrm{supp}(x^\star) \subset I$ and $\|x_0 - x^\star\| = O(\|w\|)$.

One has   $\mu(\Phi) \geq \sqrt{\dfrac{N - P}{P(N-1)}}$   -->  optimistic setting: $\|x_0\|_0 \leq O(\sqrt{P})$.

For Gaussian matrices:   $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

For convolution matrices: useless criterion.

Page 69: Signal Processing Course : Theory for Sparse Recovery

Coherence - Examples

Incoherent pair of orthobases:   $\Phi = [\Phi_1, \Phi_2] \in \mathbb{R}^{N \times 2N}$

Diracs/Fourier:   $\Phi_1 = \{ k \mapsto \delta[k - m] \}_m$,    $\Phi_2 = \{ k \mapsto N^{-1/2} e^{\frac{2i\pi}{N} mk} \}_m$

Page 70: Signal Processing Course : Theory for Sparse Recovery

Coherence - Examples

Incoherent pair of orthobases:   $\Phi = [\Phi_1, \Phi_2] \in \mathbb{R}^{N \times 2N}$

Diracs/Fourier:   $\Phi_1 = \{ k \mapsto \delta[k - m] \}_m$,    $\Phi_2 = \{ k \mapsto N^{-1/2} e^{\frac{2i\pi}{N} mk} \}_m$

$\underset{x \in \mathbb{R}^{2N}}{\min} \; \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda\|x\|_1$
$\Longleftrightarrow$
$\underset{x_1, x_2 \in \mathbb{R}^N}{\min} \; \tfrac{1}{2}\|y - \Phi_1 x_1 - \Phi_2 x_2\|^2 + \lambda\|x_1\|_1 + \lambda\|x_2\|_1$

[Figure: a signal decomposed as a sum of spikes and sinusoids.]

Page 71: Signal Processing Course : Theory for Sparse Recovery

Coherence - Examples

Incoherent pair of orthobases:   $\Phi = [\Phi_1, \Phi_2] \in \mathbb{R}^{N \times 2N}$

Diracs/Fourier:   $\Phi_1 = \{ k \mapsto \delta[k - m] \}_m$,    $\Phi_2 = \{ k \mapsto N^{-1/2} e^{\frac{2i\pi}{N} mk} \}_m$

$\underset{x \in \mathbb{R}^{2N}}{\min} \; \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda\|x\|_1$
$\Longleftrightarrow$
$\underset{x_1, x_2 \in \mathbb{R}^N}{\min} \; \tfrac{1}{2}\|y - \Phi_1 x_1 - \Phi_2 x_2\|^2 + \lambda\|x_1\|_1 + \lambda\|x_2\|_1$

$\mu(\Phi) = \dfrac{1}{\sqrt{N}}$   $\Longrightarrow$   separates up to $\approx \sqrt{N}/2$ Diracs + sines.

[Figure: a signal decomposed as a sum of spikes and sinusoids.]
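An illustrative check (ours) that the Dirac/Fourier pair has cross-coherence $1/\sqrt{N}$; for simplicity the Fourier atoms are kept complex here.

```python
import numpy as np

N = 64
Phi1 = np.eye(N)                                                   # Dirac basis
F = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)   # Fourier basis
mu = np.max(np.abs(Phi1.T @ F))                                    # cross-coherence of the two bases
print(mu, 1 / np.sqrt(N))                                          # both equal 0.125 for N = 64
```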

Page 72: Signal Processing Course : Theory for Sparse Recovery

Overview

• Polytope Noiseless Recovery

• Local Behavior of Sparse Regularization

• Robustness to Small Noise

• Robustness to Bounded Noise

• Compressed Sensing RIP Theory

Page 73: Signal Processing Course : Theory for Sparse Recovery

CS with RIP

Restricted Isometry Constants:

    $\forall\, \|x\|_0 \leq k$,   $(1 - \delta_k)\|x\|^2 \leq \|\Phi x\|^2 \leq (1 + \delta_k)\|x\|^2$

$\ell^1$ recovery:

    $x^\star \in \underset{\|\Phi x - y\| \leq \varepsilon}{\mathrm{argmin}} \; \|x\|_1$   where   $y = \Phi x_0 + w$,  $\|w\| \leq \varepsilon$

(constrained counterpart of $\mathrm{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$, with $\lambda \leftrightarrow \varepsilon$).

Page 74: Signal Processing Course : Theory for Sparse Recovery

CS with RIP

Restricted Isometry Constants:

    $\forall\, \|x\|_0 \leq k$,   $(1 - \delta_k)\|x\|^2 \leq \|\Phi x\|^2 \leq (1 + \delta_k)\|x\|^2$

$\ell^1$ recovery:

    $x^\star \in \underset{\|\Phi x - y\| \leq \varepsilon}{\mathrm{argmin}} \; \|x\|_1$   where   $y = \Phi x_0 + w$,  $\|w\| \leq \varepsilon$

(constrained counterpart of $\mathrm{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1$, with $\lambda \leftrightarrow \varepsilon$).

Theorem: [Candes 2009]   If $\delta_{2k} \leq \sqrt{2} - 1$, then

    $\|x_0 - x^\star\| \leq \dfrac{C_0}{\sqrt{k}}\,\|x_0 - x_k\|_1 + C_1\,\varepsilon$

where $x_k$ is the best $k$-term approximation of $x_0$.

Page 75: Signal Processing Course : Theory for Sparse Recovery

Elements of Proof

$\|x_0 - x^\star\| \leq \dfrac{C_0}{\sqrt{k}}\,\|x_0 - x_k\|_1 + C_1\,\varepsilon$

Explicit constants:

    $C_0 = \dfrac{2(1+\rho)}{1-\rho}$,   $C_1 = \dfrac{2\alpha}{1-\rho}$,   with   $\alpha = \dfrac{2\sqrt{1+\delta_{2k}}}{1-\delta_{2k}}$,   $\rho = \dfrac{\sqrt{2}\,\delta_{2k}}{1-\delta_{2k}}$.

Partition $\{0, \ldots, N-1\} = T_0 \cup T_1 \cup \ldots \cup T_m$ into blocks of $k$ elements:
$T_0$ indexes the $k$ largest entries of $x_0$ (so $x_k = x_{T_0}$), $T_1$ the $k$ largest entries of $h_{T_0^c}$, etc., with $h = x^\star - x_0$.

Optimality conditions:   $\|h_{T_0^c}\|_1 \leq \|h_{T_0}\|_1 + 2\|x_{T_0^c}\|_1$

Reference: E. J. Candès, CRAS, 2008.

Page 76: Signal Processing Course : Theory for Sparse Recovery

Singular Values Distributions

Eigenvalues of $\Phi_I^* \Phi_I$ with $|I| = k$ are essentially in $[a, b]$,

    $a = (1 - \sqrt{\beta})^2$   and   $b = (1 + \sqrt{\beta})^2$,   where   $\beta = k/P$.

When $k = \beta P \to +\infty$, the eigenvalue distribution tends to   [Marcenko-Pastur]

    $f_\beta(\lambda) = \dfrac{1}{2\pi\beta\lambda}\,\sqrt{(\lambda - a)_+ (b - \lambda)_+}$

Large deviation inequality [Ledoux].

[Figure: empirical eigenvalue histograms of $\Phi_I^*\Phi_I$ for $P = 200$ and $k = 10, 30, 50$, with the limit density $f_\beta(\lambda)$.]
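A hypothetical Monte-Carlo check (ours) of the Marcenko-Pastur prediction for the eigenvalues of $\Phi_I^*\Phi_I$ with a Gaussian matrix; the seed and number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
P, k, trials = 200, 30, 200
eigs = []
for _ in range(trials):
    Phi_I = rng.standard_normal((P, k)) / np.sqrt(P)     # k columns of a normalized Gaussian matrix
    eigs.extend(np.linalg.eigvalsh(Phi_I.T @ Phi_I))
beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(min(eigs), a)        # empirical extremes cluster near the predicted edges
print(max(eigs), b)
```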

Page 77: Signal Processing Course : Theory for Sparse Recovery

RIP for Gaussian Matrices

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Link with coherence:   $\delta_k \leq (k-1)\,\mu(\Phi)$,   $\delta_2 = \mu(\Phi)$.

Page 78: Signal Processing Course : Theory for Sparse Recovery

RIP for Gaussian Matrices

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Link with coherence:   $\delta_k \leq (k-1)\,\mu(\Phi)$,   $\delta_2 = \mu(\Phi)$.

For Gaussian matrices:   $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

Page 79: Signal Processing Course : Theory for Sparse Recovery

RIP for Gaussian Matrices

Mutual coherence:   $\mu(\Phi) = \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|$

Link with coherence:   $\delta_k \leq (k-1)\,\mu(\Phi)$,   $\delta_2 = \mu(\Phi)$.

For Gaussian matrices:   $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

Stronger result:

Theorem: If   $k \leq C\, P / \log(N/P)$,   then $\delta_{2k} \leq \sqrt{2} - 1$ with high probability.

Page 80: Signal Processing Course : Theory for Sparse Recovery

Numerics with RIP

Stability constants of $A$:   $(1 - \delta_1(A))\|x\|^2 \leq \|A x\|^2 \leq (1 + \delta_2(A))\|x\|^2$
(given by the smallest / largest eigenvalues of $A^* A$).

Page 81: Signal Processing Course : Theory for Sparse Recovery

Numerics with RIP

Stability constants of $A$:   $(1 - \delta_1(A))\|x\|^2 \leq \|A x\|^2 \leq (1 + \delta_2(A))\|x\|^2$
(given by the smallest / largest eigenvalues of $A^* A$).

Upper / lower RIC:   $\delta_k^i = \max_{|I| = k} \delta_i(\Phi_I)$,    $\delta_k = \max(\delta_k^1, \delta_k^2)$.

Monte-Carlo estimation:   $\hat\delta_k \leq \delta_k$.

[Figure: estimated $\delta_{2k}^1$ and $\delta_{2k}^2$ as functions of $k$, compared to the level $\sqrt{2} - 1$.]
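A minimal sketch (ours) of such a Monte-Carlo estimation: sampling random supports of size $k$ gives a lower bound $\hat\delta_k \leq \delta_k$, since the true constant maximizes over all supports.

```python
import numpy as np

def monte_carlo_ric(Phi, k, trials=500, rng=None):
    """Lower bound on the RIP constant delta_k by sampling random supports of size k."""
    if rng is None:
        rng = np.random.default_rng(0)
    N = Phi.shape[1]
    d1, d2 = 0.0, 0.0
    for _ in range(trials):
        I = rng.choice(N, size=k, replace=False)
        ev = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        d1 = max(d1, 1 - ev[0])          # lower restricted isometry constant
        d2 = max(d2, ev[-1] - 1)         # upper restricted isometry constant
    return max(d1, d2)                   # underestimates the true delta_k
```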

Page 82: Signal Processing Course : Theory for Sparse Recovery

Conclusion

Local behavior:   $\lambda \mapsto x^\star$ polygonal;   $y \mapsto x^\star$ piecewise affine.

[Figure: recovered coefficients for $s = 3, 6, 13, 25$.]

Page 83: Signal Processing Course : Theory for Sparse Recovery

Conclusion

Local behavior:   $\lambda \mapsto x^\star$ polygonal;   $y \mapsto x^\star$ piecewise affine.

Noiseless recovery:   $\Longleftrightarrow$   geometry of polytopes.

[Figure: recovered coefficients for $s = 3, 6, 13, 25$; the polytope $\Phi(B_\lambda)$ and $x_0$.]

Page 84: Signal Processing Course : Theory for Sparse Recovery

Conclusion

Local behavior:   $\lambda \mapsto x^\star$ polygonal;   $y \mapsto x^\star$ piecewise affine.

Noiseless recovery:   $\Longleftrightarrow$   geometry of polytopes.

Small noise:   -->  sign stability.

Bounded noise:   -->  support inclusion.

RIP-based:   -->  no support stability, $\ell^1$ bounds.

[Figure: recovered coefficients for $s = 3, 6, 13, 25$; the polytope $\Phi(B_\lambda)$ and $x_0$.]