Sparsity and Compressed Sensing
Gabriel Peyré
www.numerical-tours.com

DESCRIPTION

Slides of the lectures given at the "Biomedical Image Analysis Summer School: Modalities, Methodologies & Clinical Research", Centrale Paris, July 9-13, 2012.

TRANSCRIPT

Page 1: Sparsity and Compressed Sensing

Sparsity and Compressed Sensing

Gabriel Peyré

www.numerical-tours.com

Page 2: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 3: Sparsity and Compressed Sensing

Inverse Problems

Forward model:  y = K f0 + w ∈ R^P

(unknown) input f0, operator K : R^Q → R^P, observations y, noise w.

Page 4: Sparsity and Compressed Sensing

Inverse Problems

Forward model:  y = K f0 + w ∈ R^P
(unknown) input f0, operator K : R^Q → R^P, observations y, noise w.

Denoising: K = Id_Q, P = Q.

Page 5: Sparsity and Compressed Sensing

Inverse Problems

Forward model:  y = K f0 + w ∈ R^P
(unknown) input f0, operator K : R^Q → R^P, observations y, noise w.

Denoising: K = Id_Q, P = Q.

Inpainting: set Ω of missing pixels, (Kf)(x) = 0 if x ∈ Ω, f(x) if x ∉ Ω;  P = Q − |Ω|.

Page 6: Sparsity and Compressed Sensing

Inverse Problems

Forward model:  y = K f0 + w ∈ R^P
(unknown) input f0, operator K : R^Q → R^P, observations y, noise w.

Denoising: K = Id_Q, P = Q.

Inpainting: set Ω of missing pixels, (Kf)(x) = 0 if x ∈ Ω, f(x) if x ∉ Ω;  P = Q − |Ω|.

Super-resolution: Kf = (f ⋆ k) ↓_s (blurring then subsampling by a factor s),  P = Q/s.

Page 7: Sparsity and Compressed Sensing

Kf = (p�k)1�k�K

Inverse Problem in Medical Imaging

Page 8: Sparsity and Compressed Sensing

Magnetic resonance imaging (MRI):

Kf = (p�k)1�k�K

Kf = (f(�))���

Inverse Problem in Medical Imaging

f

Page 9: Sparsity and Compressed Sensing

Inverse Problem in Medical Imaging

Tomography:  Kf = (p_{θ_k})_{1≤k≤K}   (projections along angles θ_k).

Magnetic resonance imaging (MRI):  Kf = (f̂(ω))_{ω∈Ω}   (partial Fourier measurements).

Other examples: MEG, EEG, ...

Page 10: Sparsity and Compressed Sensing

Noisy measurements: y = Kf0 + w.

f� � argminf�RQ

12||y �Kf ||2 + � J(f)

Prior model: J : RQ � R assigns a score to images.

Inverse Problem Regularization

Page 11: Sparsity and Compressed Sensing

Noisy measurements: y = Kf0 + w.

f� � argminf�RQ

12||y �Kf ||2 + � J(f)

Prior model: J : RQ � R assigns a score to images.

Inverse Problem Regularization

Data fidelity Regularity

Page 12: Sparsity and Compressed Sensing

Noisy measurements: y = Kf0 + w.

Choice of �: tradeo�

||w||Regularity of f0

J(f0)Noise level

f� � argminf�RQ

12||y �Kf ||2 + � J(f)

Prior model: J : RQ � R assigns a score to images.

Inverse Problem Regularization

Data fidelity Regularity

Page 13: Sparsity and Compressed Sensing

Inverse Problem Regularization

Noisy measurements: y = K f0 + w.

Prior model: J : R^Q → R assigns a score to images.

f* ∈ argmin_{f ∈ R^Q}  (1/2)||y − K f||²  +  λ J(f)
                        (data fidelity)     (regularity)

Choice of λ: tradeoff between the noise level ||w|| and the regularity J(f0) of f0.

No noise: λ → 0⁺, minimize  f* ∈ argmin_{f ∈ R^Q, Kf = y} J(f).

Page 14: Sparsity and Compressed Sensing

J(f) =�

||�f(x)||2dx

Smooth and Cartoon Priors

�|�f |2

Page 15: Sparsity and Compressed Sensing

Smooth and Cartoon Priors

Sobolev prior:          J(f) = ∫ ||∇f(x)||² dx

Total variation prior:  J(f) = ∫ ||∇f(x)|| dx

Co-area formula:        J(f) = ∫_R length(C_t) dt    (C_t the level sets of f)

Page 16: Sparsity and Compressed Sensing

Inpainting Example

Input y = Kf0 + w Sobolev Total variation

Page 17: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 18: Sparsity and Compressed Sensing

Q

N

Dictionary � = (�m)m � RQ�N , N � Q.

Redundant Dictionaries

Page 19: Sparsity and Compressed Sensing

�m = ei�·, m�

frequencyFourier:

Q

N

Dictionary � = (�m)m � RQ�N , N � Q.

Redundant Dictionaries

Page 20: Sparsity and Compressed Sensing

Wavelets:�m = �(2�jR��x� n)

m = (j, �, n)

scale

orientation

position

�m = ei�·, m�

frequencyFourier:

Q

N

Dictionary � = (�m)m � RQ�N , N � Q.

Redundant Dictionaries

� = 2� = 1

Page 21: Sparsity and Compressed Sensing

Wavelets:�m = �(2�jR��x� n)

m = (j, �, n)

scale

orientation

position

�m = ei�·, m�

frequencyFourier:

DCT, Curvelets, bandlets, . . .

Q

N

Dictionary � = (�m)m � RQ�N , N � Q.

Redundant Dictionaries

� = 2� = 1

Page 22: Sparsity and Compressed Sensing

Redundant Dictionaries

Dictionary Ψ = (ψ_m)_m ∈ R^{Q×N},  N ≥ Q.

Fourier:   ψ_m = e^{i⟨ω_m, ·⟩}    (frequency ω_m)

Wavelets:  ψ_m = ψ(2^{-j} R_θ x − n),   m = (j, θ, n) = (scale, orientation, position)

DCT, curvelets, bandlets, ...

Synthesis:  f = Σ_m x_m ψ_m = Ψ x    (coefficients x, image f = Ψ x)

Page 23: Sparsity and Compressed Sensing

Ideal sparsity: for most m, xm = 0.

J0(x) = # {m \ xm �= 0}

Sparse Priors

Image f0

Coe�cients x

Page 24: Sparsity and Compressed Sensing

Ideal sparsity: for most m, xm = 0.

J0(x) = # {m \ xm �= 0}

Sparse approximation: f = �x whereargminx�RN

||f0 ��x||2 + TJ0(x)

Sparse Priors

Image f0

Coe�cients x

Page 25: Sparsity and Compressed Sensing

Ideal sparsity: for most m, xm = 0.

J0(x) = # {m \ xm �= 0}

Sparse approximation: f = �x where

Orthogonal �: ��� = ��� = IdN

xm =�

�f0, �m� if |�f0, �m�| > T,0 otherwise.

��

f = �� � ST � �(f0)ST

argminx�RN

||f0 ��x||2 + TJ0(x)

Sparse Priors

Image f0

Coe�cients x

Page 26: Sparsity and Compressed Sensing

Sparse Priors

Ideal sparsity: for most m, x_m = 0.
    J_0(x) = #{m \ x_m ≠ 0}

Sparse approximation: f = Ψx where  x ∈ argmin_{x ∈ R^N} ||f0 − Ψx||² + T² J_0(x)

Orthogonal Ψ (ΨΨ* = Ψ*Ψ = Id_N):
    x_m = ⟨f0, ψ_m⟩ if |⟨f0, ψ_m⟩| > T,  0 otherwise,
    i.e.  f = Ψ ∘ S_T ∘ Ψ*(f0)    (S_T = hard thresholding at T).

Non-orthogonal Ψ:  NP-hard.
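To make the orthogonal case concrete, here is a minimal NumPy sketch of hard-thresholding approximation, using the unitary FFT as a stand-in orthobasis Ψ (the basis choice, threshold and test signal are illustrative assumptions, not from the slides):

import numpy as np

def hard_threshold_approx(f0, T):
    """Sparse approximation f = Psi o S_T o Psi*(f0) in an orthonormal basis.

    Psi* is taken to be the unitary FFT (a stand-in for wavelets); S_T is hard
    thresholding at level T, which solves min_x ||f0 - Psi x||^2 + T^2 J0(x)."""
    N = f0.size
    x = np.fft.fft(f0) / np.sqrt(N)          # coefficients x = Psi* f0
    x_thresh = x * (np.abs(x) > T)           # hard thresholding S_T
    f = np.fft.ifft(x_thresh) * np.sqrt(N)   # synthesis f = Psi x
    return np.real(f), x_thresh

# usage: keep only the large Fourier coefficients of a noisy ramp
f0 = np.linspace(0, 1, 256) + 0.05 * np.random.randn(256)
f, x = hard_threshold_approx(f0, T=0.1)
print("kept coefficients:", np.count_nonzero(x))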

Page 27: Sparsity and Compressed Sensing

Image with 2 pixels:

q = 0

J0(x) = # {m \ xm �= 0}J0(x) = 0 �� null image.J0(x) = 1 �� sparse image.J0(x) = 2 �� non-sparse image.

x2

Convex Relaxation: L1 Prior

x1

Page 28: Sparsity and Compressed Sensing

Image with 2 pixels:

q = 0 q = 1 q = 2q = 3/2q = 1/2

Jq(x) =�

m

|xm|q

J0(x) = # {m \ xm �= 0}J0(x) = 0 �� null image.J0(x) = 1 �� sparse image.J0(x) = 2 �� non-sparse image.

x2

Convex Relaxation: L1 Prior

�q priors: (convex for q � 1)

x1

Page 29: Sparsity and Compressed Sensing

Convex Relaxation: L1 Prior

Image with 2 pixels (x_1, x_2):
    J_0(x) = #{m \ x_m ≠ 0}
    J_0(x) = 0  ⟺  null image.
    J_0(x) = 1  ⟺  sparse image.
    J_0(x) = 2  ⟺  non-sparse image.

ℓ^q priors (convex for q ≥ 1):   J_q(x) = Σ_m |x_m|^q
(level sets shown for q = 0, 1/2, 1, 3/2, 2)

Sparse ℓ^1 prior:   J_1(x) = Σ_m |x_m|

Page 30: Sparsity and Compressed Sensing

L1 Regularization

coe�cientsx0 � RN

Page 31: Sparsity and Compressed Sensing

L1 Regularization

coe�cients image�

x0 � RN f0 = �x0 � RQ

Page 32: Sparsity and Compressed Sensing

L1 Regularization

observations

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 33: Sparsity and Compressed Sensing

L1 Regularization

observations

� = K �⇥ ⇥ RP�N

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 34: Sparsity and Compressed Sensing

L1 Regularization

Coefficients x0 ∈ R^N  →(Ψ)→  image f0 = Ψ x0 ∈ R^Q  →(K, +w)→  observations y = K f0 + w ∈ R^P

Φ = K Ψ ∈ R^{P×N}

Sparse recovery: f* = Ψ x* where x* solves
    min_{x ∈ R^N}  (1/2)||y − Φx||²  +  λ||x||_1
                    (data fidelity)     (regularization)

Page 35: Sparsity and Compressed Sensing

x� � argmin�x=y

m

|xm|

x�

�x = y

Noiseless Sparse Regularization

Noiseless measurements: y = �x0

Page 36: Sparsity and Compressed Sensing

x� � argmin�x=y

m

|xm|

x�

�x = y

x� � argmin�x=y

m

|xm|2

Noiseless Sparse Regularization

x�

�x = y

Noiseless measurements: y = �x0

Page 37: Sparsity and Compressed Sensing

Noiseless Sparse Regularization

Noiseless measurements: y = Φ x0.

L1 recovery:   x* ∈ argmin_{Φx = y} Σ_m |x_m|
L2 recovery:   x* ∈ argmin_{Φx = y} Σ_m |x_m|²
(figures: the affine set Φx = y touching the ℓ^1 ball at a sparse point, the ℓ^2 ball at a non-sparse point)

The ℓ^1 problem is a convex linear program.
Interior points, cf. [Chen, Donoho, Saunders] "basis pursuit".
Douglas-Rachford splitting, see [Combettes, Pesquet].
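The noiseless ℓ^1 problem is indeed a linear program; a small sketch using scipy.optimize.linprog with the standard splitting x = x⁺ − x⁻ (a generic LP solver used for illustration, not the interior-point basis-pursuit code cited above):

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y as an LP with x = xp - xm, xp, xm >= 0."""
    P, N = Phi.shape
    c = np.ones(2 * N)                         # objective: sum(xp) + sum(xm)
    A_eq = np.hstack([Phi, -Phi])              # constraint: Phi xp - Phi xm = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

# usage on a small random instance
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 60)) / np.sqrt(20)
x0 = np.zeros(60); x0[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_rec = basis_pursuit(Phi, Phi @ x0)
print(np.allclose(x_rec, x0, atol=1e-6))       # True with high probability for such a sparse x0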

Page 38: Sparsity and Compressed Sensing

RegularizationData fidelity

y = �x0 + wNoisy measurements:

x� � argminx�RQ

12

||y � �x||2 + � ||x||1

Noisy Sparse Regularization

Page 39: Sparsity and Compressed Sensing

�� �RegularizationData fidelityEquivalence

||�x =y|| �

y = �x0 + wNoisy measurements:

x� � argminx�RQ

12

||y � �x||2 + � ||x||1

x� � argmin||�x�y||��

||x||1

Noisy Sparse Regularization

x�

Page 40: Sparsity and Compressed Sensing

Noisy Sparse Regularization

Noisy measurements: y = Φ x0 + w.

x* ∈ argmin_{x ∈ R^N}  (1/2)||y − Φx||²  +  λ||x||_1
                        (data fidelity)     (regularization)

Equivalence (λ ↔ ε):   x* ∈ argmin_{||Φx − y|| ≤ ε} ||x||_1

Algorithms:
    Iterative soft thresholding / forward-backward splitting,
    Nesterov multi-step schemes,
    see [Daubechies et al.], [Pesquet et al.], etc.

Page 41: Sparsity and Compressed Sensing

Image De-blurring

Original f0 y = h � f0 + w

Page 42: Sparsity and Compressed Sensing

f� = argminf�RN

||f ⇥ h� y||2 + �||⇥f ||2

f�(⇥) =h(⇥)

|h(⇥)|2 + �|⇥|2y(⇥)

Sobolev regularization:

Image De-blurring

Original f0 y = h � f0 + w SobolevSNR=22.7dB

Page 43: Sparsity and Compressed Sensing

Image De-blurring

Original f0, observations y = h ⋆ f0 + w.

Sobolev regularization:
    f_λ = argmin_{f ∈ R^N} ||f ⋆ h − y||² + λ||∇f||²
    solved in Fourier:  f̂_λ(ω) = ĥ(ω) ŷ(ω) / (|ĥ(ω)|² + λ|ω|²)
    SNR = 22.7 dB.

Sparsity regularization (Ψ = translation invariant wavelets):
    x* ∈ argmin_x (1/2)||h ⋆ (Ψx) − y||² + λ||x||_1,    f* = Ψ x*
    SNR = 24.7 dB.
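A sketch of the closed-form Sobolev deconvolution above on a 1-D signal with NumPy's FFT (the box kernel, noise level and λ are arbitrary illustrative choices; the conjugate of ĥ(ω) is used in the numerator, which coincides with ĥ(ω) for symmetric kernels):

import numpy as np

def sobolev_deconv(y, h, lam):
    """f_lam = argmin ||f*h - y||^2 + lam ||grad f||^2, solved in Fourier."""
    N = y.size
    h_hat = np.fft.fft(h, N)
    y_hat = np.fft.fft(y)
    w = 2 * np.pi * np.fft.fftfreq(N)              # discrete frequencies
    f_hat = np.conj(h_hat) * y_hat / (np.abs(h_hat) ** 2 + lam * w ** 2)
    return np.real(np.fft.ifft(f_hat))

# usage: blur a step signal with a box kernel, add noise, deconvolve
N = 256
f0 = (np.arange(N) > N // 2).astype(float)
h = np.zeros(N); h[:9] = 1 / 9                     # box blur of width 9
y = np.real(np.fft.ifft(np.fft.fft(f0) * np.fft.fft(h))) + 0.02 * np.random.randn(N)
f_rec = sobolev_deconv(y, h, lam=0.1)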

Page 44: Sparsity and Compressed Sensing

Inpainting Problem

Measurements:  y = K f0 + w,   with  (Kf)(x) = 0 if x ∈ Ω,  f(x) if x ∉ Ω.

Page 45: Sparsity and Compressed Sensing

Image Separation

Model: f = f1 + f2 + w, (f1, f2) components, w noise.

Page 46: Sparsity and Compressed Sensing

Image Separation

Model: f = f1 + f2 + w, (f1, f2) components, w noise.

Page 47: Sparsity and Compressed Sensing

Image Separation

Model: f = f1 + f2 + w, (f1, f2) components, w noise.

Union dictionary:  Ψ = [Ψ1, Ψ2] ∈ R^{Q×(N1+N2)}

(x1*, x2*) ∈ argmin_{x=(x1,x2) ∈ R^N}  (1/2)||f − Ψx||² + λ||x||_1

Recovered components:  f_i* = Ψ_i x_i*.

Page 48: Sparsity and Compressed Sensing

Examples of Decompositions

Page 49: Sparsity and Compressed Sensing

Cartoon+Texture Separation

Page 50: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 51: Sparsity and Compressed Sensing

Basics of Convex Analysis

Setting: Here: H = RN .G : H� R ⇤ {+⇥}

minx�H

G(x)Problem:

Page 52: Sparsity and Compressed Sensing

� t � [0, 1]

Basics of Convex Analysis

Setting: Here: H = RN .G : H� R ⇤ {+⇥}

x y

minx�H

G(x)Problem:

G(tx + (1� t)y) � tG(x) + (1� t)G(y)Convex:

Page 53: Sparsity and Compressed Sensing

�G(x) = {u ⇥ H \ ⇤ z, G(z) � G(x) + ⌅u, z � x⇧}Sub-di�erential:

G(x) = |x|

�G(0) = [�1, 1]

� t � [0, 1]

Basics of Convex Analysis

Setting: Here: H = RN .G : H� R ⇤ {+⇥}

x y

minx�H

G(x)Problem:

G(tx + (1� t)y) � tG(x) + (1� t)G(y)Convex:

Page 54: Sparsity and Compressed Sensing

�G(x) = {u ⇥ H \ ⇤ z, G(z) � G(x) + ⌅u, z � x⇧}

If F is C1, �F (x) = {�F (x)}

Sub-di�erential:

Smooth functions: G(x) = |x|

�G(0) = [�1, 1]

� t � [0, 1]

Basics of Convex Analysis

Setting: Here: H = RN .G : H� R ⇤ {+⇥}

x y

minx�H

G(x)Problem:

G(tx + (1� t)y) � tG(x) + (1� t)G(y)Convex:

Page 55: Sparsity and Compressed Sensing

Basics of Convex Analysis

Setting: G : H → R ∪ {+∞}. Here: H = R^N.

Problem:  min_{x ∈ H} G(x)

Convex:  G(t x + (1−t) y) ≤ t G(x) + (1−t) G(y),  ∀ t ∈ [0, 1]

Sub-differential:  ∂G(x) = {u ∈ H \ ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}
Example: G(x) = |x|, ∂G(0) = [−1, 1].

Smooth functions: if F is C^1, ∂F(x) = {∇F(x)}.

First-order conditions:   x* ∈ argmin_{x ∈ H} G(x)   ⟺   0 ∈ ∂G(x*)

Page 56: Sparsity and Compressed Sensing

⇥G(x) = ��(�x� y) + �⇥|| · ||1(x)

�|| · ||1(x)i =�

sign(xi) if xi ⇥= 0,[�1, 1] if xi = 0.

L1 Regularization: First Order Conditions

x� ⇥ argminx�RQ

G(x) =12

||y � �x||2 + �||x||1

Page 57: Sparsity and Compressed Sensing

I = {i ⇥ {0, . . . , N � 1} \ x�i ⇤= 0}

Support of the solution:

⇥G(x) = ��(�x� y) + �⇥|| · ||1(x)

�|| · ||1(x)i =�

sign(xi) if xi ⇥= 0,[�1, 1] if xi = 0.

L1 Regularization: First Order Conditions

i

x�i

x� ⇥ argminx�RQ

G(x) =12

||y � �x||2 + �||x||1

Page 58: Sparsity and Compressed Sensing

L1 Regularization: First Order Conditions

x* ∈ argmin_{x ∈ R^N}  G(x) = (1/2)||y − Φx||² + λ||x||_1

∂G(x) = Φ*(Φx − y) + λ ∂||·||_1(x),    ∂||·||_1(x)_i = sign(x_i) if x_i ≠ 0,  [−1, 1] if x_i = 0.

Support of the solution:  I = {i ∈ {0, . . . , N − 1} \ x*_i ≠ 0}

Restrictions:  x_I = (x_i)_{i ∈ I} ∈ R^{|I|},    Φ_I = (φ_i)_{i ∈ I} ∈ R^{P×|I|}

Page 59: Sparsity and Compressed Sensing

First order condition:

��(�x� � y) + �s = 0

where�

sI = sign(x�I),

||sIc ||� � 1

P�(y)

L1 Regularization: First Order Conditions

x� � argminx�RN

12

||�x� y||2 + �||x||1i

x�i

Page 60: Sparsity and Compressed Sensing

First order condition:

��(�x� � y) + �s = 0

where�

sI = sign(x�I),

||sIc ||� � 1

=� sIc =1�

��Ic(y � �x�)

P�(y)

L1 Regularization: First Order Conditions

x� � argminx�RN

12

||�x� y||2 + �||x||1i

x�i

��i, y � �x��

��

� i

Page 61: Sparsity and Compressed Sensing

First order condition:

��(�x� � y) + �s = 0

where�

sI = sign(x�I),

||sIc ||� � 1

x� solution of P�(y)||��Ic(�x� � y)||� � � ��

=� sIc =1�

��Ic(y � �x�)

P�(y)

L1 Regularization: First Order Conditions

Theorem:

x� � argminx�RN

12

||�x� y||2 + �||x||1i

x�i

��i, y � �x��

��

� i

Page 62: Sparsity and Compressed Sensing

L1 Regularization: First Order Conditions

x* ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||x||_1        (P_λ(y))

First order condition:   Φ*(Φx* − y) + λ s = 0   where  s_I = sign(x*_I),  ||s_{I^c}||_∞ ≤ 1
⟹  s_{I^c} = (1/λ) Φ*_{I^c}(y − Φx*)

Theorem:  x* is a solution of P_λ(y)   ⟺   ||Φ*_{I^c}(Φx* − y)||_∞ ≤ λ.

Theorem:  if Φ_I has full rank and ||Φ*_{I^c}(Φx* − y)||_∞ < λ,
then x* is the unique solution of P_λ(y).

(figure: the correlations ⟨φ_i, y − Φx*⟩ equal ±λ on the support I and lie in [−λ, λ] outside.)

Page 63: Sparsity and Compressed Sensing

(implicit equation)

= x0,I + �+I w � �(��I�I)�1sI

x�I = �+

I y � �(��I�I)�1sign(x�I)

Local Behavior of the Solution

=�

��(�x� � y) + �s = 0First order condition:

x� � argminx�RN

12

||�x� y||2 + �||x||1

Page 64: Sparsity and Compressed Sensing

(implicit equation)

Intuition: for small w.(unknown) (known)

= x0,I + �+I w � �(��I�I)�1sI

x�I = �+

I y � �(��I�I)�1sign(x�I)

sI = sign(x�I) = sign(x0,I) = s0,I

Local Behavior of the Solution

=�

��(�x� � y) + �s = 0First order condition:

x� � argminx�RN

12

||�x� y||2 + �||x||1

Page 65: Sparsity and Compressed Sensing

Local Behavior of the Solution

x* ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||x||_1

First order condition:  Φ*(Φx* − y) + λ s = 0
⟹  x*_I = Φ_I^+ y − λ (Φ_I* Φ_I)^{-1} sign(x*_I)
        = x_{0,I} + Φ_I^+ w − λ (Φ_I* Φ_I)^{-1} s_I        (implicit equation)

Intuition: for small w,  s_I = sign(x*_I) = sign(x_{0,I}) = s_{0,I}    ((unknown) = (known)).

To prove:  x̂_I = x_{0,I} + Φ_I^+ w − λ (Φ_I* Φ_I)^{-1} s_{0,I}  is the unique solution.

Page 66: Sparsity and Compressed Sensing

Candidate for the solution:

xI = x0,I + �+I w � �(��I�I)�1s0,I

Local Behavior of the Solution

Page 67: Sparsity and Compressed Sensing

Candidate for the solution:

xI = x0,I + �+I w � �(��I�I)�1s0,I

Local Behavior of the Solution

To prove: ||�Ic(�I xI � y)||� < 1

Page 68: Sparsity and Compressed Sensing

Candidate for the solution:

xI = x0,I + �+I w � �(��I�I)�1s0,I

�I = ��Ic(�I�+

I � Id) �I = ��Ic�+,�

I

1�

��Ic(�I xI � y) = �I

�w

�� �I(s0,I)

Local Behavior of the Solution

To prove: ||�Ic(�I xI � y)||� < 1

Page 69: Sparsity and Compressed Sensing

Local Behavior of the Solution

Candidate for the solution:   x̂_I = x_{0,I} + Φ_I^+ w − λ (Φ_I* Φ_I)^{-1} s_{0,I}

To prove:   (1/λ) ||Φ*_{I^c}(Φ_I x̂_I − y)||_∞ < 1.

(1/λ) Φ*_{I^c}(Φ_I x̂_I − y) = Φ*_{I^c}(Φ_I Φ_I^+ − Id)(w/λ)  −  Φ*_{I^c} Φ_I^{+,*}(s_{0,I})

The first term can be made small when w → 0; the ||·||_∞ of the second term must be < 1.

Page 70: Sparsity and Compressed Sensing

F(s) = ||�IsI ||� where �I = ��Ic�+,�

I

Robustness to Small Noise

Identifiability crition: [Fuchs]For s ⇥ {�1, 0,+1}N , let I = supp(s)

Page 71: Sparsity and Compressed Sensing

is the unique solution of P�(y).

If ||w||/T is small enough and � � ||w||, then

If F (sign(x0)) < 1, T = mini�I

|x0,i|

F(s) = ||�IsI ||� where �I = ��Ic�+,�

I

x0,I + �+I w � �(��I�I)�1 sign(x0,I)

Theorem: [Fuchs 2004]

Robustness to Small Noise

Identifiability crition: [Fuchs]For s ⇥ {�1, 0,+1}N , let I = supp(s)

Page 72: Sparsity and Compressed Sensing

is the unique solution of P�(y).

If ||w||/T is small enough and � � ||w||, then

If F (sign(x0)) < 1, T = mini�I

|x0,i|

F(s) = ||�IsI ||� where �I = ��Ic�+,�

I

x0,I + �+I w � �(��I�I)�1 sign(x0,I)

Theorem: [Fuchs 2004]

When w = 0, F (sign(x0) < 1 =� x� = x0.

Robustness to Small Noise

Identifiability crition: [Fuchs]For s ⇥ {�1, 0,+1}N , let I = supp(s)

Page 73: Sparsity and Compressed Sensing

Robustness to Small Noise

Identifiability criterion [Fuchs]: for s ∈ {−1, 0, +1}^N, let I = supp(s),
    F(s) = ||Ω_I s_I||_∞    where    Ω_I = Φ*_{I^c} Φ_I^{+,*}

Theorem [Fuchs 2004]: if F(sign(x0)) < 1 and T = min_{i ∈ I} |x_{0,i}|,
if ||w||/T is small enough and λ ~ ||w||, then
    x_{0,I} + Φ_I^+ w − λ (Φ_I* Φ_I)^{-1} sign(x_{0,I})
is the unique solution of P_λ(y).

When w = 0:  F(sign(x0)) < 1  ⟹  x* = x0.

Theorem [Grassmair et al. 2010]: if F(sign(x0)) < 1 and λ ~ ||w||, then ||x* − x0|| = O(||w||).
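A small NumPy sketch of the Fuchs criterion F(s) = ||Φ*_{Iᶜ} Φ_I^{+,*} s_I||_∞ defined above; the random test matrix and support are illustrative assumptions:

import numpy as np

def fuchs_criterion(Phi, s):
    """F(s) = || Phi_{I^c}^* Phi_I^{+,*} s_I ||_inf, with I = supp(s)."""
    I = np.flatnonzero(s)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    d_I = np.linalg.pinv(Phi[:, I]).T @ s[I]       # dual certificate d_I = Phi_I^{+,*} s_I
    return np.max(np.abs(Phi[:, Ic].T @ d_I))

# usage: F(sign(x0)) < 1 guarantees recovery of x0 for small noise
rng = np.random.default_rng(1)
Phi = rng.standard_normal((50, 200)) / np.sqrt(50)
x0 = np.zeros(200); x0[:5] = rng.standard_normal(5)
print(fuchs_criterion(Phi, np.sign(x0)))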

Page 74: Sparsity and Compressed Sensing

where dI defined by:� i � I, �dI , �i� = si

dI = �I(��I�I)�1sI

F(s) = ||�IsI ||� = maxj /�I

|�dI , �j�|

Geometric Interpretation

�j

�idI = �+,�

I sI

Page 75: Sparsity and Compressed Sensing

where dI defined by:� i � I, �dI , �i� = si

Condition F (s) < 1: no vector �j inside the cap Cs.

dI

Cs

dI = �I(��I�I)�1sI

F(s) = ||�IsI ||� = maxj /�I

|�dI , �j�|

Geometric Interpretation

�j

�i

�i

�j

|�dI , �⇥| < 1

dI = �+,�I sI

Page 76: Sparsity and Compressed Sensing

Geometric Interpretation

d_I = Φ_I (Φ_I* Φ_I)^{-1} s_I = Φ_I^{+,*} s_I,   i.e. d_I is defined by  ∀ i ∈ I, ⟨d_I, φ_i⟩ = s_i.

F(s) = ||Ω_I s_I||_∞ = max_{j ∉ I} |⟨d_I, φ_j⟩|

Condition F(s) < 1: no vector φ_j, j ∉ I, lies inside the cap C_s (the region where |⟨d_I, φ⟩| ≥ 1).

Page 77: Sparsity and Compressed Sensing

Exact Recovery Criterion (ERC): [Tropp]

Relation with F criterion: ERC(I) = maxs,supp(s)�I

F(s)

For a support I ⇥ {0, . . . , N � 1} with �I full rank,

= ||�+I �Ic ||1,1 = max

j�Ic||�+

I �j ||1

(use ||(aj)j ||1,1 = maxj ||aj ||1)

ERC(I) = ||�I ||�,� where �I = ��Ic�+,�

I

Robustness to Bounded Noise

Page 78: Sparsity and Compressed Sensing

Robustness to Bounded Noise

Exact Recovery Criterion (ERC) [Tropp]: for a support I ⊂ {0, . . . , N − 1} with Φ_I full rank,
    ERC(I) = ||Ω_I||_{∞,∞}    where    Ω_I = Φ*_{I^c} Φ_I^{+,*}
           = ||Φ_I^+ Φ_{I^c}||_{1,1} = max_{j ∈ I^c} ||Φ_I^+ φ_j||_1
    (using ||(a_j)_j||_{1,1} = max_j ||a_j||_1)

Relation with the F criterion:  ERC(I) = max_{s, supp(s) ⊂ I} F(s).

Theorem: if ERC(supp(x0)) < 1 and λ ~ ||w||, then x* is unique,
satisfies supp(x*) ⊂ supp(x0), and ||x0 − x*|| = O(||w||).
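A sketch of Tropp's criterion as written above, ERC(I) = max_{j ∉ I} ||Φ_I^+ φ_j||_1, in NumPy (test matrix and support are illustrative):

import numpy as np

def erc(Phi, I):
    """Exact Recovery Criterion ERC(I) = max_{j not in I} ||Phi_I^+ phi_j||_1."""
    I = np.asarray(I)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    PinvI = np.linalg.pinv(Phi[:, I])              # Phi_I^+, shape |I| x P
    return np.max(np.sum(np.abs(PinvI @ Phi[:, Ic]), axis=0))

# ERC(I) < 1 implies F(s) < 1 for every sign pattern s supported on I
rng = np.random.default_rng(2)
Phi = rng.standard_normal((50, 200)) / np.sqrt(50)
print(erc(Phi, I=[0, 1, 2, 3, 4]))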

Page 79: Sparsity and Compressed Sensing

Example: Random Matrix

P = 200, N = 1000.

(plot: empirical probability, as a function of the sparsity ||x0||_0 ∈ [0, 50], that w-ERC < 1, ERC < 1, F < 1, and that x* = x0)

Page 80: Sparsity and Compressed Sensing

⇥x =�

i

xi�(·��i)

Increasing �:� reduces correlation.

F (s)ERC(I)

w-ERC(I)

� reduces resolution.

Example: Deconvolution

�x0

x0�

Page 81: Sparsity and Compressed Sensing

Coherence Boundsµ(�) = max

i �=j|��i, �j⇥|Mutual coherence:

Theorem: F(s) � ERC(I) � w-ERC(I) � |I|µ(�)1� (|I|� 1)µ(�)

Page 82: Sparsity and Compressed Sensing

Coherence Bounds

Theorem:

||x0 � x�|| = O(||w||)

||x0||0 <12

�1 +

1µ(�)

�If

µ(�) = maxi �=j

|��i, �j⇥|Mutual coherence:

one has supp(x�) � I, and

and � � ||w||,

Theorem: F(s) � ERC(I) � w-ERC(I) � |I|µ(�)1� (|I|� 1)µ(�)

Page 83: Sparsity and Compressed Sensing

Coherence Bounds

Mutual coherence:  μ(Φ) = max_{i ≠ j} |⟨φ_i, φ_j⟩|

Theorem:  F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ))

Theorem:  if  ||x0||_0 < (1/2)(1 + 1/μ(Φ))  and λ ~ ||w||,
one has supp(x*) ⊂ I and ||x0 − x*|| = O(||w||).

One has  μ(Φ) ≥ √((N − P) / (P(N − 1))).
For Gaussian matrices:  μ(Φ) ~ √(log(PN)/P).
Optimistic setting:  ||x0||_0 ≤ O(√P).
For convolution matrices: useless criterion.
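A sketch computing the mutual coherence μ(Φ) = max_{i≠j} |⟨φ_i, φ_j⟩|; the columns are normalized inside the function, an assumption of this snippet:

import numpy as np

def mutual_coherence(Phi):
    """mu(Phi) = max_{i != j} |<phi_i, phi_j>| for unit-norm columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)        # normalize columns
    G = np.abs(Phi.T @ Phi)                        # Gram matrix
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(3)
P, N = 200, 1000
Phi = rng.standard_normal((P, N))
print(mutual_coherence(Phi))                       # on the order of sqrt(log(P*N)/P) for Gaussian Phi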

Page 84: Sparsity and Compressed Sensing

Incoherent pair of orthobases:

�2 =�k �� N�1/2e

2i�N mk

m�1 = {k ⇤⇥ �[k �m]}m

Diracs/Fourier

� = [�1,�2] � RN�2N

Spikes and Sinusoids Separation

Page 85: Sparsity and Compressed Sensing

Incoherent pair of orthobases:

�2 =�k �� N�1/2e

2i�N mk

m�1 = {k ⇤⇥ �[k �m]}m

Diracs/Fourier

� = [�1,�2] � RN�2N

minx�R2N

12

||y � �x||2 + �||x||1

minx1,x2�RN

12

||y � �1x1 � �2x2||2 + �||x1||1 + �||x2||1��

= +

Spikes and Sinusoids Separation

Page 86: Sparsity and Compressed Sensing

Spikes and Sinusoids Separation

Incoherent pair of orthobases (Diracs / Fourier):
    Ψ1 = {k ↦ δ[k − m]}_m,    Ψ2 = {k ↦ N^{-1/2} e^{2iπ mk/N}}_m,    Ψ = [Ψ1, Ψ2] ∈ R^{N×2N}

min_{x ∈ R^{2N}} (1/2)||y − Ψx||² + λ||x||_1
    ⟺   min_{x1, x2 ∈ R^N} (1/2)||y − Ψ1 x1 − Ψ2 x2||² + λ||x1||_1 + λ||x2||_1

μ(Ψ) = 1/√N   ⟹   separates up to ~ √N/2 Diracs + sines.

Page 87: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 88: Sparsity and Compressed Sensing

Data aquisition:

Sensors

Shannon interpolation: if Supp( ˆf) � [�N�, N�]

f [i] = f(i/N) = �f , �i��1

�2(�i)i

(Diracs)

Pointwise Sampling and Smoothness

f � L2 f � RN

�0

Page 89: Sparsity and Compressed Sensing

Data aquisition:

Sensors

where h(t) =sin(�t)

�t

f(t) =�

i

f [i]h(Nt� i)

Shannon interpolation: if Supp( ˆf) � [�N�, N�]

f [i] = f(i/N) = �f , �i��1

�2(�i)i

(Diracs)

Pointwise Sampling and Smoothness

f � L2 f � RN

�0

Page 90: Sparsity and Compressed Sensing

Pointwise Sampling and Smoothness

Data acquisition:  f̃ ∈ L²  →  sensors  →  f ∈ R^N,
    f[i] = f̃(i/N) = ⟨f̃, δ_{i/N}⟩    (Diracs (δ_i)_i)

Shannon interpolation: if Supp(f̂̃) ⊂ [−Nπ, Nπ],
    f̃(t) = Σ_i f[i] h(Nt − i)    where    h(t) = sin(πt)/(πt)

→ Natural images are not smooth.
→ But they can be compressed efficiently.

Page 91: Sparsity and Compressed Sensing

Single Pixel Camera (Rice)

y[i] = �f0, �i⇥

Page 92: Sparsity and Compressed Sensing

Single Pixel Camera (Rice)

y[i] = ⟨f0, φ_i⟩

(images: f0, N = 256²;  f*, P/N = 0.16;  f*, P/N = 0.02)

Page 93: Sparsity and Compressed Sensing

Physical hardware resolution limit: target resolution f � RN .

f � L2 f � RN y � RPmicromirrors

arrayresolution

CS hardwareK

CS Hardware ModelCS is about designing hardware: input signals f � L2(R2).

Page 94: Sparsity and Compressed Sensing

Physical hardware resolution limit: target resolution f � RN .

f � L2 f � RN y � RPmicromirrors

arrayresolution

CS hardware

,

...

K

CS Hardware ModelCS is about designing hardware: input signals f � L2(R2).

,

,

Page 95: Sparsity and Compressed Sensing

CS Hardware Model

CS is about designing hardware: input signals f̃ ∈ L²(R²).
Physical hardware resolution limit: target resolution f ∈ R^N.

f̃ ∈ L²  →  CS hardware (micromirror array, resolution N)  →  y ∈ R^P,    operator K : f ↦ y.

Page 96: Sparsity and Compressed Sensing

f0 � RN sparse in ortho-basis �

Sparse CS Recovery

���

x0 � RN

f0 � RN

Page 97: Sparsity and Compressed Sensing

(Discretized) sampling acquisition:

f0 � RN sparse in ortho-basis �

y = Kf0 + w = K � �(x0) + w= �

Sparse CS Recovery

���

x0 � RN

f0 � RN

Page 98: Sparsity and Compressed Sensing

(Discretized) sampling acquisition:

f0 � RN sparse in ortho-basis �

y = Kf0 + w = K � �(x0) + w= �

K drawn from the Gaussian matrix ensemble

Ki,j � N (0, P�1/2) i.i.d.

� � drawn from the Gaussian matrix ensemble

Sparse CS Recovery

���

x0 � RN

f0 � RN

Page 99: Sparsity and Compressed Sensing

Sparse CS Recovery

f0 ∈ R^N sparse in an ortho-basis Ψ:  x0 = Ψ* f0 ∈ R^N.

(Discretized) sampling acquisition:
    y = K f0 + w = K Ψ (x0) + w = Φ x0 + w

K drawn from the Gaussian matrix ensemble,  K_{i,j} ~ N(0, P^{-1/2}) i.i.d.
⟹  Φ = K Ψ is also drawn from the Gaussian matrix ensemble.

Sparse recovery:
    min_{||Φx − y|| ≤ ||w||} ||x||_1      or      min_x (1/2)||Φx − y||² + λ||x||_1,   λ ~ ||w||.
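A sketch of this acquisition model: K with i.i.d. N(0, P^{-1/2}) entries and y = K Ψ x0 + w, using the orthonormal DCT (scipy.fft.idct) as a stand-in for the orthobasis Ψ; the recovery step would then call any of the ℓ^1 solvers discussed later in these slides:

import numpy as np
from scipy.fft import idct

def cs_measure(x0, P, sigma, rng):
    """y = K Psi x0 + w with K_{i,j} ~ N(0, 1/sqrt(P)) i.i.d., Psi = orthonormal inverse DCT."""
    N = x0.size
    K = rng.standard_normal((P, N)) / np.sqrt(P)
    f0 = idct(x0, norm="ortho")                    # f0 = Psi x0
    w = sigma * rng.standard_normal(P)
    return K @ f0 + w, K

rng = np.random.default_rng(4)
N, P = 1024, 256
x0 = np.zeros(N); x0[rng.choice(N, 20, replace=False)] = rng.standard_normal(20)
y, K = cs_measure(x0, P, sigma=0.01, rng=rng)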

Page 100: Sparsity and Compressed Sensing

� = translation invariantwavelet frame

Original f0

CS Simulation Example

Page 101: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 102: Sparsity and Compressed Sensing

⇥ ||x||0 � k, (1� �k)||x||2 � ||�x||2 � (1 + �k)||x||2Restricted Isometry Constants:

�1 recovery:

CS with RIP

x⇥ � argmin||�x�y||��

||x||1 where�

y = �x0 + w||w|| � �

Page 103: Sparsity and Compressed Sensing

CS with RIP

Restricted Isometry Constants:
    ∀ ||x||_0 ≤ k,   (1 − δ_k)||x||² ≤ ||Φx||² ≤ (1 + δ_k)||x||²

ℓ^1 recovery:  x* ∈ argmin_{||Φx − y|| ≤ ε} ||x||_1   where   y = Φ x0 + w,  ||w|| ≤ ε.

Theorem [Candès 2009]: if δ_{2k} ≤ √2 − 1, then
    ||x0 − x*|| ≤ (C0/√k) ||x0 − x_k||_1 + C1 ε,
where x_k is the best k-term approximation of x0.

Page 104: Sparsity and Compressed Sensing

Singular Values Distributions

Eigenvalues of Φ_I* Φ_I with |I| = k are essentially in [a, b],
    a = (1 − √β)²   and   b = (1 + √β)²    where    β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to
    f_β(λ) = (1/(2πβλ)) √((λ − a)₊ (b − λ)₊)        [Marcenko-Pastur]

Large deviation inequality [Ledoux].

(plots: empirical eigenvalue histograms and f_β for P = 200, k = 10, 30, 50)
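A Monte-Carlo sketch of the statement above: the eigenvalues of Φ_I*Φ_I over random supports |I| = k concentrate on [(1 − √β)², (1 + √β)²] with β = k/P (sizes mirror the plots; the experiment itself is illustrative):

import numpy as np

def submatrix_eigs(P, N, k, trials, rng):
    """Eigenvalues of Phi_I^* Phi_I for a Gaussian Phi and random supports |I| = k."""
    Phi = rng.standard_normal((P, N)) / np.sqrt(P)
    eigs = []
    for _ in range(trials):
        I = rng.choice(N, k, replace=False)
        eigs.append(np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I]))
    return np.concatenate(eigs)

rng = np.random.default_rng(5)
P, k = 200, 30
beta = k / P
eigs = submatrix_eigs(P=P, N=1000, k=k, trials=200, rng=rng)
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(eigs.min(), eigs.max())      # empirically close to the Marcenko-Pastur edges a, b
print(a, b)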

Page 105: Sparsity and Compressed Sensing

Link with coherence:

�k � (k � 1)µ(�)

�2 = µ(�)

RIP for Gaussian Matrices

µ(�) = maxi �=j

|��i, �j⇥|

Page 106: Sparsity and Compressed Sensing

Link with coherence:

�k � (k � 1)µ(�)

For Gaussian matrices:

�2 = µ(�)

RIP for Gaussian Matrices

µ(�) = maxi �=j

|��i, �j⇥|

µ(�) ��

log(PN)/P

Page 107: Sparsity and Compressed Sensing

RIP for Gaussian Matrices

Link with coherence:   δ_2 = μ(Φ),   δ_k ≤ (k − 1) μ(Φ),   μ(Φ) = max_{i ≠ j} |⟨φ_i, φ_j⟩|.

For Gaussian matrices:  μ(Φ) ~ √(log(PN)/P).

Stronger result:
Theorem: if  k ≤ C P / log(N/P),  then δ_{2k} ≤ √2 − 1 with high probability.

Page 108: Sparsity and Compressed Sensing

(1� ⇥1(A))||�||2 � ||A�||2 � (1 + ⇥2(A))||�||2Stability constant of A:

smallest / largest eigenvalues of A�A

Numerics with RIP

Page 109: Sparsity and Compressed Sensing

Numerics with RIP

Stability constants of A:   (1 − δ_1(A))||α||² ≤ ||Aα||² ≤ (1 + δ_2(A))||α||²,
defined from the smallest / largest eigenvalues of A*A.

Upper / lower restricted isometry constants:
    δ_k^i = max_{|I| = k} δ_i(Φ_I),    δ_k = max(δ_k^1, δ_k^2).

Monte-Carlo estimation:  δ̂_k ≤ δ_k.

(plot: estimates of δ_{2k} as a function of k, N = 4000, P = 1000, compared with the level √2 − 1)

Page 110: Sparsity and Compressed Sensing

�(B�)

x0 �x0

�1

��2

�2�3

��3

��1

� = (�i)i � R2�3

B� = {x \ ||x||1 � �}� = ||x0||1

x� � argmin�x=y

||x||1 (P0(y))Noiseless recovery:

y �� x�

Polytopes-based Guarantees

Page 111: Sparsity and Compressed Sensing

Polytopes-based Guarantees

Noiseless recovery (P_0(y)):   x* ∈ argmin_{Φx = y} ||x||_1,   y ↦ x*.

B_λ = {x \ ||x||_1 ≤ λ},   λ = ||x0||_1.

(figure: Φ = (φ_i)_i ∈ R^{2×3}, the projected ball Φ(B_λ), and the points x0, Φ x0)

x0 solution of P_0(Φ x0)   ⟺   Φ x0 ∈ ∂ Φ(B_λ)    (the boundary of the projected ℓ^1 ball).

Page 112: Sparsity and Compressed Sensing

C(0,1,1)

K(0,1,1)

Ks =�(�isi)i � R3 \ �i � 0

� 2-D conesCs = �Ks

2-D quadrant

L1 Recovery in 2-D

��1

�2�3

� = (�i)i � R2�3

y �� x�

Page 113: Sparsity and Compressed Sensing

All MostRIP

� Sharp constants.

� No noise robustness.

All x0 such that ||x0||0 � Call(P/N)P are identifiable.Most x0 such that ||x0||0 � Cmost(P/N)P are identifiable.

Call(1/4) � 0.065

Cmost(1/4) � 0.25

[Donoho]

Polytope Noiseless Recovery

50 100 150 200 250 300 350 4000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Counting faces of random polytopes:

Page 114: Sparsity and Compressed Sensing

Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
    All x0 such that ||x0||_0 ≤ C_all(P/N) P are identifiable.
    Most x0 such that ||x0||_0 ≤ C_most(P/N) P are identifiable.
    C_all(1/4) ≈ 0.065,   C_most(1/4) ≈ 0.25.

→ Sharp constants.
→ No noise robustness.
→ Computation of "pathological" signals [Dossal, P, Fadili, 2010].

(plot: identifiable sparsity levels as a function of P, curves "All", "Most" and the RIP bound)

Page 115: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 116: Sparsity and Compressed Sensing

Tomography and Fourier Measures

Page 117: Sparsity and Compressed Sensing

Tomography and Fourier Measures

Fourier slice theorem:   p̂_θ(ρ) = f̂(ρ cos(θ), ρ sin(θ))     (1D ↔ 2D Fourier)

Partial Fourier measurements:   Φf = {f̂[ω]}_{ω ∈ Ω},   f̂ = FFT2(f),
equivalent to the projections {p_{θ_k}(t)}_{t ∈ R, 0 ≤ k < K}.

Page 118: Sparsity and Compressed Sensing

Regularized Inversion

Disclaimer: this is not compressed sensing.

Noisy measurements:  ∀ ω ∈ Ω,  y[ω] = f̂0[ω] + w[ω].
Noise: w[ω] ~ N(0, σ), white noise.

ℓ^1 regularization:
    f* = argmin_f  (1/2) Σ_{ω ∈ Ω} |y[ω] − f̂[ω]|²  +  λ Σ_m |⟨f, ψ_m⟩|.

Page 119: Sparsity and Compressed Sensing

MRI Imaging

From [Lustig et al.]

Page 120: Sparsity and Compressed Sensing

MRI Reconstruction

Fourier sub-sampling pattern: randomization.

(images: high resolution, low resolution, linear reconstruction, sparsity reconstruction)

From [Lustig et al.]

Page 121: Sparsity and Compressed Sensing

Compressive Fourier Measurements

Pseudo-inverse vs. sparse wavelets reconstruction.

→ Sampling low frequencies helps.

Page 122: Sparsity and Compressed Sensing

Gaussian matrices: intractable for large N .

Random partial orthogonal matrix: {��}� orthogonal basis.

where |�| = P drawn uniformly at random.

�� � �, y[�] = �f, ⇥�� = f [�]

Fast measurements: (e.g. Fourier basis)

Structured Measurements

� = (��)���

Page 123: Sparsity and Compressed Sensing

Gaussian matrices: intractable for large N .

Random partial orthogonal matrix: {��}� orthogonal basis.

where |�| = P drawn uniformly at random.

�� � �, y[�] = �f, ⇥�� = f [�]

Fast measurements: (e.g. Fourier basis)

Mutual incoherence: µ =⌅

Nmax�,m

|⇥⇥�, �m⇤| � [1,⌅

N ]

Structured Measurements

� = (��)���

Page 124: Sparsity and Compressed Sensing

Structured Measurements

Gaussian matrices: intractable for large N.

Random partial orthogonal matrix: {φ_ω}_ω orthogonal basis,  Φ = (φ_ω)_{ω ∈ Ω},
where |Ω| = P is drawn uniformly at random.

Fast measurements (e.g. Fourier basis):   ∀ ω ∈ Ω,  y[ω] = ⟨f, φ_ω⟩ = f̂[ω].

Mutual incoherence:   μ = √N max_{ω,m} |⟨φ_ω, ψ_m⟩|  ∈ [1, √N].

→ Not universal: requires incoherence.

Theorem [Rudelson, Vershynin, 2006]: with high probability on Ω,
if  M ≤ C P / (μ² log(N)^4),  then δ_{2M} ≤ √2 − 1.
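A sketch of a random partial orthogonal measurement operator with the orthonormal FFT as the basis {φ_ω}: it returns the forward map, its adjoint and the random set Ω, which is all an iterative ℓ^1 solver needs (names and sizes are illustrative):

import numpy as np

def make_partial_fourier(N, P, rng):
    """Measurements y[omega] = hat f[omega] on a random set Omega, |Omega| = P."""
    Omega = rng.choice(N, P, replace=False)
    def Phi(f):                                    # forward map, R^N -> C^P
        return np.fft.fft(f, norm="ortho")[Omega]
    def Phi_adj(y):                                # adjoint, C^P -> C^N
        z = np.zeros(N, dtype=complex); z[Omega] = y
        return np.fft.ifft(z, norm="ortho")
    return Phi, Phi_adj, Omega

rng = np.random.default_rng(6)
Phi, Phi_adj, Omega = make_partial_fourier(N=256, P=64, rng=rng)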

Page 125: Sparsity and Compressed Sensing

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Page 126: Sparsity and Compressed Sensing

Setting:H: Hilbert space. Here: H = RN .

G : H� R ⇤ {+⇥}

Convex Optimization

minx�H

G(x)Problem:

Page 127: Sparsity and Compressed Sensing

Setting:H: Hilbert space. Here: H = RN .

Class of functions:

G : H� R ⇤ {+⇥}

x y

G(tx + (1� t)y) � tG(x) + (1� t)G(y) t � [0, 1]

Convex Optimization

Convex:

minx�H

G(x)Problem:

Page 128: Sparsity and Compressed Sensing

Setting:H: Hilbert space. Here: H = RN .

Class of functions:

G : H� R ⇤ {+⇥}

lim infx�x0

G(x) � G(x0)

{x ⇥ H \ G(x) ⇤= +�} ⇤= ⌅

x y

G(tx + (1� t)y) � tG(x) + (1� t)G(y) t � [0, 1]

Convex Optimization

Lower semi-continuous:

Convex:

Proper:

minx�H

G(x)Problem:

Page 129: Sparsity and Compressed Sensing

Setting:H: Hilbert space. Here: H = RN .

Class of functions:

G : H� R ⇤ {+⇥}

lim infx�x0

G(x) � G(x0)

{x ⇥ H \ G(x) ⇤= +�} ⇤= ⌅

x y

�C(x) =�

0 if x ⇥ C,+� otherwise.

(C closed and convex)

G(tx + (1� t)y) � tG(x) + (1� t)G(y) t � [0, 1]

Convex Optimization

Indicator:

Lower semi-continuous:

Convex:

Proper:

minx�H

G(x)Problem:

Page 130: Sparsity and Compressed Sensing

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Proximal Operators

Page 131: Sparsity and Compressed Sensing

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

G(x) = ||x||1 =�

i

|xi|

G(x) = ||x||0 = | {i \ xi �= 0} |

G(x) =�

i

log(1 + |xi|2)

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

||x||0|x|log(1 + x2)

G(x)

Page 132: Sparsity and Compressed Sensing

Proximal Operators

Proximal operator of G:    Prox_{λG}(x) = argmin_z (1/2)||x − z||² + λ G(z)

G(x) = ||x||_1 = Σ_i |x_i|:         Prox_{λG}(x)_i = max(0, 1 − λ/|x_i|) x_i        (soft thresholding)

G(x) = ||x||_0 = |{i \ x_i ≠ 0}|:   Prox_{λG}(x)_i = x_i if |x_i| ≥ √(2λ),  0 otherwise   (hard thresholding)

G(x) = Σ_i log(1 + |x_i|²):         → root of a 3rd order polynomial.

(plots: the penalties |x|, ||x||_0, log(1 + x²) and their proximal maps)
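NumPy sketches of the two closed-form proximal maps listed above (soft and hard thresholding); the prox of Σ log(1 + x_i²), which requires solving a cubic, is omitted:

import numpy as np

def prox_l1(x, lam):
    """Prox of lam*||.||_1: soft thresholding, max(0, 1 - lam/|x_i|) x_i."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0)

def prox_l0(x, lam):
    """Prox of lam*||.||_0: hard thresholding, keep x_i if |x_i| >= sqrt(2*lam)."""
    return x * (np.abs(x) >= np.sqrt(2 * lam))

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # [-2. -0.  0.  0.  1.]
print(prox_l0(x, 1.0))   # [-3. -0.  0.  0.  2.]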

Page 133: Sparsity and Compressed Sensing

Separability: G(x) = G1(x1) + . . . + Gn(xn)

ProxG(x) = (ProxG1(x1), . . . ,ProxGn(xn))

Proximal Calculus

Page 134: Sparsity and Compressed Sensing

Separability:

Quadratic functionals:

= ��(Id + ����)�1

G(x) = G1(x1) + . . . + Gn(xn)

ProxG(x) = (ProxG1(x1), . . . ,ProxGn(xn))

G(x) =12

||�x� y||2

Prox�G = (Id + ����)�1��

Proximal Calculus

Page 135: Sparsity and Compressed Sensing

Separability:

Quadratic functionals:

= ��(Id + ����)�1

G(x) = G1(x1) + . . . + Gn(xn)

ProxG(x) = (ProxG1(x1), . . . ,ProxGn(xn))

G(x) =12

||�x� y||2

Prox�G = (Id + ����)�1��

Composition by tight frame:

Proximal Calculus

ProxG�A(x) = A� � ProxG �A + Id�A� �A

A � A� = Id

Page 136: Sparsity and Compressed Sensing

Separability:

Quadratic functionals:

Indicators:

= ��(Id + ����)�1

G(x) = G1(x1) + . . . + Gn(xn)

ProxG(x) = (ProxG1(x1), . . . ,ProxGn(xn))

G(x) =12

||�x� y||2

Prox�G = (Id + ����)�1��

G(x) = �C(x) x

Prox�G(x) = ProjC(x)= argmin

z�C||x� z||

Composition by tight frame:

Proximal Calculus

ProjC(x)C

ProxG�A(x) = A� � ProxG �A + Id�A� �A

A � A� = Id

Page 137: Sparsity and Compressed Sensing

If 0 < �� < 2/L, x(�) � x� a solution.Theorem:

Gradient descent:G is C1 and �G is L-Lipschitz

Gradient and Proximal Descents

[explicit]x(�+1) = x(�) � ���G(x(�))

Page 138: Sparsity and Compressed Sensing

If 0 < �� < 2/L, x(�) � x� a solution.Theorem:

Gradient descent:

x(�+1) = x(�) � ��v(�),

�� Problem: slow.

G is C1 and �G is L-Lipschitz

v(�) � �G(x(�))

Gradient and Proximal Descents

Sub-gradient descent:

[explicit]

If �� � 1/⇥, x(�) � x� a solution.Theorem:

x(�+1) = x(�) � ���G(x(�))

Page 139: Sparsity and Compressed Sensing

Gradient and Proximal Descents

Gradient descent (G is C^1 and ∇G is L-Lipschitz)   [explicit]:
    x^(ℓ+1) = x^(ℓ) − τ_ℓ ∇G(x^(ℓ))
    Theorem: if 0 < τ_ℓ < 2/L, x^(ℓ) → x* a solution.

Sub-gradient descent:
    x^(ℓ+1) = x^(ℓ) − τ_ℓ v^(ℓ),   v^(ℓ) ∈ ∂G(x^(ℓ))
    Theorem: if τ_ℓ ~ 1/ℓ, x^(ℓ) → x* a solution.     → Problem: slow.

Proximal-point algorithm   [implicit]:
    x^(ℓ+1) = Prox_{τ_ℓ G}(x^(ℓ))
    Theorem: if τ_ℓ ≥ c > 0, x^(ℓ) → x* a solution.   → Problem: Prox_{τG} hard to compute.

Page 140: Sparsity and Compressed Sensing

Solve minx�H

E(x)

Problem: Prox�E is not available.

Proximal Splitting Methods

Page 141: Sparsity and Compressed Sensing

Solve minx�H

E(x)

Splitting: E(x) = F (x) +�

i

Gi(x)

SimpleSmooth

Problem: Prox�E is not available.

Proximal Splitting Methods

Page 142: Sparsity and Compressed Sensing

Proximal Splitting Methods

Solve  min_{x ∈ H} E(x).     Problem: Prox_{λE} is not available.

Splitting:   E(x) = F(x) + Σ_i G_i(x)     (F smooth, G_i simple)

Iterative algorithms using only ∇F(x) and Prox_{λG_i}(x):
    Forward-Backward:   solves  F + G
    Douglas-Rachford:   solves  Σ_i G_i
    Generalized FB:     solves  F + Σ_i G_i
    Primal-Dual:        solves  Σ_i G_i ∘ A_i

Page 143: Sparsity and Compressed Sensing

Smooth + Simple Splitting

Model: f0 = Ψ x0 sparse in a dictionary Ψ.
Inverse problem: y = K f0 + w,  measurements  K : R^N → R^P,  P ≪ N.

Sparse recovery: f* = Ψ x* where x* solves  min_{x ∈ R^N} F(x) + G(x):
    Data fidelity (smooth):    F(x) = (1/2)||y − Φx||²,   Φ = K Ψ
    Regularization (simple):   G(x) = λ||x||_1 = λ Σ_i |x_i|

Page 144: Sparsity and Compressed Sensing

x� � argminx

F (x) + G(x) 0 � �F (x�) + �G(x�)

(x� � ��F (x�)) � x� + �⇥G(x�)

Fix point equation:

x⇥ = Prox�G(x⇥ � ��F (x⇥))

Forward-Backward

����

��

Page 145: Sparsity and Compressed Sensing

x� � argminx

F (x) + G(x) 0 � �F (x�) + �G(x�)

(x� � ��F (x�)) � x� + �⇥G(x�)

Fix point equation:

x(⇥+1) = Prox�G

�x(⇥) � ��F (x(⇥))

�x⇥ = Prox�G(x⇥ � ��F (x⇥))

Forward-Backward

����

��

Forward-backward:

Page 146: Sparsity and Compressed Sensing

x� � argminx

F (x) + G(x) 0 � �F (x�) + �G(x�)

(x� � ��F (x�)) � x� + �⇥G(x�)

Fix point equation:

G = �C

x(⇥+1) = Prox�G

�x(⇥) � ��F (x(⇥))

�x⇥ = Prox�G(x⇥ � ��F (x⇥))

Forward-Backward

����

��

Forward-backward:

Projected gradient descent:

Page 147: Sparsity and Compressed Sensing

Forward-Backward

x* ∈ argmin_x F(x) + G(x)        (*)
    ⟺  0 ∈ ∇F(x*) + ∂G(x*)
    ⟺  (x* − τ ∇F(x*)) ∈ x* + τ ∂G(x*)

Fix point equation:   x* = Prox_{τG}( x* − τ ∇F(x*) )

Forward-backward iteration:   x^(ℓ+1) = Prox_{τG}( x^(ℓ) − τ ∇F(x^(ℓ)) )

Projected gradient descent: special case G = ι_C.

Theorem: let ∇F be L-Lipschitz. If τ < 2/L, x^(ℓ) → x* a solution of (*).

Page 148: Sparsity and Compressed Sensing

Example: L1 Regularization

min_x (1/2)||Φx − y||² + λ||x||_1      ⟺      min_x F(x) + G(x)

F(x) = (1/2)||Φx − y||²,    ∇F(x) = Φ*(Φx − y),    L = ||Φ*Φ||.
G(x) = λ||x||_1,    Prox_{τG}(x)_i = max(0, 1 − τλ/|x_i|) x_i.

Forward-backward  ⟺  iterative soft thresholding.
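A sketch of this forward-backward / iterative soft thresholding scheme for (1/2)||Φx − y||² + λ||x||_1, with step size τ = 1/L, L = ||Φ*Φ|| (problem sizes and λ are illustrative):

import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Forward-backward for min 0.5||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2                # Lipschitz constant of the gradient
    tau = 1.0 / L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)               # forward (gradient) step
        z = x - tau * grad
        x = np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0)   # backward (prox) step
    return x

rng = np.random.default_rng(7)
Phi = rng.standard_normal((100, 400)) / np.sqrt(100)
x0 = np.zeros(400); x0[rng.choice(400, 10, replace=False)] = 1.0
x_rec = ista(Phi, Phi @ x0, lam=0.01)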

Page 149: Sparsity and Compressed Sensing

Douglas-Rachford iterations:

(�)

RProx�G(x) = 2Prox�G(x)� x

x(⇥+1) = Prox�G2(z(⇥+1))

z(⇥+1) =�1� �

2

�z(⇥) +

2RProx�G2 � RProx�G1(z

(⇥))

Reflexive prox:

Douglas Rachford Scheme

Simple Simple

minx

G1(x) + G2(x)

Page 150: Sparsity and Compressed Sensing

Douglas Rachford Scheme

min_x G1(x) + G2(x)        (*)        (G1, G2 simple)

Reflexive prox:   RProx_{γG}(x) = 2 Prox_{γG}(x) − x

Douglas-Rachford iterations:
    z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γG1} ∘ RProx_{γG2}( z^(ℓ) )
    x^(ℓ+1) = Prox_{γG2}( z^(ℓ+1) )

Theorem: if 0 < α < 2 and γ > 0, x^(ℓ) → x* a solution of (*).

Page 151: Sparsity and Compressed Sensing

minx

G1(x) + G2(x)

G1(x) = iC(x), C = {x \ �x = y}

Prox�G1(x) = ProjC(x) = x + �⇥(��⇥)�1(y � �x)

G2(x) = ||x||1 Prox�G2(x) =�

max�

0, 1� �

|xi|

�xi

i

�� e⇥cient if ��� easy to invert.

Example: Constrainted L1

min�x=y

||x||1 ��

Page 152: Sparsity and Compressed Sensing

Example: Constrained L1

min_{Φx = y} ||x||_1      ⟺      min_x G1(x) + G2(x)

G1(x) = ι_C(x),  C = {x \ Φx = y},
    Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{-1}(y − Φx)

G2(x) = ||x||_1,    Prox_{γG2}(x) = ( max(0, 1 − γ/|x_i|) x_i )_i

→ Efficient if ΦΦ* is easy to invert.

Example: compressed sensing, Φ ∈ R^{100×400} Gaussian matrix, ||x0||_0 = 17, y = Φ x0.
(plot: log10(||x^(ℓ)||_1 − ||x*||_1) along the iterations, for γ = 0.01, 1, 10)
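A sketch of Douglas-Rachford applied to min_{Φx = y} ||x||_1 with G1 = ι_{Φx=y} and G2 = ||·||_1, mirroring the compressed sensing example above (Φ ∈ R^{100×400} Gaussian, ||x0||_0 = 17); γ, α and the iteration count are illustrative, and the solution is read off through Prox_{γG2}, the prox whose reflection is applied first:

import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=1000):
    """Douglas-Rachford for min ||x||_1 s.t. Phi x = y, with G1 = i_{Phi x = y}, G2 = ||.||_1."""
    PPt_inv = np.linalg.inv(Phi @ Phi.T)           # assumes Phi Phi^* is invertible
    prox_G1 = lambda x: x + Phi.T @ (PPt_inv @ (y - Phi @ x))          # projection on {Phi x = y}
    prox_G2 = lambda x: np.sign(x) * np.maximum(np.abs(x) - gamma, 0)  # soft thresholding
    rprox = lambda prox, x: 2 * prox(x) - x
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(prox_G1, rprox(prox_G2, z))
    return prox_G2(z)

rng = np.random.default_rng(8)
Phi = rng.standard_normal((100, 400)) / np.sqrt(100)
x0 = np.zeros(400); x0[rng.choice(400, 17, replace=False)] = rng.standard_normal(17)
x_rec = dr_basis_pursuit(Phi, Phi @ x0)
print(np.max(np.abs(x_rec - x0)))                  # typically small when x0 is identifiable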

Page 153: Sparsity and Compressed Sensing

C =�(x1, . . . , xk) � Hk \ x1 = . . . = xk

each Fi is simpleminx

G1(x) + . . . + Gk(x)

minx

G(x1, . . . , xk) + �C(x1, . . . , xk)

G(x1, . . . , xk) = G1(x1) + . . . + Gk(xk)

More than 2 Functionals

��

Page 154: Sparsity and Compressed Sensing

More than 2 Functionals

min_x G1(x) + . . . + Gk(x)     with each Gi simple

⟺  min G(x1, . . . , xk) + ι_C(x1, . . . , xk),
    G(x1, . . . , xk) = G1(x1) + . . . + Gk(xk),    C = {(x1, . . . , xk) ∈ H^k \ x1 = . . . = xk}

G and ι_C are simple:
    Prox_{γG}(x1, . . . , xk) = (Prox_{γGi}(xi))_i
    Prox_{γι_C}(x1, . . . , xk) = (x̄, . . . , x̄)    where    x̄ = (1/k) Σ_i xi

Page 155: Sparsity and Compressed Sensing

Linear map A : E � H.

C = {(x, y) ⇥ H� E \ Ax = y}

minx

G1(x) + G2 � A(x)

G1, G2 simple.minz⇥H�E

G(z) + �C(z)

G(x, y) = G1(x) + G2(y)

Auxiliary Variables

��

Page 156: Sparsity and Compressed Sensing

Auxiliary Variables

min_x G1(x) + G2 ∘ A(x)      with G1, G2 simple and A : H → E a linear map

⟺  min_{z ∈ H×E} G(z) + ι_C(z),    G(x, y) = G1(x) + G2(y),    C = {(x, y) ∈ H × E \ Ax = y}

Prox_{γG}(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y))

Prox_{ι_C}(x, y) = (x̃, A x̃) = (x + A* ỹ, y − ỹ)
    where   x̃ = (Id + A*A)^{-1}(x + A*y)    and    ỹ = (Id + AA*)^{-1}(y − Ax)

→ Efficient if Id + AA* or Id + A*A is easy to invert.

Page 157: Sparsity and Compressed Sensing

G1(u) = ||u||1 Prox�G1(u)i = max�

0, 1� �

||ui||

�ui

minf

12||Kf � y||2 + �||⇥f ||1

minx

G1(f) + G2 � �(f)

G2(f) =12||Kf � y||2 Prox�G2 = (Id + �K�K)�1K�

C =�(f, u) ⇥ RN � RN�2 \ u = ⇤f

Prox�C (f, u) = (f ,�f)

Example: TV Regularization

||u||1 =�

i

||ui||

��

Page 158: Sparsity and Compressed Sensing

Example: TV Regularization

Compute the solution of:    min_f (1/2)||Kf − y||² + λ||∇f||_1,     ||u||_1 = Σ_i ||u_i||

Splitting over (f, u) with the constraint u = ∇f:
    G1(u) = λ||u||_1,    Prox_{γG1}(u)_i = max(0, 1 − γλ/||u_i||) u_i
    G2(f) = (1/2)||Kf − y||²,    Prox_{γG2}(f) = (Id + γK*K)^{-1}(f + γK*y)
    C = {(f, u) ∈ R^N × R^{N×2} \ u = ∇f},
    Prox_{γι_C}(f, u) = (f̃, ∇f̃)    where    (Id + ∇*∇) f̃ = f − div(u)

→ O(N log(N)) operations using the FFT.
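A sketch of the projection Prox_{γι_C}(f, u) = (f̃, ∇f̃) used above, solving (Id + ∇*∇) f̃ = f − div(u) with the FFT, here on a periodic 1-D grid for brevity (the 2-D image case is analogous); all names are illustrative:

import numpy as np

def grad(f):                                       # periodic forward difference
    return np.roll(f, -1) - f

def proj_C(f, u):
    """Projection of (f, u) on C = {(g, grad g)}: solve (Id + grad^* grad) g = f + grad^* u."""
    N = f.size
    g_hat = np.exp(2j * np.pi * np.arange(N) / N) - 1      # Fourier symbol of grad
    rhs_hat = np.fft.fft(f) + np.conj(g_hat) * np.fft.fft(u)
    f_tilde = np.real(np.fft.ifft(rhs_hat / (1 + np.abs(g_hat) ** 2)))
    return f_tilde, grad(f_tilde)

f = np.random.randn(128); u = np.random.randn(128)
f_t, u_t = proj_C(f, u)                            # u_t = grad(f_t) by construction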

Page 159: Sparsity and Compressed Sensing

Example: TV Regularization

(figures: original f0, measurements y = Φ f0 + w, and the recovered f* along the iterations)

Page 160: Sparsity and Compressed Sensing

Conclusion

Sparsity: approximate signals with few atoms of a dictionary.

Page 161: Sparsity and Compressed Sensing

Conclusion

Sparsity: approximate signals with few atoms of a dictionary.

Compressed sensing ideas:
→ Randomized sensors + sparse recovery.
→ Number of measurements ~ signal complexity.
→ CS is about designing new hardware.

Page 162: Sparsity and Compressed Sensing

Conclusion

Sparsity: approximate signals with few atoms of a dictionary.

Compressed sensing ideas:
→ Randomized sensors + sparse recovery.
→ Number of measurements ~ signal complexity.
→ CS is about designing new hardware.

The devil is in the constants:
→ Worst case analysis is problematic.
→ Designing good signal models.

Page 163: Sparsity and Compressed Sensing

Some Hot Topics

Dictionary learning (learning the dictionary Ψ from data).

[Figures and tables reproduced from Mairal et al., "Sparse Representation for Color Image Restoration" (IEEE Trans. Image Processing): learned color dictionaries (Fig. 2), reduction of color artifacts (Fig. 3), a dictionary adapted to a single training image (Fig. 4), the denoising data set (Fig. 7) and PSNR comparison tables (Tables I and II); see the original paper.]

Page 164: Sparsity and Compressed Sensing

Some Hot Topics

Dictionary learning (learning the dictionary Ψ from data).

Analysis vs. synthesis:   J_s(f) = min_{f = Ψx} ||x||_1    (synthesis prior)

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

Fig. 7. Data set used for evaluating the denoising experiments.

Table I. PSNR results of our denoising algorithm with 256 atoms, of size 7×7×3 and 6×6×3. Each case is divided into four parts: the top-left results are those given by McAuley et al. [28] with their "3×3 model"; the top-right results are obtained by applying the grayscale K-SVD algorithm [2] on each channel separately with 8×8 atoms; the bottom-left are our results obtained with a globally trained dictionary; the bottom-right are the improvements obtained with the adaptive approach with 20 iterations. Bold indicates the best result in each group. As can be seen, our proposed technique consistently produces the best results.

Table II. Comparison of the PSNR results on the image "castle" between [28] and what we obtained with 256 atoms and 6×6×3 and 7×7×3 patches. For the adaptive approach, 20 iterations have been performed. Bold indicates the best result, showing once again the consistent improvement obtained with our proposed technique.

patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small value during the OMP makes it possible to learn a dictionary specialized in providing a coarse approximation. Our assumption is that (pattern) artifacts are less present in coarse approximations, preventing the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used a small value to prevent the learning of artifacts, and found that two outer iterations in the scheme of Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.
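As a hedged illustration of the sparse coding step just described (not the authors' implementation; array names and sizes are assumptions), the following sketch runs Orthogonal Matching Pursuit for a fixed number of atoms, the "patch sparsity", rather than until a target reconstruction error is reached. A small atom budget yields the coarse approximations mentioned above.

import numpy as np

def omp_fixed_sparsity(D, y, n_atoms):
    # D : (m, p) dictionary with (approximately) unit-norm columns
    # y : (m,) patch to code
    # n_atoms : number of OMP steps = patch sparsity of the decomposition
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # re-fit the coefficients on the selected support (least squares)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Example: code a random patch with a budget of 6 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256)); D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(64)
x = omp_fixed_sparsity(D, y, n_atoms=6)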

To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.
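Purely as an illustration of how a missing-pixel or nonuniform-noise pattern can enter the data fit (this is not the modified K-SVD of the paper), the sketch below fits a patch on a fixed set of atoms with per-pixel weights; a zero weight corresponds to a missing (mosaic) pixel, and smaller weights to noisier pixels.

import numpy as np

def weighted_fit(D, y, w, support):
    # Fit a patch on a fixed set of atoms with per-pixel weights w >= 0,
    # then return the full-patch estimate D @ x (missing pixels filled in).
    W = np.sqrt(w)[:, None]                 # weight the rows of the system
    A, b = W * D[:, support], np.sqrt(w) * y
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    x = np.zeros(D.shape[1]); x[support] = coef
    return D @ x

rng = np.random.default_rng(3)
D = rng.standard_normal((48, 96)); D /= np.linalg.norm(D, axis=0)
y = D[:, [0, 5, 7]] @ np.array([1.0, -0.5, 2.0])   # a truly 3-sparse patch
w = (rng.random(48) < 0.5).astype(float)           # hypothetical observation mask
print(np.linalg.norm(weighted_fit(D, y, w, support=[0, 5, 7]) - y))  # ~ 0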

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, inpainting, and demosaicing results obtained with the proposed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with atoms of different sizes (5×5×3, 6×6×3, 7×7×3 and 8×8×3) on 200,000 patches taken from a database of 15,000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. Then we trained each dictionary with 600 iterations. This provided us with a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and the adaptive one shows the improvements brought by the learning process. We chose to evaluate
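The global training loop described above (sparse coding of a large set of patches, alternated with a dictionary update and repeated for many iterations) can be sketched as follows. This is a generic MOD-style alternation with a cheap thresholding coder, not the K-SVD update used in the paper; all sizes and names are illustrative.

import numpy as np

def code_patches(D, Y, n_atoms):
    # Very simple sparse coder: for each patch, keep the n_atoms atoms most
    # correlated with it and refit them by least squares (a cheap stand-in for OMP).
    X = np.zeros((D.shape[1], Y.shape[1]))
    C = D.T @ Y
    for j in range(Y.shape[1]):
        support = np.argsort(-np.abs(C[:, j]))[:n_atoms]
        coef, *_ = np.linalg.lstsq(D[:, support], Y[:, j], rcond=None)
        X[support, j] = coef
    return X

def learn_dictionary(Y, n_atoms=6, p=256, n_iter=20, seed=0):
    # Alternate sparse coding and a least-squares dictionary update (MOD-style).
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], p))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = code_patches(D, Y, n_atoms)
        D = Y @ np.linalg.pinv(X)           # least-squares dictionary update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D

# Toy run: 64 atoms learned from 1000 random 6 x 6 x 3 "patches".
Y = np.random.default_rng(1).standard_normal((6 * 6 * 3, 1000))
D = learn_dictionary(Y, n_atoms=6, p=64, n_iter=5)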

Image f = Φx

Coefficients x

Page 165: Sparsity and Compressed Sensing

Some Hot Topics

Dictionary learning: the dictionary Φ (and the analysis operator D) is itself learned from the data.

Analysis vs. synthesis:
Ja(f) = ||D* f||1      (analysis)
Js(f) = min_{f = Φx} ||x||1      (synthesis)


Image f = Φx, coefficients x; analysis coefficients c = D* f.
Operators: Φ (synthesis), D* (analysis).
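The analysis prior Ja is a direct evaluation, whereas the synthesis prior Js requires minimizing ||x||1 under the constraint f = Φx. A minimal sketch, assuming a finite-difference analysis operator and a random overcomplete dictionary (both illustrative choices, not the operators used in the slides), casts the synthesis prior as a linear program:

import numpy as np
from scipy.optimize import linprog

def J_analysis(f, Dstar):
    # Analysis prior Ja(f) = || D* f ||_1 for a given analysis operator D*.
    return np.abs(Dstar @ f).sum()

def J_synthesis(f, Phi):
    # Synthesis prior Js(f) = min_{f = Phi x} ||x||_1, written as an LP
    # with x = u - v, u >= 0, v >= 0.
    n, p = Phi.shape
    res = linprog(c=np.ones(2 * p),
                  A_eq=np.hstack([Phi, -Phi]), b_eq=f,
                  bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return np.abs(u - v).sum()

# Illustration: discrete gradient as D*, random overcomplete Phi for synthesis.
n = 32
f = np.cumsum(np.random.default_rng(4).standard_normal(n) > 1.0) * 1.0  # piecewise constant
Dstar = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]      # (n-1, n) finite differences
Phi = np.random.default_rng(5).standard_normal((n, 4 * n))
print(J_analysis(f, Dstar), J_synthesis(f, Phi))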

Page 166: Sparsity and Compressed Sensing

Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1, +1}², and the atomic norm is the ℓ∞ norm.

natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly, the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond
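Since conv(A) defines the unit ball, the atomic norm of a point is its gauge with respect to that ball: the smallest total weight of a nonnegative combination of atoms reproducing the point. For a finite atom set this is a small linear program. The sketch below (an illustration, not code from the paper) recovers the ℓ1 norm from the one-sparse atoms of Figure 1(a) and the ℓ∞ norm from the sign vectors of Figure 1(c):

import itertools
import numpy as np
from scipy.optimize import linprog

def atomic_norm(x, atoms):
    # Gauge of conv(atoms): minimize sum_i c_i subject to sum_i c_i * a_i = x, c >= 0.
    # With a centrally symmetric atom set, conv(atoms) is the unit ball of this norm.
    A = np.array(atoms, dtype=float).T          # columns are atoms
    res = linprog(c=np.ones(A.shape[1]), A_eq=A, b_eq=x,
                  bounds=(0, None), method="highs")
    return res.fun

x = np.array([0.3, -1.2])
one_sparse = [(1, 0), (-1, 0), (0, 1), (0, -1)]          # atoms of Figure 1(a)
signs = list(itertools.product([-1, 1], repeat=2))       # atoms of Figure 1(c)
print(atomic_norm(x, one_sparse), np.abs(x).sum())       # both 1.5 (l1 norm)
print(atomic_norm(x, signs), np.abs(x).max())            # both 1.2 (l_inf norm)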


Some Hot Topics

Dictionary learning: the dictionary Φ (and the analysis operator D) is itself learned from the data.

Analysis vs. synthesis:
Ja(f) = ||D* f||1      (analysis)
Js(f) = min_{f = Φx} ||x||1      (synthesis)


Other sparse priors:

Image f = Φx, coefficients x; analysis coefficients c = D* f. Operators: Φ (synthesis), D* (analysis).

|x1| + |x2|   (ℓ1 norm)        max(|x1|, |x2|)   (ℓ∞ norm)

Page 167: Sparsity and Compressed Sensing


Some Hot Topics

Dictionary learning: the dictionary Φ (and the analysis operator D) is itself learned from the data.

Analysis vs. synthesis:
Ja(f) = ||D* f||1      (analysis)
Js(f) = min_{f = Φx} ||x||1      (synthesis)

|x1| + (x2^2 + x3^2)^(1/2)   (a group-sparsity prior)


Other sparse priors:

Image f = Φx, coefficients x; analysis coefficients c = D* f. Operators: Φ (synthesis), D* (analysis).

|x1| + |x2|   (ℓ1 norm)        max(|x1|, |x2|)   (ℓ∞ norm)
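These priors are typically handled through their proximal operators, which is the route taken in the proximal-splitting part of these slides. A minimal sketch with generic formulas (not tied to a specific solver): the ℓ1 term gives coordinatewise soft thresholding, and the mixed term |x1| + (x2^2 + x3^2)^(1/2) gives block soft thresholding of the (x2, x3) group; the prox of max(|x1|, |x2|) can be obtained from the projection onto an ℓ1 ball via Moreau's identity.

import numpy as np

def prox_l1(x, lam):
    # prox of lam * ||.||_1 : coordinatewise soft thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_group(x, lam, groups):
    # prox of lam * sum_g ||x_g||_2 (block soft thresholding);
    # groups = [[0], [1, 2]] corresponds to |x1| + sqrt(x2^2 + x3^2).
    out = np.zeros_like(x)
    for g in groups:
        v = x[g]
        nrm = np.linalg.norm(v)
        if nrm > lam:
            out[g] = (1.0 - lam / nrm) * v
    return out

x = np.array([0.4, -0.3, 0.8])
print(prox_l1(x, 0.5))                           # [0., 0., 0.3]
print(prox_group(x, 0.5, groups=[[0], [1, 2]]))  # keeps the (x2, x3) block, shrunk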

Page 168: Sparsity and Compressed Sensing


Some Hot Topics

Dictionary learning: learn the dictionary (Φ for synthesis, D for analysis) from exemplar patches.

Analysis vs. synthesis regularization:

    J_a(f) = || D^* f ||_1                  (analysis prior)
    J_s(f) = min_{f = Φ x} || x ||_1        (synthesis prior)

Block (group) sparsity norm: |x_1| + (x_2^2 + x_3^2)^{1/2}
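As a concrete illustration, here is a minimal numerical sketch (in Python, not taken from the Numerical Tours code) of the synthesis formulation: it minimizes 1/2 ||y - K Φ x||^2 + λ ||x||_1 by iterative soft-thresholding (ISTA) on a 1-D inpainting toy problem. The orthogonal DCT stands in for the dictionary Φ, in which case the analysis prior || Φ^* f ||_1 coincides with the synthesis one; the function names, step size and λ are illustrative choices.

import numpy as np
from scipy.fftpack import dct, idct

def Phi(x):            # synthesis operator: coefficients -> signal
    return idct(x, norm='ortho')

def PhiT(f):           # analysis operator Phi^* (orthogonal dictionary)
    return dct(f, norm='ortho')

def soft(x, t):        # soft thresholding = proximal operator of t*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_synthesis(y, K, lam, n_iter=300, step=1.0):
    """Minimize 0.5*||y - K Phi(x)||^2 + lam*||x||_1 by ISTA; return f = Phi(x)."""
    x = np.zeros_like(y)
    for _ in range(n_iter):
        r = K * Phi(x) - y                       # residual of the measurements
        x = soft(x - step * PhiT(K * r), step * lam)
    return Phi(x)

# toy 1-D inpainting: K is a binary mask keeping about half of the samples
n = 256
rng = np.random.default_rng(0)
f0 = np.cos(2 * np.pi * 4 * np.arange(n) / n)    # a DCT-sparse signal
K = (rng.random(n) > 0.5).astype(float)
y = K * (f0 + 0.02 * rng.standard_normal(n))
f_rec = ista_synthesis(y, K, lam=0.02)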

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches: (a) 5×5×3 patches; (b) 8×8×3 patches. Note the large number of color-less atoms. Since the atoms can have negative values, the vectors are shown scaled and shifted to the [0, 255] range per channel.

Fig. 3. Examples of color artifacts when reconstructing a damaged version of the image (a) without the improvement proposed here; the artifacts are reduced with the proposed technique. Both images have been denoised with the same global dictionary. In (b), one observes a color bias on the castle and in parts of the water; moreover, the color of the sky is piecewise constant (false contours), another artifact the proposed approach corrects. (a) Original. (b) Original algorithm. (c) Proposed algorithm.

Fig. 4. (a) Training image; (b) resulting dictionary, learned on the image in (a). This dictionary is more colored than the global one.


MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION

Fig. 7. Data set used for evaluating the denoising experiments.

TABLE I. PSNR results of our denoising algorithm with 256 atoms of size 7×7×3 and 6×6×3. Each case is divided into four parts: the top-left results are those given by McAuley et al. [28] with their "3×3 model"; the top-right results are those obtained by applying the grayscale K-SVD algorithm [2] on each channel separately with 8×8 atoms; the bottom-left are our results obtained with a globally trained dictionary; the bottom-right are the improvements obtained with the adaptive approach with 20 iterations. Bold indicates the best results for each group. As can be seen, our proposed technique consistently produces the best results.

TABLE II. Comparison of the PSNR results on the image "castle" between [28] and what we obtained with 256 atoms of 6×6×3 and 7×7×3 patches. For the adaptive approach, 20 iterations have been performed. Bold indicates the best result, indicating once again the consistent improvement obtained with our proposed technique.

patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small number of atoms during the OMP makes it possible to learn a dictionary specialized in providing a coarse approximation. Our assumption is that (pattern) artifacts are less present in coarse approximations, which prevents the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used a small patch sparsity to prevent the learning of artifacts, and found that two outer iterations in the scheme of Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.

To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with a low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, inpainting, and demosaicing results obtained with the proposed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with different sizes of atoms, 5×5×3, 6×6×3, 7×7×3 and 8×8×3, on 200 000 patches taken from a database of 15 000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. We then trained each dictionary with 600 iterations. This provided a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and the adaptive one lets us see the improvements brought by the learning process. We chose to evaluate
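A minimal sketch, assuming scikit-learn as a stand-in for the K-SVD used above, of the patch-based pipeline the excerpt describes: extract overlapping patches, learn a dictionary on a random subset, sparse-code every patch with OMP using a small fixed number of atoms (the "patch sparsity"), and average the reconstructed patches back into the image. Patch size, dictionary size and sparsity below are illustrative choices, not the paper's settings.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(0)
clean = np.kron(rng.random((16, 16)), np.ones((8, 8)))      # toy 128x128 grayscale "image"
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

patch_size = (7, 7)
patches = extract_patches_2d(noisy, patch_size)              # all overlapping patches
X = patches.reshape(len(patches), -1)
means = X.mean(axis=1, keepdims=True)                        # remove each patch's mean (DC component)
Xc = X - means

dico = MiniBatchDictionaryLearning(n_components=100,
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=6,   # the "patch sparsity"
                                   random_state=0)
dico.fit(Xc[rng.choice(len(Xc), 5000, replace=False)])       # train on a random subset of patches
code = dico.transform(Xc)                                    # OMP sparse codes for every patch
rec = (code @ dico.components_ + means).reshape(patches.shape)
denoised = reconstruct_from_patches_2d(rec, noisy.shape)     # average overlapping reconstructions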

Other sparse priors:

Image f = Φ x with coefficients x (synthesis dictionary Φ); correlations c = D^* f (analysis dictionary D).

Examples of sparsity-inducing norms and their unit balls: |x_1| + |x_2| (the ℓ1 norm) and max(|x_1|, |x_2|) (the ℓ∞ norm); see Figure 1 below.
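A small numerical check (illustrative, not from the slides) of the priors listed above and earlier, evaluated on an arbitrary vector and matrix: the ℓ1 and ℓ∞ norms, the block-sparsity norm |x_1| + (x_2^2 + x_3^2)^{1/2}, and the nuclear norm (sum of singular values) that appears in panel (b) of the figure below.

import numpy as np

x = np.array([0.5, -1.2, 0.3])

l1_norm    = np.abs(x).sum()                        # |x_1| + |x_2| + |x_3|
linf_norm  = np.abs(x).max()                        # max_i |x_i|
block_norm = abs(x[0]) + np.hypot(x[1], x[2])       # |x_1| + (x_2^2 + x_3^2)^(1/2)

M = np.outer([1.0, 2.0], [0.5, -1.0])               # a rank-one 2x2 matrix
nuclear_norm = np.linalg.svd(M, compute_uv=False).sum()   # sum of singular values

print(l1_norm, linf_norm, block_norm, nuclear_norm)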

Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1, +1}^2, and the atomic norm is the ℓ∞ norm.

Is there a natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly, the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and of the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope.
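To make the heuristic concrete in its simplest instance (atoms = unit one-sparse vectors, so the atomic norm is the ℓ1 norm), here is a hedged sketch of exact recovery from generic Gaussian measurements; cvxpy is used only as a convenient convex solver, not as the excerpted paper's implementation, and the problem sizes are arbitrary.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, s = 128, 40, 5                             # ambient dimension, measurements, sparsity
x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)     # generic (Gaussian) measurement map
y = A @ x0

x = cp.Variable(n)                               # atomic-norm (here l1) minimization
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
prob.solve()
print("recovery error:", np.linalg.norm(x.value - x0))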

Nuclear