Signal Processing Course: Compressed Sensing


Slides for a course on signal and image processing.


Overview

•Shannon’s World

•Compressive Sensing Acquisition

•Compressive Sensing Recovery

•Theoretical Guarantees

•Fourier Domain Measurements

Discretization

Data acquisition: sensors (idealized acquisition device).

Sampling: f̃ ∈ L²([0,1]^d) ⟼ f ∈ ℝ^N.

Idealization: f[n] ≈ f̃(n/N).

Pointwise Sampling and Smoothness

Data acquisition: sensors.

f̃ ∈ L² ⟼ f ∈ ℝ^N, with f[i] = f̃(i/N).

Shannon interpolation: if Supp(f̂̃) ⊂ [−Nπ, Nπ], then

f̃(t) = Σ_i f[i] h(Nt − i),   where h(t) = sin(πt) / (πt).

→ Natural images are not smooth.
→ But they can be compressed efficiently (e.g. JPEG-2000: coefficient bitstream 0,1,0,...).
→ Sample and compress simultaneously?
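To make the interpolation formula concrete, here is a minimal NumPy sketch (not from the slides) that reconstructs a band-limited test signal from its N pointwise samples; np.sinc implements the kernel h(t) = sin(πt)/(πt):

```python
import numpy as np

# Minimal sketch: Shannon interpolation of a band-limited test signal
# from its N samples f[i] = f~(i/N).
N = 64
i = np.arange(N)
f = np.cos(2 * np.pi * 3 * i / N) + 0.5 * np.sin(2 * np.pi * 5 * i / N)

t = np.linspace(0, 1, 1024, endpoint=False)     # fine grid on [0, 1)
h = np.sinc(N * t[:, None] - i[None, :])        # np.sinc(x) = sin(pi x)/(pi x)
f_tilde = h @ f                                 # f~(t) = sum_i f[i] h(Nt - i)

# At the sample points t = i/N the kernel is a Kronecker delta,
# so the interpolation is exact there (up to float rounding).
print(np.max(np.abs(f_tilde[::1024 // N] - f)))
```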

Sampling and Periodization

[Figure: panels (a)–(d), sampling and periodization of the spectrum on [0, 1].]

Sampling and Periodization: Aliasing

[Figure: panels (a)–(d), aliasing when the sampling rate is too low, on [0, 1].]

Overview

•Shannon’s World

•Compressive Sensing Acquisition

•Compressive Sensing Recovery

•Theoretical Guarantees

•Fourier Domain Measurements

Single Pixel Camera (Rice)

P measures ≪ N micro-mirrors.

y[i] = ⟨f, φ_i⟩

[Figure: reconstructions for P/N = 1, P/N = 0.16, P/N = 0.02.]

CS Hardware Model

CS is about designing hardware: input signals f̃ ∈ L²(ℝ²).

Physical hardware resolution limit: target resolution f ∈ ℝ^N.

f̃ ∈ L²  --(micro-mirrors array resolution)-->  f ∈ ℝ^N  --(CS hardware, operator K)-->  y ∈ ℝ^P

Overview

•Shannon’s World

•Compressive Sensing Acquisition

•Compressive Sensing Recovery

•Theoretical Guarantees

•Fourier Domain Measurements

Inversion and Sparsity

Operator K: f ⟼ y = Kf. Need to solve y = Kf.

→ More unknowns than equations: dim(ker(K)) = N − P is huge.

Prior information: f is sparse in a basis {ψ_m}_m:
J_ε(f) = Card{m : |⟨f, ψ_m⟩| > ε} is small.

Convex Relaxation: L1 Prior

Image with 2 pixels: J_0(f) = #{m : ⟨f, ψ_m⟩ ≠ 0}.

J_0(f) = 0 → null image.
J_0(f) = 1 → sparse image.
J_0(f) = 2 → non-sparse image.

ℓ^q priors (convex for q ≥ 1): J_q(f) = Σ_m |⟨f, ψ_m⟩|^q.

[Figure: unit balls of J_q for q = 0, 1/2, 1, 3/2, 2.]

Sparse ℓ¹ prior: J_1(f) = Σ_m |⟨f, ψ_m⟩|.
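As a small illustration (a sketch, not from the slides; the helper Jq is hypothetical), the priors J_q are immediate to evaluate on a coefficient vector; only q = 0 counts nonzeros, and q = 1 is its convex surrogate:

```python
import numpy as np

def Jq(x, q, eps=1e-12):
    """ell^q prior J_q(x) = sum_m |x_m|^q; for q = 0, count nonzero entries."""
    x = np.asarray(x, dtype=float)
    if q == 0:
        return int(np.sum(np.abs(x) > eps))
    return float(np.sum(np.abs(x) ** q))

x_sparse = np.array([0.0, 2.0])   # "sparse image" with 2 pixels
x_full   = np.array([1.0, 1.0])   # "non-sparse image"
for q in (0, 0.5, 1, 1.5, 2):
    print(q, Jq(x_sparse, q), Jq(x_full, q))
```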

Sparse CS Recovery

(Discretized) sampling acquisition: f_0 ∈ ℝ^N sparse in an ortho-basis Ψ, f_0 = Ψ x_0 with x_0 ∈ ℝ^N.

y = K f_0 + w = K Ψ (x_0) + w = Φ x_0 + w,   where Φ = K Ψ.

K drawn from the Gaussian matrix ensemble: K_{i,j} ∼ N(0, P^{−1/2}) i.i.d.
→ Φ = K Ψ is also drawn from the Gaussian matrix ensemble.

Sparse recovery:
min_{||Φx − y|| ≤ ||w||} ||x||_1    or    min_x  ½ ||Φx − y||² + λ ||x||_1,  with λ ∼ ||w||.
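A minimal simulation of this setup, assuming Ψ = Id so that Φ is directly Gaussian, and solving the Lagrangian form with ISTA (iterative soft thresholding, one standard solver among several; the slides do not prescribe a particular algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, k = 400, 100, 8

# Sparse signal x0 and Gaussian sensing matrix Phi (entries ~ N(0, 1/P))
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)

# ISTA for min_x 1/2 ||Phi x - y||^2 + lam ||x||_1
lam = 0.01
L = np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(2000):
    g = Phi.T @ (Phi @ x - y)          # gradient of the quadratic term
    x = x - g / L
    x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0)  # soft threshold

print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```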

CS Simulation Example

Ψ = translation-invariant wavelet frame.

[Figure: original f_0 and its CS reconstruction.]

Overview

•Shannon’s World

•Compressive Sensing Acquisition

•Compressive Sensing Recovery

•Theoretical Guarantees

•Fourier Domain Measurements

CS with RIP

Restricted Isometry Constants:
∀x, ||x||_0 ≤ k:  (1 − δ_k)||x||² ≤ ||Φx||² ≤ (1 + δ_k)||x||².

ℓ¹ recovery:
x* ∈ argmin_{||Φx − y|| ≤ ε} ||x||_1   where  y = Φx_0 + w,  ||w|| ≤ ε.

Theorem [Candès 2009]: If δ_{2k} ≤ √2 − 1, then

||x_0 − x*|| ≤ (C_0/√k) ||x_0 − x_k||_1 + C_1 ε,

where x_k is the best k-term approximation of x_0.

Singular Values Distributions

Eigenvalues of Φ_I* Φ_I with |I| = k are essentially in [a, b], where
a = (1 − √β)²  and  b = (1 + √β)²,  β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to

f_β(λ) = √((b − λ)_+ (λ − a)_+) / (2πβλ)    [Marchenko–Pastur]

Large deviation inequality [Ledoux].

[Figure: empirical eigenvalue histograms vs. f_β(λ) for P = 200 and k = 10, 30, 50.]
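A quick Monte-Carlo check of the Marchenko–Pastur prediction (a sketch; the parameters mirror the P = 200, k = 30 panel):

```python
import numpy as np

rng = np.random.default_rng(0)
P, k, trials = 200, 30, 200

# Empirical eigenvalues of Phi_I^* Phi_I for random Gaussian submatrices |I| = k
eigs = []
for _ in range(trials):
    Phi_I = rng.standard_normal((P, k)) / np.sqrt(P)   # k Gaussian columns
    eigs.append(np.linalg.eigvalsh(Phi_I.T @ Phi_I))
eigs = np.concatenate(eigs)

beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(f"Marchenko-Pastur support: [{a:.3f}, {b:.3f}]")
print(f"empirical range:          [{eigs.min():.3f}, {eigs.max():.3f}]")
```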

RIP for Gaussian Matrices

Mutual coherence: µ(Φ) = max_{i≠j} |⟨φ_i, φ_j⟩|.

Link with coherence: δ_2 = µ(Φ) and δ_k ≤ (k − 1) µ(Φ).

For Gaussian matrices: µ(Φ) ∼ √(log(PN)/P).

Stronger result:
Theorem: If k ≤ C P / log(N/P), then δ_{2k} ≤ √2 − 1 with high probability.
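Estimating µ(Φ) for a Gaussian matrix is a one-liner on the Gram matrix; a small sketch with assumed sizes P = 200, N = 1000, to be compared with the √(log(PN)/P) scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
P, N = 200, 1000
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
Phi = Phi / np.linalg.norm(Phi, axis=0)      # normalize columns

G = np.abs(Phi.T @ Phi)                      # |<phi_i, phi_j>| for all pairs
np.fill_diagonal(G, 0)                       # ignore i = j
mu = G.max()
print(f"mu(Phi) = {mu:.3f},  sqrt(log(P*N)/P) = {np.sqrt(np.log(P * N) / P):.3f}")
```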

Numerics with RIP

Stability constants of A: (1 − δ¹(A))||x||² ≤ ||Ax||² ≤ (1 + δ²(A))||x||²,
given by the smallest / largest eigenvalues of A*A.

Upper/lower RIC: δ_k^i = max_{|I|=k} δ^i(Φ_I),  δ_k = min(δ_k^1, δ_k^2).

Monte-Carlo estimation over random supports: δ̂_k ≤ δ_k.

[Figure: estimated δ̂_{2k} as a function of k for N = 4000, P = 1000, with the threshold √2 − 1.]
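The Monte-Carlo estimate can be computed from singular values of random column submatrices; since it samples only a few supports, it only lower-bounds the true constants (a sketch with a smaller k and fewer trials than the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, k, trials = 1000, 4000, 20, 200

Phi = rng.standard_normal((P, N)) / np.sqrt(P)
d1 = d2 = 0.0                              # lower / upper RIC estimates
for _ in range(trials):
    I = rng.choice(N, k, replace=False)    # random support of size k
    s = np.linalg.svd(Phi[:, I], compute_uv=False)
    d1 = max(d1, 1.0 - s.min() ** 2)       # worst lower stability seen
    d2 = max(d2, s.max() ** 2 - 1.0)       # worst upper stability seen

# These only lower-bound delta_k^1, delta_k^2 (max over *all* supports).
print(d1, d2, "threshold:", np.sqrt(2) - 1)
```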

Polytopes-based Guarantees

Noiseless recovery: x* ∈ argmin_{Φx = y} ||x||_1   (P_0(y)),   y ⟼ x*.

2-D example: Φ = (φ_i)_i ∈ ℝ^{2×3}, B_α = {x : ||x||_1 ≤ α} with α = ||x_0||_1.

x_0 is a solution of P_0(Φx_0)  ⟺  Φx_0 ∈ ∂Φ(B_α).

[Figure: the projected ℓ¹ ball Φ(B_α), with vertices ±φ_1, ±φ_2, ±φ_3.]
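Noiseless ℓ¹ recovery (P_0) is a linear program; here is a self-contained 2×3 sketch using scipy.optimize.linprog with the standard x = u − v splitting (the matrix and signal are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# 2-D example: Phi in R^{2x3}, recover a 1-sparse x0 by l1 minimization,
# min ||x||_1 s.t. Phi x = y, cast as an LP with x = u - v, u, v >= 0.
Phi = np.array([[1.0, 0.2, -0.5],
                [0.1, 1.0,  0.6]])
x0 = np.array([0.0, 3.0, 0.0])
y = Phi @ x0

c = np.ones(6)                       # sum(u) + sum(v) = ||x||_1 at optimum
A_eq = np.hstack([Phi, -Phi])        # Phi (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * 6)
x_star = res.x[:3] - res.x[3:]
print("recovered:", np.round(x_star, 6))   # expect [0, 3, 0]
```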

L1 Recovery in 2-D

Sign patterns: K_s = {(α_i s_i)_i ∈ ℝ³ : α_i ≥ 0} (2-D quadrants), mapped to 2-D cones C_s = Φ K_s.

[Figure: quadrant K_{(0,1,1)} and cone C_{(0,1,1)} for Φ = (φ_i)_i ∈ ℝ^{2×3}; y ⟼ x*.]

Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
All x_0 such that ||x_0||_0 ≤ C_all(P/N)·P are identifiable.
Most x_0 such that ||x_0||_0 ≤ C_most(P/N)·P are identifiable.

C_all(1/4) ≈ 0.065,   C_most(1/4) ≈ 0.25.

Versus RIP:
→ Sharp constants.
→ No noise robustness.

→ Computation of "pathological" signals [Dossal, Peyré, Fadili, 2010].

[Figure: "all" and "most" phase-transition curves, sparsity 50–400 vs. recovery probability 0–1.]

Overview

•Shannon’s World

•Compressive Sensing Acquisition

•Compressive Sensing Recovery

•Theoretical Guarantees

•Fourier Domain Measurements

Tomography and Fourier Measures

Partial Fourier measurements: Kf = (f̂[ω])_{ω∈Ω}, with f̂ = FFT2(f).

Fourier slice theorem: p̂_θ(ρ) = f̂(ρ cos θ, ρ sin θ)   (1-D slice ↔ 2-D Fourier).

Equivalent to measuring the projections {p_{θ_k}(t)}_{t∈ℝ, 0≤k<K} along K angles θ_k.

Regularized Inversion

Noisy measurements: ∀ω ∈ Ω, y[ω] = f̂_0[ω] + w[ω].
Noise: w[ω] ∼ N(0, σ), white noise.

ℓ¹ regularization:
f* = argmin_f  ½ Σ_{ω∈Ω} |y[ω] − f̂[ω]|² + λ Σ_m |⟨f, ψ_m⟩|.

MRI Imaging

From [Lustig et al.]: Fourier sub-sampling pattern, with randomization.

MRI Reconstruction

[Figure, from [Lustig et al.]: high-resolution vs. low-resolution, linear vs. sparsity-based reconstructions.]

Radar Interferometry

CARMA (USA): Fourier sampling (via the Earth's rotation), linear reconstruction.

Structured Measurements

Gaussian matrices: intractable for large N.

Random partial orthogonal matrix: {φ_ω}_ω an orthogonal basis,
Kf = (⟨φ_ω, f⟩)_{ω∈Ω}  where |Ω| = P is drawn uniformly at random.

Fast measurements (e.g. the Fourier basis).

Mutual incoherence: µ = √N · max_{ω,m} |⟨φ_ω, ψ_m⟩| ∈ [1, √N].

→ Not universal: requires incoherence.

Theorem [Rudelson, Vershynin, 2006]: with high probability on Ω,
if M ≤ C·P / (µ² log(N)⁴), then δ_{2M} ≤ √2 − 1.
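A sketch of such a fast partial-Fourier operator K and its adjoint in NumPy (sizes are illustrative); with norm="ortho" the rows of K are orthonormal, and the adjoint test checks ⟨Kf, y⟩ = ⟨f, K*y⟩:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # image is n x n, so N = n*n
P = n * n // 8                           # keep P Fourier coefficients
Omega = rng.choice(n * n, P, replace=False)

def K(f):
    """Partial 2-D Fourier measurements: restrict the ortho-FFT to Omega."""
    return np.fft.fft2(f, norm="ortho").ravel()[Omega]

def K_adj(y):
    """Adjoint: zero-fill the coefficients on Omega, then inverse ortho-FFT."""
    F = np.zeros(n * n, dtype=complex)
    F[Omega] = y
    return np.fft.ifft2(F.reshape(n, n), norm="ortho")

f = rng.standard_normal((n, n))
y = rng.standard_normal(P) + 1j * rng.standard_normal(P)
lhs = np.vdot(K(f), y)          # <Kf, y>
rhs = np.vdot(f, K_adj(y))      # <f, K* y>
print(abs(lhs - rhs))           # ~1e-14: K_adj really is the adjoint
```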

Conclusion

Sparsity: approximate signals with few atoms (f = Ψx with a dictionary Ψ, so Φ = KΨ).

Compressed sensing ideas:
→ Randomized sensors + sparse recovery.
→ Number of measurements ∼ signal complexity.
→ CS is about designing new hardware.

The devil is in the constants:
→ Worst-case analysis is problematic.
→ Designing good signal models.

Dictionary learning:

learning

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20

ITE

RA

TIO

NS

HA

VE

BE

EN

PE

RFO

RM

ED.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT,

IND

ICA

TIN

GO

NC

EA

GA

INT

HE

CO

NSIST

EN

TIM

PRO

VE

ME

NT

OB

TAIN

ED

WIT

HO

UR

PR

OPO

SED

TE

CH

NIQ

UE

patch),inorder

topreventany

learningof

theseartifacts

(over-fitting).

We

definethen

thepatch

sparsityof

thedecom

po-sition

asthis

number

ofsteps.T

hestopping

criteriain

(2)be-

comes

thenum

berof

atoms

usedinstead

ofthe

reconstructionerror.U

singa

small

duringthe

OM

Pperm

itsto

learna

dic-tionary

specializedin

providinga

coarseapproxim

ation.O

urassum

ptionis

that(pattern)

artifactsare

lesspresent

incoarse

approximations,preventing

thedictionary

fromlearning

them.

We

proposethen

thealgorithm

describedin

Fig.6.We

typicallyused

toprevent

thelearning

ofartifacts

andfound

outthattw

oouteriterations

inthe

scheme

inFig.6

aresufficientto

givesatisfactory

results,while

within

theK

-SVD

,10–20itera-

tionsare

required.To

conclude,inorderto

addressthedem

osaicingproblem

,we

usethe

modified

K-SV

Dalgorithm

thatdealsw

ithnonuniform

noise,asdescribed

inprevious

section,andadd

toitan

adaptivedictionary

thathasbeen

learnedw

ithlow

patchsparsity

inorder

toavoid

over-fittingthe

mosaic

pattern.The

same

techniquecan

beapplied

togeneric

colorinpainting

asdem

onstratedin

thenextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Dictionary learning:

Analysis vs. synthesis:

learning

Js(f) = minf=�x

||x||1

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20

ITE

RA

TIO

NS

HA

VE

BE

EN

PE

RFO

RM

ED.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT,

IND

ICA

TIN

GO

NC

EA

GA

INT

HE

CO

NSIST

EN

TIM

PRO

VE

ME

NT

OB

TAIN

ED

WIT

HO

UR

PR

OPO

SED

TE

CH

NIQ

UE

patch),inorder

topreventany

learningof

theseartifacts

(over-fitting).

We

definethen

thepatch

sparsityof

thedecom

po-sition

asthis

number

ofsteps.T

hestopping

criteriain

(2)be-

comes

thenum

berof

atoms

usedinstead

ofthe

reconstructionerror.U

singa

small

duringthe

OM

Pperm

itsto

learna

dic-tionary

specializedin

providinga

coarseapproxim

ation.O

urassum

ptionis

that(pattern)

artifactsare

lesspresent

incoarse

approximations,preventing

thedictionary

fromlearning

them.

We

proposethen

thealgorithm

describedin

Fig.6.We

typicallyused

toprevent

thelearning

ofartifacts

andfound

outthattw

oouteriterations

inthe

scheme

inFig.6

aresufficientto

givesatisfactory

results,while

within

theK

-SVD

,10–20itera-

tionsare

required.To

conclude,inorderto

addressthedem

osaicingproblem

,we

usethe

modified

K-SV

Dalgorithm

thatdealsw

ithnonuniform

noise,asdescribed

inprevious

section,andadd

toitan

adaptivedictionary

thathasbeen

learnedw

ithlow

patchsparsity

inorder

toavoid

over-fittingthe

mosaic

pattern.The

same

techniquecan

beapplied

togeneric

colorinpainting

asdem

onstratedin

thenextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Image f = �x

Coe�cients x

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20

ITE

RA

TIO

NS

HA

VE

BE

EN

PE

RFO

RM

ED.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT,

IND

ICA

TIN

GO

NC

EA

GA

INT

HE

CO

NSIST

EN

TIM

PRO

VE

ME

NT

OB

TAIN

ED

WIT

HO

UR

PR

OPO

SED

TE

CH

NIQ

UE

patch),inorder

topreventany

learningof

theseartifacts

(over-fitting).

We

definethen

thepatch

sparsityof

thedecom

po-sition

asthis

number

ofsteps.T

hestopping

criteriain

(2)be-

comes

thenum

berof

atoms

usedinstead

ofthe

reconstructionerror.U

singa

small

duringthe

OM

Pperm

itsto

learna

dic-tionary

specializedin

providinga

coarseapproxim

ation.O

urassum

ptionis

that(pattern)

artifactsare

lesspresent

incoarse

approximations,preventing

thedictionary

fromlearning

them.

We

proposethen

thealgorithm

describedin

Fig.6.We

typicallyused

toprevent

thelearning

ofartifacts

andfound

outthattw

oouteriterations

inthe

scheme

inFig.6

aresufficientto

givesatisfactory

results,while

within

theK

-SVD

,10–20itera-

tionsare

required.To

conclude,inorderto

addressthedem

osaicingproblem

,we

usethe

modified

K-SV

Dalgorithm

thatdealsw

ithnonuniform

noise,asdescribed

inprevious

section,andadd

toitan

adaptivedictionary

thathasbeen

learnedw

ithlow

patchsparsity

inorder

toavoid

over-fittingthe

mosaic

pattern.The

same

techniquecan

beapplied

togeneric

colorinpainting

asdem

onstratedin

thenextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Image f = �x

Coe�cients x c = D�f

� D�

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20

ITE

RA

TIO

NS

HA

VE

BE

EN

PE

RFO

RM

ED.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT,

IND

ICA

TIN

GO

NC

EA

GA

INT

HE

CO

NSIST

EN

TIM

PRO

VE

ME

NT

OB

TAIN

ED

WIT

HO

UR

PR

OPO

SED

TE

CH

NIQ

UE

patch),inorder

topreventany

learningof

theseartifacts

(over-fitting).

We

definethen

thepatch

sparsityof

thedecom

po-sition

asthis

number

ofsteps.T

hestopping

criteriain

(2)be-

comes

thenum

berof

atoms

usedinstead

ofthe

reconstructionerror.U

singa

small

duringthe

OM

Pperm

itsto

learna

dic-tionary

specializedin

providinga

coarseapproxim

ation.O

urassum

ptionis

that(pattern)

artifactsare

lesspresent

incoarse

approximations,preventing

thedictionary

fromlearning

them.

We

proposethen

thealgorithm

describedin

Fig.6.We

typicallyused

toprevent

thelearning

ofartifacts

andfound

outthattw

oouteriterations

inthe

scheme

inFig.6

aresufficientto

givesatisfactory

results,while

within

theK

-SVD

,10–20itera-

tionsare

required.To

conclude,inorderto

addressthedem

osaicingproblem

,we

usethe

modified

K-SV

Dalgorithm

thatdealsw

ithnonuniform

noise,asdescribed

inprevious

section,andadd

toitan

adaptivedictionary

thathasbeen

learnedw

ithlow

patchsparsity

inorder

toavoid

over-fittingthe

mosaic

pattern.The

same

techniquecan

beapplied

togeneric

colorinpainting

asdem

onstratedin

thenextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small value during the OMP makes it possible to learn a dictionary specialized in providing a coarse approximation. Our assumption is that (pattern) artifacts are less present in coarse approximations, preventing the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used a low patch sparsity to prevent the learning of artifacts, and found that two outer iterations in the scheme of Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.
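As a concrete reading of this stopping rule, the sketch below shows OMP halted after a fixed number of atoms (the patch sparsity) rather than at a target reconstruction error. It is a minimal illustration, not the authors' implementation; the function name omp_fixed_sparsity, and the dictionary D and patch y it expects, are hypothetical.

```python
# Minimal OMP sketch stopped after a fixed number of atoms (patch sparsity).
# D: d x K dictionary with unit-norm columns; y: a patch of length d.
import numpy as np

def omp_fixed_sparsity(D, y, n_atoms):
    """Greedily select n_atoms columns of D to approximate y."""
    residual = y.copy()
    support = []
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x
```

Calling it with a small n_atoms (say 2 or 3) yields the coarse approximations the text relies on to keep pattern artifacts out of the learned dictionary.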

To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.
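The text does not spell out how the modified K-SVD handles nonuniform noise; a standard way to realize that idea, assumed here purely for illustration, is to weight the data-fit term per pixel, with zero weight on missing or mosaic-masked samples. The following weighted OMP sketch is written under that assumption and is not the authors' exact algorithm.

```python
# Sparse coding under nonuniform noise / missing pixels: weight the residual
# by per-pixel confidences w (w[i] = 0 for a masked pixel). Illustrative only.
import numpy as np

def weighted_omp(D, y, w, n_atoms):
    """Approximate y with n_atoms columns of D under per-pixel weights w."""
    W = np.sqrt(w)                       # fold weights into a least-squares fit
    Dw, yw = D * W[:, None], y * W
    support, residual = [], yw.copy()
    for _ in range(n_atoms):
        j = int(np.argmax(np.abs(Dw.T @ residual)))
        support.append(j)
        coeffs, *_ = np.linalg.lstsq(Dw[:, support], yw, rcond=None)
        residual = yw - Dw[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x
```

For demosaicing, w would encode the color-filter mask (one observed channel per pixel), so the fit is driven only by observed samples.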

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, inpainting, and demosaicing results that are obtained with the proposed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with different sizes of atoms, 5×5×3, 6×6×3, 7×7×3 and 8×8×3, on 200,000 patches taken from a database of 15,000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. We then trained each dictionary with 600 iterations. This provided us with a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and the adaptive one lets us see the improvements brought by the learning process. We chose to evaluate…
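To illustrate the overall training pipeline described here (sparse-code every patch, update the dictionary, repeat), below is a toy alternating-minimization loop. For brevity it uses a MOD-style least-squares dictionary update rather than the K-SVD per-atom SVD update, and it reuses the hypothetical omp_fixed_sparsity helper from the earlier sketch; all sizes are illustrative.

```python
# Toy dictionary-learning loop: OMP sparse coding + MOD-style update.
# Y: d x n matrix of training patches (columns); K: number of atoms.
import numpy as np

def train_dictionary(Y, K, n_atoms, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    for _ in range(n_iter):
        # Sparse coding of every patch at the chosen patch sparsity.
        X = np.column_stack(
            [omp_fixed_sparsity(D, Y[:, i], n_atoms) for i in range(n)])
        # MOD-style dictionary update: least-squares fit, then renormalize.
        D = Y @ np.linalg.pinv(X)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D
```

At the scale quoted in the text (hundreds of thousands of patches, hundreds of iterations) one would batch the coding step, but the alternation itself is the same.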

Other sparse priors:

Image: f = Ψx (synthesis)

Coefficients: x; correlations: c = D*f (analysis)

[Figure: the synthesis dictionary Ψ and the analysis operator D*]

Unit-ball examples: |x1| + |x2| (ℓ1 norm) and max(|x1|, |x2|) (ℓ∞ norm)

Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1,+1}², and the atomic norm is the ℓ∞ norm.

…natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly, the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond…
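To make the ℓ1 instance of this heuristic concrete, here is a small numerical sketch: recover a sparse vector from a few generic Gaussian measurements by running proximal gradient descent (ISTA) on the relaxation min_x ½||Ax − y||² + λ||x||1. The dimensions, step size, and λ are illustrative choices, not values from the text.

```python
# l1 atomic-norm heuristic in action: recover a sparse x0 from P << N
# Gaussian measurements y = A x0 via ISTA on 0.5*||Ax - y||^2 + lam*||x||_1.
import numpy as np

rng = np.random.default_rng(0)
N, P, s = 200, 60, 5                        # ambient dim, measurements, sparsity
A = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x0

lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = squared spectral norm
x = np.zeros(N)
for _ in range(2000):
    z = x - step * (A.T @ (A @ x - y))      # gradient step on the data fit
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```

With P well below N, the iterates typically land on (a slightly shrunk copy of) the true sparse vector; how many such generic measurements suffice is exactly what the Gaussian-width analysis quantifies.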


Dictionary learning

Analysis vs. synthesis:

Ja(f) = ||D*f||1 (analysis prior)

Js(f) = min_{f = Ψx} ||x||1 (synthesis prior)

Block sparsity: |x1| + (x2² + x3²)^(1/2) (a mixed ℓ1–ℓ2 group norm)
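A useful sanity check on the two priors: when the dictionary is an orthonormal basis, the representation f = Ψx is unique (x = Ψ*f), so the synthesis prior collapses to the analysis prior with D = Ψ; the two differ only for redundant dictionaries. A quick numerical check with a random orthogonal matrix:

```python
# For orthonormal Psi, J_s(f) = min_{f = Psi x} ||x||_1 has the unique
# feasible point x = Psi^T f, so it equals J_a(f) = ||Psi^T f||_1 (D = Psi).
import numpy as np

rng = np.random.default_rng(1)
N = 8
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthogonal basis
f = rng.standard_normal(N)

x = Psi.T @ f                     # unique coefficients: Psi is invertible
J_a = np.sum(np.abs(Psi.T @ f))   # analysis prior with D = Psi
J_s = np.sum(np.abs(x))           # synthesis prior at its unique point
assert np.isclose(J_a, J_s)
print(J_a, J_s)
```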

Some Hot Topics

[Further excerpt: MAIRAL et al., "Sparse Representation for Color Image Restoration"]

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of colorless atoms. Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5×5×3 patches; (b) 8×8×3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement proposed here. Color artifacts are reduced with our proposed technique; both images have been denoised with the same global dictionary. In (b), one observes a bias effect in the color of the castle and in some parts of the water. What is more, the color of the sky is piecewise constant (false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm. (c) Proposed algorithm.

Fig. 4. (a) Training image; (b) resulting dictionary; (b) is the dictionary learned from the image in (a). The dictionary is more colored than the global one.



Nuclear norm
