An Average of Several Sparse Representations is Better than the Sparsest One Alone

TRANSCRIPT

Page 1:

An Average of Several Sparse Representations is Better than the Sparsest One Alone

Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel

* Joint work with Irad Yavneh
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel

Structured Decomposition and Efficient Algorithms
December 1-5th, 2008

Page 2:

Noise Removal?

In this talk we focus on signal/image denoising …

Important: (i) Practical application; (ii) A convenient platform for testing basic ideas in signal/image processing.

Many Considered Directions: Partial differential equations, Statistical estimators, Adaptive filters, Inverse problems & regularization, Wavelets, Example-based techniques, Sparse representations, …

Main Message Today: Several (different!) sparse representations can be found and used for better denoising performance. We introduce, demonstrate, and explain this new idea.

[Figure: a noisy image – "Remove additive noise?"]

Page 3:

Part I: Background on Denoising with Sparse Representations

Page 4:

Denoising by Energy Minimization

Many of the proposed signal denoising algorithms are related to the minimization of an energy function of the form

$$f(x) \;=\; \underbrace{\tfrac{1}{2}\,\|y - x\|_2^2}_{\text{relation to measurements}} \;+\; \underbrace{\Pr(x)}_{\text{prior or regularization}}$$

where y is the given set of measurements and x is the unknown to be recovered.

This is in fact a Bayesian point of view, adopting Maximum a Posteriori (MAP) estimation.

Clearly, the wisdom in such an approach lies in the choice of the prior – in modeling the signals of interest.

[Portrait: Thomas Bayes, 1702-1761]

Page 5:

The Evolution Of Pr(x)

During the past several decades we have made all sorts of guesses about the prior Pr(x) for signals/images:

• Hidden Markov Models,

• Compression algorithms as priors,

• …

• Energy: $\Pr(x) = \lambda\,\|x\|_2^2$

• Smoothness: $\Pr(x) = \lambda\,\|Lx\|_2^2$

• Adapt + Smooth: $\Pr(x) = \lambda\,\|Lx\|_W^2$

• Robust statistics: $\Pr(x) = \lambda\,\rho\{Lx\}$

• Total variation: $\Pr(x) = \lambda\,\|\nabla x\|_1$

• Wavelet sparsity: $\Pr(x) = \lambda\,\|Wx\|_1$

• Sparse & redundant: $\Pr(\alpha) = \lambda\,\|\alpha\|_0^0$ for $x = D\alpha$

Page 6:

Sparse Modeling of Signals

[Figure: a fixed dictionary D of size N×K multiplying a sparse, random vector α to produce the signal x = Dα.]

Every column in D (the dictionary) is a prototype signal (atom).

The vector α is generated randomly with few (say L) non-zeros at random locations and with random values.
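To make this model concrete, here is a minimal NumPy sketch of the generation process. The function name `make_sparse_signal` and the default sizes are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sparse_signal(N=100, K=200, L=10, sigma_x=1.0):
    """Draw a random N-by-K dictionary D with unit-norm atoms, a vector
    alpha with L non-zeros at random locations and Gaussian values, and
    the resulting signal x = D @ alpha."""
    D = rng.standard_normal((N, K))
    D /= np.linalg.norm(D, axis=0)              # normalize every atom (column)
    alpha = np.zeros(K)
    support = rng.choice(K, size=L, replace=False)
    alpha[support] = sigma_x * rng.standard_normal(L)
    return D, alpha, D @ alpha
```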

Page 7:

Back to Our MAP Energy Function

$$\hat{\alpha} = \arg\min_{\alpha}\; \tfrac{1}{2}\,\|y - D\alpha\|_2^2 \;\;\text{s.t.}\;\; \|\alpha\|_0^0 \le L, \qquad \hat{x} = D\hat{\alpha}.$$

The $L_0$ norm $\|\alpha\|_0^0$ effectively counts the number of non-zeros in α.

The vector α is the representation (sparse/redundant) of the signal.

Bottom line: denoising of y is done by minimizing

$$\min_{\alpha}\; \|y - D\alpha\|_2^2 \;\;\text{s.t.}\;\; \|\alpha\|_0^0 \le L \qquad\text{or}\qquad \min_{\alpha}\; \|\alpha\|_0^0 \;\;\text{s.t.}\;\; \|y - D\alpha\|_2^2 \le \varepsilon^2.$$

Page 8:

The Solver We Use: Greedy-Based

$$\min_{\alpha}\; \|\alpha\|_0^0 \;\;\text{s.t.}\;\; \|y - D\alpha\|_2^2 \le \varepsilon^2$$

The Matching Pursuit (MP) is one of the greedy algorithms that finds one atom at a time [Mallat & Zhang ('93)].

Step 1: find the one atom that best matches the signal.

Next steps: given the previously found atoms, find the next one that best fits the residual. The algorithm stops when the error $\|y - D\alpha\|_2$ falls below the desired threshold.

The Orthogonal MP (OMP) is an improved version that re-evaluates the coefficients by least squares after each round.

Page 9:

Orthogonal Matching Pursuit

OMP finds one atom at a time for approximating the solution of

$$\min_{\alpha}\; \|\alpha\|_0^0 \;\;\text{s.t.}\;\; \|y - D\alpha\|_2^2 \le \varepsilon^2.$$

Initialization: $n = 0$, $\alpha^0 = 0$, the support $S^0 = \emptyset$, and the residual $r^0 = y - D\alpha^0 = y$.

Main iteration ($n \leftarrow n + 1$):

1. Compute $E(i) = \min_z \|z\,d_i - r^{n-1}\|_2^2$ for $1 \le i \le K$.
2. Choose $i_0$ s.t. $E(i_0) \le E(i)$ for all $1 \le i \le K$.
3. Update the support: $S^n = S^{n-1} \cup \{i_0\}$.
4. LS: $\alpha^n = \arg\min_{\alpha} \|y - D\alpha\|_2^2$ s.t. $\mathrm{supp}\{\alpha\} = S^n$.
5. Update the residual: $r^n = y - D\alpha^n$.

Stop if $\|r^n\|_2 \le \varepsilon$; otherwise iterate.
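A compact NumPy sketch of these five steps, assuming unit-norm atoms (so that minimizing E(i) is the same as maximizing |dᵢᵀr|); the stopping rule and variable names are our own reading of the flowchart:

```python
import numpy as np

def omp(D, y, eps):
    """Orthogonal Matching Pursuit: greedily grow a support S, re-fitting
    all coefficients by least squares after every atom selection.
    Assumes the columns of D are normalized to unit length."""
    K = D.shape[1]
    S = []                                   # current support S^n
    alpha = np.zeros(K)
    r = y.copy()                             # residual r^0 = y
    while np.linalg.norm(r) > eps and len(S) < K:
        # E(i) = ||r||^2 - (d_i^T r)^2 for unit atoms, so minimizing E(i)
        # is equivalent to maximizing |d_i^T r|.
        i0 = int(np.argmax(np.abs(D.T @ r)))
        S.append(i0)                         # update the support
        # LS step: refit the coefficients on the current support
        alpha_S, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        alpha = np.zeros(K)
        alpha[S] = alpha_S
        r = y - D @ alpha                    # update the residual
    return alpha
```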

Page 10:

Part II: Finding & Using More than One Sparse Representation

Page 11:

Back to the Beginning. What If …

Consider the denoising problem

$$\min_{\alpha}\; \|\alpha\|_0^0 \;\;\text{s.t.}\;\; \|y - D\alpha\|_2^2 \le \varepsilon^2,$$

and suppose that we can find a group of J candidate solutions $\{\alpha_j\}_{j=1}^{J}$ such that

$$\forall j: \quad \|\alpha_j\|_0^0 \ll N \quad\text{and}\quad \|y - D\alpha_j\|_2^2 \le \varepsilon^2.$$

Basic Questions:

• What could we do with such a set of competing solutions?
• Why should this work?
• How shall we practically find such a set of solutions?

Relevant work: [Larsson & Selén ('07)], [Schniter et al. ('08)], and [Elad & Yavneh ('08)].

Page 12:

Motivation

Why bother with such a set?

Because of the intriguing relation to example-based techniques, where several nearest-neighbors for the signal are used jointly.

Because each representation conveys a different story about the desired signal.

Because pursuit algorithms are often wrong in finding the sparsest representation, so relying on any single solution is too sensitive.

… Maybe there are “deeper” reasons?

[Figure: two different sparse representations α₁ and α₂, both satisfying $\|y - D\alpha_1\|_2 \approx \|y - D\alpha_2\|_2$.]

Page 13:

Generating Many Representations

Our Answer: Randomizing the OMP

The algorithm is identical to the OMP shown earlier (same initialization, LS step, residual update, and stopping rule), except for the atom-selection stage:

1. Compute $E(i) = \min_z \|z\,d_i - r^{n-1}\|_2^2$ for $1 \le i \le K$.
2. Choose $i_0$ at random, with probability proportional to $\exp\{-c \cdot E(i)\}$ (larger for atoms with smaller error), instead of deterministically picking the best match.
3. Update the support: $S^n = S^{n-1} \cup \{i_0\}$.
4. LS: $\alpha^n = \arg\min_{\alpha} \|y - D\alpha\|_2^2$ s.t. $\mathrm{supp}\{\alpha\} = S^n$.
5. Update the residual: $r^n = y - D\alpha^n$.

Each run of this randomized OMP (RandOMP) yields a different candidate representation.
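A sketch of the randomized variant, reusing the OMP structure above. The talk ties the constant in the selection probability to the noise and signal variances; here `c` is left as a free parameter, which is a simplification:

```python
import numpy as np

rng = np.random.default_rng()

def rand_omp(D, y, eps, c=1.0):
    """Randomized OMP: identical to omp() except that the next atom is
    drawn at random with probability proportional to exp(-c * E(i)),
    rather than chosen as the single best match. Assumes unit-norm atoms."""
    K = D.shape[1]
    S, alpha, r = [], np.zeros(K), y.copy()
    while np.linalg.norm(r) > eps and len(S) < K:
        corr2 = (D.T @ r) ** 2
        E = np.dot(r, r) - corr2             # E(i) = ||r||^2 - (d_i^T r)^2
        w = np.exp(-c * (E - E.min()))       # shift by E.min() for stability
        w[S] = 0.0                           # never redraw a selected atom
        i0 = int(rng.choice(K, p=w / w.sum()))
        S.append(i0)
        alpha_S, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        alpha = np.zeros(K)
        alpha[S] = alpha_S
        r = y - D @ alpha
    return alpha
```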

Page 14:

Let's Try

Proposed experiment:

• Form a random dictionary D of size 100×200.
• Multiply by a sparse vector α₀ with $\|\alpha_0\|_0^0 = 10$.
• Add Gaussian iid noise v (σ = 1) and obtain $y = D\alpha_0 + v$.
• Solve the problem $\min_{\alpha} \|\alpha\|_0^0$ s.t. $\|y - D\alpha\|_2^2 \le 100$ using OMP, and obtain $\hat{\alpha}^{OMP}$.
• Use RandOMP 1000 times and obtain $\{\hat{\alpha}_j^{RandOMP}\}_{j=1}^{1000}$.

Let's look at the obtained representations …
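Tying the pieces together, the experiment could be reproduced roughly as follows, using the `make_sparse_signal`, `omp`, and `rand_omp` sketches given earlier (parameter values as read off the slide):

```python
import numpy as np

rng = np.random.default_rng(42)

# 100x200 dictionary, ||alpha_0||_0 = 10, unit-variance Gaussian noise.
D, alpha0, x = make_sparse_signal(N=100, K=200, L=10)
y = x + rng.standard_normal(100)                   # sigma = 1

eps = 10.0                                         # so ||y - D a||^2 <= 100
alpha_omp = omp(D, y, eps)
reps = [rand_omp(D, y, eps) for _ in range(1000)]  # 1000 RandOMP candidates
```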

Page 15:

Some Observations

[Figure: four panels comparing the 1000 RandOMP representations against the single OMP one – (a) histogram of cardinalities; (b) histogram of the representation errors $\|y - D\hat{\alpha}\|_2^2$; (c) histogram of the noise attenuation; (d) noise attenuation as a function of cardinality.]

The noise attenuation (denoising factor) of a representation $\hat{\alpha}$ is measured by

$$\frac{\|D\hat{\alpha} - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2}.$$

We see that:

• The OMP gives the sparsest solution.
• Nevertheless, it is not the most effective for denoising.
• The cardinality of a representation does not reveal its denoising efficiency.

Page 16:

The Surprise …

[Figure: the entries of the averaged representation, the original representation, and the OMP representation, plotted index by index.]

Let's propose the average

$$\hat{\alpha} = \frac{1}{1000}\sum_{j=1}^{1000} \hat{\alpha}_j^{RandOMP}$$

as our representation. This representation IS NOT SPARSE AT ALL, but it gives

$$\frac{\|D\hat{\alpha} - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2} = 0.06.$$
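Continuing the experiment sketch from above, the averaging and the denoising-factor measurement amount to a few lines (the value 0.06 on the slide is of course specific to that particular run):

```python
import numpy as np

alpha_avg = np.mean(reps, axis=0)    # dense average of the 1000 representations
factor = (np.linalg.norm(D @ alpha_avg - D @ alpha0) ** 2
          / np.linalg.norm(y - D @ alpha0) ** 2)
print(f"denoising factor: {factor:.3f}")
```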

Page 17:

Is It Consistent? … Yes!

[Figure: scatter plot of the RandOMP denoising factor versus the OMP denoising factor, with the mean point marked.]

Here are the results of 1000 trials with the same parameters …

Page 18:

The Explanation – Our Signal Model

[Figure: a fixed dictionary D of size N×K multiplying a sparse vector α to produce the signal x = Dα.]

Signal model, assuming D is fixed and known. The vector α is built by:

• Choosing the support S with probability P(S) out of all the $2^K$ possibilities Ω;
• Choosing the coefficients on S as iid Gaussian entries $\sim N(0, \sigma_x)$: this gives P(α|S).

The ideal signal is x = Dα. The p.d.f. of the signal, P(x), is

$$P(x) = \sum_{S} P(x|S)\,P(S).$$

Page 19:

The Explanation – Adding Noise

[Figure: the signal x = Dα with additive noise v giving the measurement y = x + v.]

Noise assumed: the noise v is an additive white Gaussian vector with probability $P_v(v)$, so that

$$P(y|x) = C \cdot \exp\left\{-\frac{\|y - x\|_2^2}{2\sigma^2}\right\}.$$

The p.d.f. of the noisy signal P(y), and the conditionals P(y|S), P(S|y), and P(x|y), are all clear and well-defined (although nasty).
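For completeness, here is a sketch of the algebra behind the next slide, under the simplifying assumption that the atoms in the support S are orthonormal (the general case follows the same lines with heavier notation):

```latex
% Given S:  y = D_S \alpha_S + v,  \alpha_S \sim N(0,\sigma_x^2 I),  v \sim N(0,\sigma^2 I),
% hence  y \mid S \sim N\!\big(0,\; \sigma^2 I + \sigma_x^2 D_S D_S^\top\big).
\begin{align*}
P(y \mid S) &\propto \exp\Big\{ -\tfrac{1}{2}\, y^\top \big(\sigma^2 I + \sigma_x^2 D_S D_S^\top\big)^{-1} y \Big\} \\
            &= \exp\Big\{ -\frac{\|y\|_2^2}{2\sigma^2}
               + \frac{\sigma_x^2\, \|D_S^\top y\|_2^2}{2\sigma^2 (\sigma_x^2 + \sigma^2)} \Big\},
\end{align*}
% using the Woodbury identity with D_S^\top D_S = I. The first term in the
% exponent does not depend on S, and \|D_S^\top y\|_2 = \|y_S\|_2, so Bayes'
% rule P(S|y) \propto P(y|S)\,P(S) gives the posterior on the next slide
% (an |S|-dependent determinant factor is absorbed into the proportionality
% when all candidate supports share the same size).
```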

Page 20:

Some Algebra Leads To

$$P(S|y) \;\propto\; \exp\left\{\frac{\sigma_x^2\,\|y_S\|_2^2}{2\sigma^2\,(\sigma_x^2 + \sigma^2)}\right\}\, P(S),$$

where $y_S$ is the projection of the signal y onto the support S:

$$y_S = D \cdot \arg\min_{\alpha}\; \|y - D\alpha\|_2^2 \;\;\text{s.t.}\;\; \mathrm{supp}\{\alpha\} = S.$$

$$\hat{x}_{MMSE} = E\left[x \mid y\right] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma^2} \sum_{S} P(S|y)\, y_S.$$

Implications: the best estimator (in terms of $L_2$ error) is a weighted average of many sparse representations!!!
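As a sanity check for tiny dictionaries, the exact MMSE estimate can be computed by brute force. This sketch assumes a uniform prior P(S) over all supports of a known size L (so the size-dependent normalization constants cancel); `mmse_estimate` is our own illustrative name:

```python
import numpy as np
from itertools import combinations

def mmse_estimate(D, y, L, sigma, sigma_x):
    """Exact MMSE estimate by sweeping all supports of size L (practical
    only for very small dictionaries), following the slide's formulas."""
    N, K = D.shape
    shrink = sigma_x**2 / (sigma_x**2 + sigma**2)
    log_w, projs = [], []
    for S in combinations(range(K), L):
        DS = D[:, list(S)]
        coef, *_ = np.linalg.lstsq(DS, y, rcond=None)
        yS = DS @ coef                        # projection of y onto span(D_S)
        log_w.append(shrink * np.dot(yS, yS) / (2 * sigma**2))
        projs.append(yS)
    # P(S|y) with a uniform prior P(S); subtract the max for stability.
    w = np.exp(np.array(log_w) - max(log_w))
    w /= w.sum()
    return shrink * (w[:, None] * np.array(projs)).sum(axis=0)
```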

Page 21:

As It Turns Out …

The MMSE estimation we got requires a sweep through all $2^K$ supports (i.e., a combinatorial search) – impractical.

Similarly, an explicit expression for P(x|y) can be derived and maximized – this is the MAP estimation, and it also requires a sweep through all possible supports – impractical too.

The OMP is a (good) approximation of the MAP estimate.

The RandOMP is a (good) approximation of the Minimum Mean Squared Error (MMSE) estimate. It is close to a Gibbs sampler of the probability P(S|y), from which we should draw the weights.

Back to the beginning: why use several representations? Because their average leads to provably better noise suppression.

Page 22:

Comparative Results

[Figure: relative mean-squared error of eight estimators – 1. Emp. Oracle, 2. Theor. Oracle, 3. Emp. MMSE, 4. Theor. MMSE, 5. Emp. MAP, 6. Theor. MAP, 7. OMP, 8. RandOMP – where the oracle estimators assume a known support.]

The following results correspond to a small dictionary (20×30), for which the combinatorial formulas can be evaluated as well. Parameters:

• N = 20, K = 30
• True support size = 3
• σ_x = 1
• J = 10 (RandOMP)
• Averaged over 1000 experiments

Page 23:

Part III: Summary and Conclusion

Page 24:

Today We Have Seen that …

Sparsity and redundancy are used for denoising of signals/images.

How? By finding the sparsest representation and using it to recover the clean signal.

Can we do better? Today we have shown that averaging several sparse representations of a signal leads to better denoising, as it approximates the MMSE estimator.

More on these topics (including the slides and the relevant papers) can be found at http://www.cs.technion.ac.il/~elad