
Introduction to Sparsity in Signal Processing

Ivan Selesnick

Polytechnic Institute of New York University

Brooklyn, New York

selesi@poly.edu

2012

1

Under-determined linear equations

Consider an under-determined system of equations

y = Ax (1)

A : M × N, with M < N

y : length-M

x : length-N

y = [ y(0), …, y(M − 1) ]ᵀ

x = [ x(0), …, x(N − 1) ]ᵀ

The system has more unknowns than equations.

The matrix A is wider than it is tall.

We assume that AA∗ is invertible; therefore the system of equations (1) has infinitely many solutions.

2

Norms

We will use the ℓ2 and ℓ1 norms:

‖x‖₂² := ∑_{n=0}^{N−1} |x(n)|²  (2)

‖x‖₁ := ∑_{n=0}^{N−1} |x(n)|.  (3)

‖x‖₂², i.e. the sum of squares, is referred to as the ‘energy’ of x.
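As a concrete check (a minimal NumPy sketch; the slides' own code is Matlab), the two norms of a short vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])

energy = np.sum(np.abs(x) ** 2)   # ||x||_2^2, eq. (2): 9 + 16 + 0 + 1
l1 = np.sum(np.abs(x))            # ||x||_1,  eq. (3): 3 + 4 + 0 + 1
```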

3

Least squares

To solve y = Ax, it is common to minimize the energy of x.

arg min_x ‖x‖₂²  (4a)

such that y = Ax.  (4b)

Solution to (4):

x = A∗(AA∗)−1 y.  (5)

When y is noisy, don’t solve y = Ax exactly. Instead, find approximate solution:

arg min_x ‖y − Ax‖₂² + λ‖x‖₂².  (6)

Solution to (6):

x = (A∗A + λI)−1 A∗ y.  (7)

Large-scale systems −→ fast algorithms needed.
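A quick numerical check of (5) and (7), as a hedged NumPy sketch (the random A, y, and λ = 0.1 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 12
A = rng.standard_normal((M, N))   # wide matrix: M < N
y = rng.standard_normal(M)

# minimum-energy solution (5): x = A*(AA*)^{-1} y
x_ls = A.T @ np.linalg.solve(A @ A.T, y)

# regularized solution (7): x = (A*A + lam I)^{-1} A* y
lam = 0.1
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)

# (5) solves y = Ax exactly; (7) zeroes the gradient of the cost (6)
assert np.allclose(A @ x_ls, y)
assert np.allclose(A.T @ (A @ x_reg - y) + lam * x_reg, 0, atol=1e-8)
```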

4

Sparse solutions

Another approach to solve y = Ax:

arg min_x ‖x‖₁  (8a)

such that y = Ax  (8b)

Problem (8) is the basis pursuit (BP) problem.

When y is noisy, don’t solve y = Ax exactly. Instead, find approximate solution:

arg min_x ‖y − Ax‖₂² + λ‖x‖₁.  (9)

Problem (9) is the basis pursuit denoising (BPD) problem.

The BP/BPD problems cannot be solved in explicit form, only by iterative numerical algorithms.

5

Least squares & BP/BPD

Least squares and BP/BPD solutions are quite different. Why?

To minimize ‖x‖₂², the largest values of x must be made small, as they count much more than the smallest values. ⇒ Least squares solutions have many small values, as they are relatively unimportant. ⇒ Least squares solutions are not sparse.

[Figure: the quadratic penalty x² versus the absolute-value penalty |x|. The quadratic penalizes large values heavily but small values negligibly; |x| penalizes all values in proportion to their size.]

Therefore, when it is known/expected that x is sparse, use the ℓ1 norm, not the ℓ2 norm.
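The contrast can be seen numerically. A hedged sketch: the BP problem (8) can be posed as a linear program by splitting x = u − v with u, v ≥ 0 and solved with scipy.optimize.linprog (the dimensions and sparsity pattern below are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M, N = 15, 30
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[3, 17, 25]] = [2.0, -1.5, 1.0]    # sparse ground truth
y = A @ x_true

# l2 (minimum-energy) solution, eq. (5): generically dense
x_l2 = A.T @ np.linalg.solve(A @ A.T, y)

# l1 (basis pursuit) solution, eq. (8), as an LP:
# x = u - v, u, v >= 0, so ||x||_1 = 1'(u + v) subject to [A, -A][u; v] = y
res = linprog(c=np.ones(2 * N),
              A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * N), method="highs")
x_l1 = res.x[:N] - res.x[N:]
```

The ℓ2 solution spreads energy over nearly all entries; the ℓ1 solution concentrates it on a few.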

6

Algorithms for sparse solutions

Cost function:

- Non-differentiable

- Convex

- Large-scale

Algorithms:

- ‘Matrix-free’

- ISTA

- FISTA

- SALSA (ADMM)

and more...
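Of these, ISTA is the simplest to state: alternate a gradient step on the data term with soft thresholding. A minimal NumPy sketch for the BPD cost (9) (the step size comes from ‖A‖², which guarantees convergence; λ and the test problem are illustrative):

```python
import numpy as np

def soft(x, T):
    """Soft-threshold: shrink each entry toward zero by T."""
    return np.sign(x) * np.maximum(np.abs(x) - T, 0.0)

def ista(y, A, lam, n_iter=500):
    """ISTA for argmin_x ||y - Ax||_2^2 + lam * ||x||_1, eq. (9)."""
    alpha = np.linalg.norm(A, 2) ** 2        # alpha >= max eigenvalue of A*A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x + A.T @ (y - A @ x) / alpha, lam / (2 * alpha))
    return x

# small illustrative problem
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x0 = np.zeros(50); x0[[5, 30]] = [1.0, -2.0]
y = A @ x0 + 0.01 * rng.standard_normal(20)

lam = 0.5
x_hat = ista(y, A, lam)
cost = lambda x: np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))
```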

7

Parseval frames

If A satisfies

AA∗ = pI  (10)

then the columns of A are said to form a ‘Parseval frame’. A can be rectangular.

Least squares solution (5) becomes

x = A∗(AA∗)−1 y = (1/p) A∗ y  (AA∗ = pI)  (11)

No matrix inversion needed.

Least squares solution (7) becomes

x = (A∗A + λI)−1 A∗ y = (1/(λ + p)) A∗ y  (AA∗ = pI)  (12)

(Use the matrix inverse lemma.) No matrix inversion needed.

If A satisfies (10), then it is very easy to find least squares solutions. Some algorithms for BP/BPD also become computationally easier.
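A numerical check, as a NumPy sketch: take A to be the first M rows of an inverse-DFT-type matrix (cf. (15)), which satisfies (10) with p = N.

```python
import numpy as np

M, N = 8, 16
mm, nn = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
A = np.exp(2j * np.pi * mm * nn / N)      # A_{m,n} as in (15)

p = N
assert np.allclose(A @ A.conj().T, p * np.eye(M))   # Parseval frame, eq. (10)

rng = np.random.default_rng(0)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# (11): least squares solution with no matrix inversion
x = A.conj().T @ y / p
assert np.allclose(A @ x, y)

# (12): regularized solution with no matrix inversion
lam = 0.5
x_direct = np.linalg.solve(A.conj().T @ A + lam * np.eye(N), A.conj().T @ y)
assert np.allclose(x_direct, A.conj().T @ y / (lam + p))
```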

8

Example: Sparse Fourier coefficients using BP

The Fourier transform tells how to write the signal as a sum of sinusoids. But it is not the only way.

Basis pursuit gives a sparse spectrum.

Suppose the M-point signal y(m) is written as

y(m) = ∑_{n=0}^{N−1} c(n) exp(j (2π/N) m n),  0 ≤ m ≤ M − 1  (13)

where c(n) is a length-N coefficient sequence, with M ≤ N.

y = Ac (14)

A_{m,n} = exp(j (2π/N) m n),  0 ≤ m ≤ M − 1,  0 ≤ n ≤ N − 1  (15)

c : length-N

The coefficients c(n) are frequency-domain (Fourier) coefficients.

9

Example: Sparse Fourier coefficients using BP

1. If N = M, then A is the inverse N-point DFT matrix.

2. If N > M, then A is the first M rows of the inverse N-point DFT matrix. ⇒ A or A∗ can be implemented efficiently using the FFT. For example, in Matlab, y = Ac is implemented as:

function y = A(c, M, N)

v = N * ifft(c);

y = v(1:M);

end

Similarly, A∗ y can be obtained by zero-padding and computing the DFT. In Matlab, c = A∗ y is implemented as:

function c = AT(y, M, N)

c = fft([y; zeros(N-M, 1)]);

end

⇒ Matrix-free algorithms.

3. Due to the orthogonality properties of complex sinusoids,

AA∗ = N I_M  (16)
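A NumPy port of the two Matlab functions above (a sketch; the names A_op/AT_op are my own), checked against the explicit matrix (15):

```python
import numpy as np

def A_op(c, M):
    """y = A c via the FFT (matrix-free), cf. the Matlab function A."""
    N = len(c)
    return (N * np.fft.ifft(c))[:M]

def AT_op(y, N):
    """c = A* y: zero-pad to length N and take the DFT, cf. the Matlab function AT."""
    M = len(y)
    return np.fft.fft(np.concatenate([y, np.zeros(N - M)]))

# check against the explicit matrix (15)
M, N = 6, 12
mm, nn = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
A = np.exp(2j * np.pi * mm * nn / N)
c = np.random.default_rng(0).standard_normal(N)
assert np.allclose(A @ c, A_op(c, M))
assert np.allclose(A.conj().T @ (A @ c), AT_op(A_op(c, M), N))
```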

10

Example: Sparse Fourier coefficients using BP

When N = M, the coefficients c satisfying y = Ac are uniquely determined.

When N > M, the coefficients c are not unique. Any vector c satisfying y = Ac can be considered a valid set of coefficients. To find a particular solution we can minimize either ‖c‖₂² or ‖c‖₁.

Least squares:

arg min_c ‖c‖₂²  (17a)

such that y = Ac  (17b)

Basis pursuit:

arg min_c ‖c‖₁  (18a)

such that y = Ac.  (18b)

The two solutions can be quite different...

11

Example: Sparse Fourier coefficients using BP

[Figure: the signal y versus time (samples); real and imaginary parts shown.]

Least squares solution:

c = A∗(AA∗)−1 y = (1/N) A∗ y

A∗y is computed via:

1. zero-pad the length-M signal y to length-N

2. compute its DFT

BP solution: Compute using algorithm SALSA

12

Example: Sparse Fourier coefficients using BP

[Figure: Fourier coefficients versus frequency (index): (A) DFT; (B) least squares solution; (C) basis pursuit solution.]

The BP solution does not exhibit the leakage phenomenon.

13

Example: Sparse Fourier coefficients using BP

[Figure: cost function history of the algorithm for the basis pursuit solution, versus iteration.]

14

Example: Denoising using BPD

Digital LTI filters are often used for noise reduction (denoising).

But if

- the noise and signal overlap in the frequency domain, or

- the respective frequency bands are unknown,

then it is difficult to use LTI filters.

However, if the signal has sparse (or relatively sparse) Fourier coefficients, then BPD can be used for noise reduction.

15

Example: Denoising using BPD

Noisy speech signal y(m):

y(m) = s(m) + w(m), 0 ≤ m ≤ M − 1, M = 500 (19)

s(m) : noise-free speech signal
w(m) : noise sequence.

[Figure: the noisy signal versus time (samples); (A) Fourier coefficients (FFT) of the noisy signal versus frequency (index).]

16

Example: Denoising using BPD

Assume the noise-free speech signal s(n) has a sparse set of Fourier coefficients:

y = Ac + w

y : noisy speech signal, length-M
A : M × N DFT matrix (15)
c : sparse Fourier coefficients, length-N
w : noise, length-M

As y is noisy, find c by solving the least square problem

arg min_c ‖y − Ac‖₂² + λ‖c‖₂²  (20)

or the basis pursuit denoising (BPD) problem

arg min_c ‖y − Ac‖₂² + λ‖c‖₁.  (21)

Once c is found, an estimate of the speech signal is given by s = Ac.
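The slides compute the BPD solution with SALSA; as a simpler hedged sketch, here is ISTA specialized to the partial-DFT A of (15), matrix-free via the FFT. Since AA∗ = N I, the step size 1/N is valid; λ and the test signal below are illustrative choices.

```python
import numpy as np

def soft_complex(c, T):
    """Soft-threshold complex entries: shrink magnitudes by T."""
    m = np.abs(c)
    return np.where(m <= T, 0.0, c * (1.0 - T / np.maximum(m, 1e-300)))

def bpd_ista_fft(y, N, lam, n_iter=200):
    """ISTA for BPD (21) with A the M x N partial DFT of (15); matrix-free."""
    M = len(y)
    c = np.zeros(N, dtype=complex)
    for _ in range(n_iter):
        Ac = (N * np.fft.ifft(c))[:M]                                # A c
        g = np.fft.fft(np.concatenate([y - Ac, np.zeros(N - M)]))    # A*(y - Ac)
        c = soft_complex(c + g / N, lam / (2 * N))                   # step 1/N valid: AA* = N I
    return c

# illustrative noisy signal with a sparse spectrum
rng = np.random.default_rng(0)
M, N = 100, 256
m = np.arange(M)
s = np.exp(2j * np.pi * 10 * m / N) + 0.5 * np.exp(2j * np.pi * 40 * m / N)
y = s + 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

lam = 2.0
c_hat = bpd_ista_fft(y, N, lam)
s_hat = (N * np.fft.ifft(c_hat))[:M]     # estimate of the signal: s = Ac
```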

17

Example: Denoising using BPD

Least square solution:

c = (A∗A + λI)−1 A∗ y = (1/(λ + N)) A∗ y  (AA∗ = N I)  (22)

using (12) and (16). ⇒ The least squares estimate of the speech signal is

s = Ac = (N/(λ + N)) y  (least squares solution).

But this is only a scaled version of the noisy signal!

No real filtering is achieved.

18

Example: Denoising using BPD

BPD solution:

[Figure: (B) Fourier coefficients (BPD solution) versus frequency (index); the signal denoised using BPD versus time (samples); obtained with algorithm SALSA.]

Effective noise reduction, unlike least squares!

19

Example: Deconvolution using BPD

If the signal of interest x(m) is not only noisy but also distorted by an LTI system with impulse response h(m), then the available data y(m) is

y(m) = (h ∗ x)(m) + w(m) (23)

where ‘∗’ denotes (linear) convolution and w(m) is additive noise. Given the observed data y, we aim to estimate the signal x. We will assume that the sequence h is known.

20

Example: Deconvolution using BPD

y = Hx + w (24)

x : length N
h : length L
y : length M = N + L − 1

H =
[ h0
  h1 h0
  h2 h1 h0
     h2 h1 h0
        h2 h1
           h2 ]  (25)

H is of size M × N with M > N (because M = N + L − 1).
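A sketch building H with scipy.linalg.toeplitz and checking it against np.convolve (the particular h and x below are arbitrary illustrations):

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, N):
    """M x N convolution matrix (25), M = N + L - 1: H @ x == convolve(h, x)."""
    L = len(h)
    M = N + L - 1
    col = np.concatenate([h, np.zeros(M - L)])   # first column: h, then zeros
    row = np.zeros(N)
    row[0] = h[0]                                # first row: h0, then zeros
    return toeplitz(col, row)

h = np.array([1.0, 2.0, 3.0])          # L = 3
x = np.array([1.0, -1.0, 0.0, 2.0])    # N = 4
H = conv_matrix(h, 4)
assert H.shape == (6, 4)               # M = N + L - 1 = 6 > N
assert np.allclose(H @ x, np.convolve(h, x))
```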

21

Example: Deconvolution using BPD

Sparse signal convolved by the 4-point moving average filter

h(n) = { 1/4,  n = 0, 1, 2, 3
       { 0,    otherwise

[Figure: the sparse signal and the observed (convolved, noisy) signal versus time (samples).]

22

Example: Deconvolution using BPD

Due to noise w, solve the least square problem

arg min_x ‖y − Hx‖₂² + λ‖x‖₂²  (26)

or the basis pursuit denoising (BPD) problem

arg min_x ‖y − Hx‖₂² + λ‖x‖₁  (27)

Least squares solution:

x = (H∗H + λI)−1 H∗ y.  (28)
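As a sketch, the least squares solution (28) for the 4-point moving average example (λ = 0.5, the noise level, and the spike locations are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 50, 4
h = np.full(L, 0.25)                       # 4-point moving average filter
x_true = np.zeros(N)
x_true[[10, 25, 40]] = [2.0, -1.0, 1.5]    # sparse signal
y = np.convolve(h, x_true) + 0.02 * rng.standard_normal(N + L - 1)

# build the M x N convolution matrix H of (25)
M = N + L - 1
H = np.zeros((M, N))
for j in range(N):
    H[j:j + L, j] = h

# (28): x = (H*H + lam I)^{-1} H* y
lam = 0.5
x_ls = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ y)
```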

23

Example: Deconvolution using BPD

[Figure: deconvolution results versus time (samples): least squares solution and BPD solution.]

The BPD solution, obtained using SALSA, is more faithful to the original signal.

24

Example: Filling in missing samples using BP

Due to data transmission/acquisition errors, some signal samples may be lost. Fill in missing values for error concealment.

Part of a signal or image may be intentionally deleted (image editing, etc.). Convincingly filling in missing values according to the surrounding area is called inpainting.

[Figure: the incomplete signal (200 missing samples) versus time (samples).]

25

Example: Filling in missing samples using BP

The incomplete signal y:

y = Sx  (29)

x : length M
y : length K < M
S : ‘selection’ (or ‘sampling’) matrix of size K × M.

For example, if only the first, second, and last elements of a 5-point signal x are observed, then the matrix S is given by:

S =
[ 1 0 0 0 0
  0 1 0 0 0
  0 0 0 0 1 ].  (30)

Problem: Given y and S, find x such that y = Sx

⇒ Underdetermined system, infinitely many solutions.

Least square and BP solutions are very different...

26

Example: Filling in missing samples using BP

Properties of S:

1. S Sᵗ = I  (31)

where I is a K × K identity matrix. For example, with S in (30),

S Sᵗ =
[ 1 0 0
  0 1 0
  0 0 1 ].

2. Sᵗ y sets the missing samples to zero.

For example, with S in (30)

Sᵗ y =
[ 1 0 0
  0 1 0
  0 0 0
  0 0 0
  0 0 1 ] [ y(0)
            y(1)
            y(2) ] = [ y(0)
                       y(1)
                       0
                       0
                       y(2) ].  (32)
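Both properties are easy to verify numerically (a NumPy sketch; the sample values are arbitrary):

```python
import numpy as np

keep = np.array([0, 1, 4])       # observe the first, second, and last of 5 samples
S = np.eye(5)[keep]              # the K x M selection matrix of (30)

assert np.allclose(S @ S.T, np.eye(3))          # property (31): S S^t = I

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = S @ x                                       # observed samples [10, 20, 50]
assert np.allclose(S.T @ y, [10.0, 20.0, 0.0, 0.0, 50.0])   # (32): zeros at missing samples
```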

27

Example: Filling in missing samples using BP

Suppose x has a sparse representation with respect to A,

x = Ac (33)

c : sparse vector, length N, with M ≤ N
A : size M × N.

The incomplete signal y can then be written as

y = Sx from (29) (34a)

= SAc from (33). (34b)

Therefore, if we can find a vector c satisfying

y = SAc (35)

then we can create an estimate x of x by setting

x = Ac. (36)

From (35), Sx = y. That is, x agrees with the known samples y.

28

Example: Filling in missing samples using BP

Note that y is shorter than the coefficient vector c, so there are infinitely many solutions to (35).

Any vector c satisfying y = SAc can be considered a valid set of coefficients.

To find a particular solution, solve the least square problem

arg min_c ‖c‖₂²  (37a)

such that y = SAc  (37b)

or the basis pursuit problem

arg min_c ‖c‖₁  (38a)

such that y = SAc.  (38b)

The two solutions are very different...

Let us assume A satisfies

AA∗ = pI,  (39)

for some positive real number p.

29

Example: Filling in missing samples using BP

The least square solution is

c = (SA)∗((SA)(SA)∗)−1 y  (40)
  = A∗Sᵗ(S A A∗ Sᵗ)−1 y  (41)
  = A∗Sᵗ(p S Sᵗ)−1 y  using (10)  (42)
  = A∗Sᵗ(p I)−1 y  using (31)  (43)
  = (1/p) A∗Sᵗ y  (44)

Hence, the least square estimate x is given by

x = Ac (45)

=1

pAA∗St y using (44) (46)

= St y using (10). (47)

This estimate consists of setting all the missing values to zero! See (32).

No real estimation of the missing values has been achieved.

The least square solution is of no use here.

30

Example: Filling in missing samples using BP

Short segments of speech can be sparsely represented using the DFT; therefore we set A equal to the M × N DFT matrix (15) with N = 1024.

BP solution obtained using 100 iterations of SALSA:

[Figure: estimated coefficients versus frequency (DFT index); estimated signal versus time (samples).]

The missing samples have been filled in quite accurately.

31

Total Variation Denoising/Deconvolution

A signal is observed in additive noise,

y = x + w.

Total variation denoising is suitable when it is known that the signal of interest has a sparse derivative.

arg min_x ‖y − x‖₂² + λ ∑_{n=2}^{N} |x(n) − x(n − 1)|.  (48)

The total variation of x is defined as

TV(x) := ∑_{n=2}^{N} |x(n) − x(n − 1)| = ‖Dx‖₁

D =
[ −1  1
      −1  1
          ⋱  ⋱
             −1  1 ]  (49)

Matrix D has size (N − 1)× N.
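A sketch of D and the identity TV(x) = ‖Dx‖₁ (the piecewise-constant test signal is illustrative):

```python
import numpy as np

N = 6
# (N-1) x N first-order difference matrix of (49): rows [..., -1, 1, ...]
D = np.diff(np.eye(N), axis=0)

x = np.array([0.0, 0.0, 3.0, 3.0, 3.0, 1.0])   # piecewise-constant signal
tv = np.sum(np.abs(np.diff(x)))                # sum of |x(n) - x(n-1)|
assert D.shape == (N - 1, N)
assert np.isclose(np.sum(np.abs(D @ x)), tv)   # TV(x) = ||D x||_1
assert tv == 5.0                               # two jumps: |3| + |-2|
```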

32

Total Variation Denoising/Deconvolution

[Figure: noise-free signal; noisy signal; L2-filtered signal (λ = 3.00); L2-filtered signal (λ = 8.00); TV-filtered signal (λ = 3.00).]

TV filtering preserves edges/jumps more accurately than least squares filtering.

33

Polynomial Smoothing of Time Series with Additive Step Discontinuities

[Figure: (a) noisy data; (b) calculated TV component; (c) TV-compensated data; (d) estimated signal.]

Observed noisy data: a low-frequency signal with step-wise discontinuities (steps/jumps).

Goal: Simultaneous smoothing and change/jump detection.

Conventional low-pass filtering blurs discontinuities.

34

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Variational Formulation: Given noisy data y, find

x (step signal)

p (local polynomial signal)

by minimization:

arg min_{a,x}  λ ∑_{n=2}^{N} |x(n) − x(n − 1)| + ∑_{n=1}^{N} |y(n) − p(n) − x(n)|²

where
p(n) = a₁ n + · · · + a_d n^d.

The cost function is non-differentiable. We use ‘variable splitting’ and ADMM to derive an efficient iterative algorithm.

35

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Nano-particle Detection: The Whispering Gallery Mode (WGM) biosensor, designed to detect individual nano-particles in real time, is based on detecting discrete changes in the resonance frequency of a microspherical cavity. The adsorption of a nano-particle onto the surface of a custom designed microsphere produces a theoretically predicted but minute change in the optical properties of the sphere [Arnold, 2003].

[Figure: resonance wavelength shift (fm) versus time (seconds).]

36

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Nano-particle Detection:

[Figure: measured wavelength (fm) versus time (seconds); the estimated step signal x(t); diff(x(t)), showing the detected jumps.]

Reference: Polynomial Smoothing of Time Series with Additive Step Discontinuities, I. Selesnick, S. Arnold, V. Dantham. Submitted to IEEE Trans. on Signal Processing.

37

Summary

1. Basis pursuit

2. Basis pursuit denoising

3. Sparse Fourier coefficients using BP

4. Denoising using BPD

5. Deconvolution using BPD

6. Filling in missing samples using BP

7. Total variation filtering

8. Simultaneous least-squares & sparsity

38
