Lecture IV: A Bayesian Viewpoint on Sparse Models


Page 1: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Lecture IV: A Bayesian Viewpoint on Sparse Models

Yi Ma (Microsoft Research Asia)    John Wright (Columbia University)

(Slides courtesy of David Wipf, MSRA)

IPAM Computer Vision Summer School, 2013

Page 2: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Convex Approach to Sparse Inverse Problems

1. Ideal (noiseless) case:   $\min_x \|x\|_0 \;\; \text{s.t.} \;\; y = \Phi x, \quad \Phi \in \mathbb{R}^{n \times m}$

2. Convex relaxation (lasso):   $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$

¨ Note: These may need to be solved in isolation, or embedded in a larger system, depending on the application.
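As a concrete reference point, here is a minimal sketch (my own illustration, not code from the lecture) of solving the lasso relaxation above with a basic proximal-gradient (ISTA) loop; the dictionary Phi, the weight lam, and the iteration count are all placeholder choices.

```python
# Minimal ISTA sketch for the lasso relaxation:
#   min_x 0.5*||y - Phi x||_2^2 + lam*||x||_1
# (equivalent to the slide's objective up to a rescaling of lam).
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(Phi, y, lam, n_iter=500):
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)           # gradient of 0.5*||y - Phi x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy usage: recover a sparse vector from underdetermined measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = Phi @ x_true + 0.01 * rng.standard_normal(50)
x_hat = lasso_ista(Phi, y, lam=0.1)
```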

Page 3: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

When Might This Strategy Be Inadequate?

Two representative cases:

1. The dictionary Φ has coherent columns.

2. There are additional parameters to estimate, potentially embedded in Φ.

The ℓ1 penalty favors both sparse and low-variance solutions. In general, when ℓ1 fails it is because the latter influence has come to dominate.

Page 4: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dictionary Correlation Structure

[Figure: Gram matrices $\Phi^T\Phi$ for an unstructured vs. a structured dictionary.]

Examples (unstructured):

$\Phi_{(\mathrm{unstr})}$ with iid $N(0,1)$ entries;  $\Phi_{(\mathrm{unstr})}$ built from random rows of the DFT.

Example (structured):

$\Phi_{(\mathrm{str})} = A\,\Phi_{(\mathrm{unstr})}\,B$, with $A$ arbitrary and $B$ block-diagonal.

Page 5: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Block Diagonal Example

¨ The ℓ1 solution typically selects either zero or one basis vector from each cluster of correlated columns.

¨ While the ‘cluster support’ may be partially correct, the chosen basis vectors likely will not be.

Problem: $\Phi_{(\mathrm{str})} = \Phi_{(\mathrm{unstr})}\,B$ with $B$ block-diagonal, so the Gram matrix $\Phi_{(\mathrm{str})}^T\Phi_{(\mathrm{str})}$ contains clusters of strongly correlated columns.

Page 6: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dictionaries with Correlation Structures

¨ Most theory applies to unstructured incoherent cases, but many (most?) practical dictionaries have significant coherent structures.

¨ Examples:

Page 7: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

MEG/EEG Example

[Figure: forward model Φ mapping the source space (x) to the sensor space (y).]

¨ The forward-model dictionary Φ can be computed using Maxwell's equations [Sarvas, 1987].

¨ It depends on the sensor locations, but is always highly structured by physical constraints.

Page 8: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

MEG Source Reconstruction Example

[Figure: MEG source reconstructions: Ground Truth, Group Lasso, and the Bayesian method.]

Page 9: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Bayesian Formulation

¨ Assumptions on the distributions:

$p(y \mid x) \propto \exp\left(-\tfrac{1}{2\lambda}\,\|y - \Phi x\|_2^2\right), \;\; \text{i.e., } y = \Phi x + \epsilon,\ \epsilon \sim N(0, \lambda I)$

$p(x) \propto \exp\left(-\tfrac{1}{2}\sum_i g(x_i)\right), \;\; g \text{ a general sparse prior, e.g. } g(x_i) = |x_i| \text{ or } g(x_i) = \log|x_i|$

¨ This leads to the MAP estimate:

$x^* = \arg\max_x p(x \mid y) = \arg\max_x p(y \mid x)\,p(x) = \arg\min_x\; \|y - \Phi x\|_2^2 + \lambda \sum_i g(x_i)$

Page 10: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Latent Variable Bayesian Formulation

Sparse priors can be specified via a variational form in terms of maximized scaled Gaussians:

$p(x) = \prod_i p(x_i), \qquad p(x_i) = \max_{\gamma_i \ge 0}\; N(x_i;\, 0, \gamma_i)\,\varphi(\gamma_i),$

where the $\gamma_i \ge 0$ are latent variables.

$\varphi$ is a positive function, which can be chosen to define any sparse prior (e.g., Laplacian, Jeffreys, generalized Gaussian, etc.) [Palmer et al., 2006].

Page 11: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Posterior for a Gaussian Mixture

For a fixed , with the prior:

the posterior is a Gaussian distribution:

The “optimal estimate” for x would simply be the mean

but this is obviously not optimal…

,)(),0;()( i

iiixNp x

.)I(

,)I(

),;(N~)(p)|(p)|(p

1TT

1TTx

xx

x

y

xxxyyx

.)I( 1TTx yx
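To make the formulas above concrete, here is a small sketch (my own illustration) of computing the posterior mean and covariance for a fixed set of variances gamma; all names are placeholders.

```python
# Gaussian posterior p(x | y; gamma) for fixed gamma:
#   y = Phi x + noise, noise ~ N(0, lam*I), x_i ~ N(0, gamma_i)
import numpy as np

def gaussian_posterior(Phi, y, gamma, lam):
    Gamma = np.diag(gamma)
    Sigma_y = lam * np.eye(len(y)) + Phi @ Gamma @ Phi.T   # marginal covariance of y
    K = Gamma @ Phi.T @ np.linalg.inv(Sigma_y)
    mean = K @ y                                           # mu_x
    cov = Gamma - K @ Phi @ Gamma                          # Sigma_x
    return mean, cov
```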

Page 12: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Approximation via Marginalization

We want to approximate

$x^* = \arg\max_x p(x \mid y) = \arg\max_x\; p(y \mid x)\, \max_{\gamma \ge 0} \prod_i N(x_i;\, 0, \gamma_i)\,\varphi(\gamma_i).$

Idea: replace the exact prior by a fixed Gaussian approximation,

$p(y \mid x)\,\max_{\gamma} p(x;\gamma) \;\approx\; p(y \mid x)\,p(x;\gamma^*) \quad \text{for some fixed } \gamma^*.$

Find the $\gamma^*$ that maximizes the marginal likelihood, i.e., the expected value with respect to x:

$\gamma^* = \arg\max_{\gamma \ge 0} \int p(y \mid x) \prod_i N(x_i;\, 0, \gamma_i)\,\varphi(\gamma_i)\; dx.$

Page 13: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Latent Variable Solution

$\gamma^* = \arg\max_{\gamma \ge 0} \int p(y \mid x) \prod_i N(x_i;\, 0, \gamma_i)\,\varphi(\gamma_i)\; dx$

$\;\;= \arg\min_{\gamma \ge 0}\; -2\log \int p(y \mid x) \prod_i N(x_i;\, 0, \gamma_i)\,\varphi(\gamma_i)\; dx$

$\;\;= \arg\min_{\gamma \ge 0}\; y^T \Sigma_y^{-1} y + \log\left|\Sigma_y\right| - 2\sum_i \log \varphi(\gamma_i), \qquad \text{with } \Sigma_y \triangleq \lambda I + \Phi\Gamma\Phi^T.$

A useful identity for the data term:

$y^T \Sigma_y^{-1} y = \min_x\; \frac{1}{\lambda}\,\|y - \Phi x\|_2^2 + x^T\Gamma^{-1}x.$

Given $\gamma^*$, the estimate of x is the corresponding posterior mean:

$x^* = \Gamma^*\Phi^T\left(\lambda I + \Phi\Gamma^*\Phi^T\right)^{-1} y.$
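A standard way to carry out this evidence maximization when $f(\gamma_i)$ is constant is the EM update $\gamma_i \leftarrow \mu_i^2 + \Sigma_{ii}$ of classical sparse Bayesian learning [Tipping, 2001]. The sketch below (my own illustration, not the lecture's code) alternates that update with the posterior computation:

```python
# EM-style sparse Bayesian learning: maximize the evidence over gamma, then
# report the posterior mean x* = Gamma Phi^T (lam I + Phi Gamma Phi^T)^{-1} y.
import numpy as np

def sbl_em(Phi, y, lam, n_iter=100):
    n, m = Phi.shape
    gamma = np.ones(m)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        Sigma_y = lam * np.eye(n) + Phi @ Gamma @ Phi.T
        K = Gamma @ Phi.T @ np.linalg.inv(Sigma_y)
        mu = K @ y                        # posterior mean
        Sigma = Gamma - K @ Phi @ Gamma   # posterior covariance
        gamma = mu**2 + np.diag(Sigma)    # EM update of the latent variances
    return mu, gamma
```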

Page 14: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

MAP-like Regularization

Substituting the identity for $y^T\Sigma_y^{-1}y$ and optimizing jointly over x and $\gamma$ gives a MAP-like problem in x:

$x^*, \gamma^* = \arg\min_{x,\,\gamma \ge 0}\; \frac{1}{\lambda}\,\|y - \Phi x\|_2^2 + \sum_i \frac{x_i^2}{\gamma_i} + \log\left|\lambda I + \Phi\Gamma\Phi^T\right| + \sum_i f(\gamma_i),$

where $f(\gamma_i) \triangleq -2\log\varphi(\gamma_i)$. Equivalently,

$x^* = \arg\min_x\; \frac{1}{\lambda}\,\|y - \Phi x\|_2^2 + g(x), \qquad g(x) \triangleq \min_{\gamma \ge 0}\; \sum_i \frac{x_i^2}{\gamma_i} + \log\left|\lambda I + \Phi\Gamma\Phi^T\right| + \sum_i f(\gamma_i).$

Notice that g(x) is in general not separable: $g(x) \ne \sum_i g_i(x_i)$.

Very often, for simplicity, we choose $f(\gamma_i) = b$ (a constant).

Page 15: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Properties of the Regularizer  [Tipping, 2001; Wipf and Nagarajan, 2008]

Theorem. When $f(\gamma_i) = b$ (a constant), $g(x)$ is a concave, non-decreasing function of $|x|$. Also, any local solution $x^*$ has at most $n$ nonzeros.

Theorem. When $f(\gamma_i) = b$ and $\Phi^T\Phi = I$, the program has no local minima. Furthermore, $g(x)$ becomes separable and has the closed form

$g(x) = \sum_i g(x_i), \qquad g(x_i) = \frac{2\,|x_i|}{|x_i| + \sqrt{x_i^2 + 4\lambda}} + \log\left(2\lambda + x_i^2 + |x_i|\sqrt{x_i^2 + 4\lambda}\right),$

which is a non-decreasing, strictly concave function of $|x_i|$.

Page 16: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Smoothing Effect: 1D Feasible Region

[Figure: penalty value of $g(x)$ along the one-dimensional feasible set $x = x_0 + \alpha v$, where $x_0$ is the maximally sparse solution satisfying $y = \Phi x_0$, $v$ spans $\mathrm{Null}(\Phi)$, and $\alpha$ is a scalar; curves shown for $\lambda = 0.01$ and $\lambda \to 0$.]

Page 17: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Noise-Aware Sparse Regularization

The penalty $g(x_i)$ adapts to the noise level, interpolating between ℓ0-like and ℓ1-like behavior:

As $\lambda \to 0$:  $g(x_i) \to \log|x_i|$ (a strongly concave, ℓ0-like penalty).

As $\lambda \to \infty$:  $g(x_i) \to |x_i|$ (a scaled ℓ1 penalty).

Page 18: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Philosophy

¨ Literal Bayesian: Assume some prior distribution on unknown parameters and then justify a particular approach based only on the validity of these priors.

¨ Practical Bayesian: Invoke Bayesian methodology to arrive at potentially useful cost functions. Then validate these cost functions with independent analysis.

Page 19: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Aggregate Penalty Functions

¨ Candidate sparsity penalties (primal and dual forms) [Tipping, 2001; Wipf and Nagarajan, 2008]:

$g_{\mathrm{primal}}(x) \triangleq \log\left|\lambda I + \Phi\,\mathrm{diag}(|x|)\,\Phi^T\right|, \qquad g_{\mathrm{dual}}(x) \triangleq \min_{\gamma \ge 0}\; \sum_i \frac{x_i^2}{\gamma_i} + \log\left|\lambda I + \Phi\,\mathrm{diag}(\gamma)\,\Phi^T\right|$

In the limit $\lambda \to 0$:

$g_{\mathrm{primal}}(x) \to \sum_i \log|x_i|, \qquad g_{\mathrm{dual}}(x) \to \min_{\gamma \ge 0}\; \sum_i \left(\frac{x_i^2}{\gamma_i} + \log\gamma_i\right)$

NOTE: If λ → 0, both penalties have the same minimum as the ℓ0 norm; if λ → ∞, both converge to scaled versions of the ℓ1 norm.

Page 20: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

How Might This Philosophy Help?

¨ Consider reweighted ℓ1 updates using the primal-space penalty.

Initial ℓ1 iteration with $w^{(0)} = \mathbf{1}$:

$x^{(1)} = \arg\min_x \sum_i w_i^{(0)} |x_i| \quad \text{s.t.} \quad y = \Phi x$

Weight update:

$w_i^{(1)} = \left.\frac{\partial\, g_{\mathrm{primal}}(x)}{\partial |x_i|}\right|_{x = x^{(1)}} = \phi_i^T\left(\lambda I + \Phi\,\mathrm{diag}\!\left(|x^{(1)}|\right)\Phi^T\right)^{-1}\phi_i$

The weights reflect the subspace of all active columns *and* any columns of Φ that are nearby: correlated columns will produce similar weights, small if they lie in the active subspace and large otherwise.
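A minimal sketch of this reweighting loop (my own illustration; the inner weighted ℓ1 problem is cast as a linear program, and lam and the iteration count are placeholder choices):

```python
# Reweighted l1: solve min sum_i w_i |x_i| s.t. y = Phi x, then update the
# weights with w_i = phi_i^T (lam I + Phi diag(|x|) Phi^T)^{-1} phi_i.
import numpy as np
from scipy.optimize import linprog

def weighted_l1(Phi, y, w):
    n, m = Phi.shape
    c = np.concatenate([w, w])                 # objective on [u; v], with x = u - v
    A_eq = np.hstack([Phi, -Phi])              # Phi u - Phi v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:m] - res.x[m:]

def reweighted_l1(Phi, y, lam=1e-4, n_iter=5):
    n, m = Phi.shape
    w = np.ones(m)
    for _ in range(n_iter):
        x = weighted_l1(Phi, y, w)
        B = np.linalg.inv(lam * np.eye(n) + Phi @ np.diag(np.abs(x)) @ Phi.T)
        w = np.einsum("ij,jk,ki->i", Phi.T, B, Phi)   # phi_i^T B phi_i for each column
    return x
```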

Page 21: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Basic Idea

¨ Initial iteration(s) locate appropriate groups of correlated basis vectors and prune irrelevant clusters.

¨ Once support is sufficiently narrowed down, then regular ℓ1 is sufficient.

¨ Reweighted ℓ1 iterations naturally handle this transition.

¨ The dual-space penalty accomplishes something similar and has additional theoretical benefits …

Page 22: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Alternative Approach

What about designing an ℓ1 reweighting function directly?

¨ Iterate:

$x^{(k+1)} = \arg\min_x \sum_i w_i^{(k)} |x_i| \quad \text{s.t.} \quad y = \Phi x$

$w^{(k+1)} = f\!\left(x^{(k+1)}\right)$

¨ Note: If f satisfies relatively mild properties, there will exist an associated sparsity penalty that is being minimized.

Can select f without regard to a specific penalty function

Page 23: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Example: $f^{(p,q)}$

$w_i^{(k+1)} = \left[\phi_i^T\left(\lambda I + \Phi\,\mathrm{diag}\!\left(|x^{(k+1)}|^q\right)\Phi^T\right)^{-1}\phi_i\right]^{p}, \qquad p, q > 0$

¨ The implicit penalty function can be expressed in integral form for certain selections of p and q.

¨ For the right choice of p and q, this update has guarantees for clustered dictionaries …
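The exact placement of the exponents in $f^{(p,q)}$ is hard to recover from the slide, so the helper below is a hypothetical rendering (my assumption) of one natural reading, given only for illustration:

```python
# Hypothetical parameterized weight update f^{(p,q)}; the placement of the
# exponents p and q is an assumption, not taken verbatim from the slides.
import numpy as np

def weights_pq(Phi, x, lam, p=1.0, q=1.0):
    n = Phi.shape[0]
    B = np.linalg.inv(lam * np.eye(n) + Phi @ np.diag(np.abs(x) ** q) @ Phi.T)
    return np.einsum("ij,jk,ki->i", Phi.T, B, Phi) ** p
```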

Page 24: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Numerical Simulations

Toy example: generate 50-by-100 dictionaries

$\Phi_{(\mathrm{unstr})}$ with iid $N(0,1)$ entries, $\qquad \Phi_{(\mathrm{str})} = \Phi_{(\mathrm{unstr})}\,B$ with $B$ block-diagonal;

generate a sparse x, and estimate it from the observations

$y_{(\mathrm{unstr})} = \Phi_{(\mathrm{unstr})}\,x, \qquad y_{(\mathrm{str})} = \Phi_{(\mathrm{str})}\,x.$

[Figure: empirical success rate vs. $\|x\|_0$ for the Bayesian and standard (ℓ1) methods on $\Phi_{(\mathrm{unstr})}$ and $\Phi_{(\mathrm{str})}$.]

¨ Convenient optimization via reweighted ℓ1 minimization [Candès et al., 2008]

¨ Provable performance gains in certain situations [Wipf, 2013]

Page 25: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Summary

¨ In practical situations, dictionaries are often highly structured.

¨ But standard sparse estimation algorithms may be inadequate in this situation (existing performance guarantees do not generally apply).

¨ We have suggested a general framework that compensates for dictionary structure via dictionary-dependent penalty functions.

¨ Could lead to new families of sparse estimation algorithms.

Page 26: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dictionary Has Embedded Parameters

1. Ideal (noiseless) case:   $\min_{x,\,k}\; \|x\|_0 \quad \text{s.t.} \quad y = \Phi(k)\,x$

2. Relaxed version:   $\min_{x,\,k}\; \|y - \Phi(k)\,x\|_2^2 + \lambda\,\|x\|_1$

¨ Applications: bilinear models, blind deconvolution, blind image deblurring, etc.

Page 27: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Blurry Image Formation

¨ Relative movement between camera and scene during exposure causes blurring:

[Figure: single blurry image, multiple blurry images, blurry/noisy pair.]

[Whyte et al., 2011]

Page 28: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Blurry Image Formation

¨ Basic observation model (can be generalized):

$y = k \otimes x + n$   (blurry image = blur kernel $\otimes$ sharp image + noise)

Page 29: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Blurry Image Formation

¨ Basic observation model (can be generalized):

$y = k \otimes x + n$   (blurry image = blur kernel $\otimes$ sharp image + noise)

¨ The blurry image y is observed (√); the blur kernel k and the sharp image x are the unknown quantities we would like to estimate (?).

Page 30: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Gradients of Natural Images are Sparse

Hence we work in gradient domain

x: vectorized derivatives of the sharp image;  y: vectorized derivatives of the blurry image

Page 31: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Blind Deconvolution

¨ Observation model:

¨ Would like to estimate the unknown x blindly since k is also unknown.

¨ Will assume unknown x is sparse.

$y = k \otimes x + n$, where $\otimes$ denotes the convolution operator; equivalently $y = \Phi(k)\,x + n$, with $\Phi(k)$ a Toeplitz (convolution) matrix.
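A small 1D sketch (my own illustration) of this observation model, building the Toeplitz matrix $\Phi(k)$ explicitly and checking it against direct convolution; the 2D image case has the same structure.

```python
# y = k (*) x + n, with the convolution realized as a Toeplitz matrix.
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(k, m):
    """Full-convolution matrix Phi(k) such that Phi(k) @ x == np.convolve(k, x)."""
    first_col = np.concatenate([k, np.zeros(m - 1)])
    first_row = np.concatenate([[k[0]], np.zeros(m - 1)])
    return toeplitz(first_col, first_row)

rng = np.random.default_rng(0)
x = np.zeros(40); x[[5, 12, 30]] = [2.0, -1.0, 1.5]   # sparse "sharp" gradients
k = np.ones(5) / 5.0                                  # rectangular blur kernel
Phi_k = conv_matrix(k, len(x))
y = Phi_k @ x + 0.01 * rng.standard_normal(len(k) + len(x) - 1)
assert np.allclose(Phi_k @ x, np.convolve(k, x))
```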

Page 32: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Attempt via Convex Relaxation

Solve:

$\min_{x,\,k}\; \|x\|_1 \quad \text{s.t.} \quad y = k \otimes x, \;\; k \in \mathcal{K} \triangleq \Big\{k : \textstyle\sum_i k_i = 1,\; k_i \ge 0\ \forall i\Big\}$

Problem: any feasible pair writes the blurry image as a superposition of translated copies of x, $y = \sum_t k_t\, x^{(t)}$, where $x^{(t)}$ denotes x translated by t. Hence

$\|y\|_1 = \Big\|\sum_t k_t\, x^{(t)}\Big\|_1 \;\le\; \sum_t k_t\,\|x^{(t)}\|_1 = \|x\|_1.$

¨ So the degenerate, non-deblurred solution is favored:

$k = \delta, \qquad x = y.$
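A quick numeric check of this degeneracy (my own illustration): blurring with a normalized, non-negative kernel can only shrink the ℓ1 norm, so the pair $(k = \delta,\ x = y)$ is always at least as good under the ℓ1 objective.

```python
import numpy as np

x = np.zeros(40); x[[5, 12, 30]] = [2.0, -1.0, 1.5]   # sparse sharp signal
k = np.ones(5) / 5.0                                  # sums to one, non-negative
y = np.convolve(k, x)
print(np.abs(x).sum(), np.abs(y).sum())               # ||y||_1 <= ||x||_1
```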

Page 33: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Bayesian Inference

¨ Assume priors p(x) and p(k) and likelihood p(y|x,k).

¨ Compute the posterior distribution via Bayes Rule:

$p(x, k \mid y) = \frac{p(y \mid x, k)\; p(x)\; p(k)}{p(y)}$

¨ Then infer x and/or k using estimators derived from $p(x, k \mid y)$, e.g., the posterior means or marginalized means.

Page 34: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Bayesian Inference: MAP Estimation

¨ Assumptions:

$p(x) \propto \exp\left(-\tfrac{1}{2}\sum_i g(x_i)\right)$, with g estimated from natural images

$p(k)$: uniform over a constraint set (say $\|k\|_1 = 1$, $k \ge 0$)

$p(y \mid x, k):\; y = k \otimes x + n, \;\; n \sim N(0, \lambda I)$

¨ Solve:

$\max_{x,\,k}\; p(x, k \mid y) = \arg\min_{x,\,k}\; -\log p(y \mid x, k) - \log p(x) - \log p(k)$

$\;= \arg\min_{x,\,k}\; \frac{1}{\lambda}\,\|y - k \otimes x\|_2^2 + \sum_i g(x_i), \qquad k \in \mathcal{K}.$

¨ This is just regularized regression with a sparse penalty that reflects natural image statistics.
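For reference, a minimal sketch (my own illustration) of evaluating this MAP objective for a 1D signal; it would serve as the inner cost in any alternating minimization over x and k. Here y is assumed to be the full convolution of k with the true signal plus noise, and g is a placeholder penalty.

```python
import numpy as np

def map_objective(x, k, y, lam, g=np.abs):
    """(1/lam)*||y - k (*) x||^2 + sum_i g(x_i), with (*) the full 1D convolution."""
    residual = y - np.convolve(k, x)
    return residual @ residual / lam + np.sum(g(x))
```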

Page 35: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Failure of Natural Image Statistics

¨ (Standardized) natural image gradient statistics suggest

$p(x) \propto \exp\left(-\tfrac{1}{2}\sum_i |x_i|^p\right), \qquad p \in [0.5,\, 0.8]$   [Simoncelli, 1999].

¨ Shown in red are 15 × 15 patches where

$\sum_i |y_i|^p \le \sum_i |x_i|^p, \qquad \text{with } y = k \otimes x,$

i.e., patches where the blurry gradients are favored over the sharp ones by this prior.

Page 36: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

The Crux of the Problem

¨ MAP only considers the mode, not the entire location of prominent posterior mass.

¨ Blurry images are closer to the origin in image-gradient space; they have higher probability (a higher mode) but lie in a restricted region of relatively low overall mass, which ignores the heavy tails.

Natural image statistics are not the best choice with MAP; they favor blurry images more than sharp ones!

[Figure: the feasible set in image-gradient space; sharp solutions are sparse with high variance, blurry solutions are non-sparse with low variance.]

Page 37: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

An “Ideal” Deblurring Cost Function

¨ Rather than accurately reflecting natural image statistics, for MAP to work we need a prior/penalty such that

$\sum_i g(x_i) \ll \sum_i g(y_i) \qquad \text{for sharp/blurry pairs } (x, y).$

Lemma: Under very mild conditions, the ℓ0 norm (invariant to changes in variance) satisfies

$\|k \otimes x\|_0 \ge \|x\|_0,$

with equality iff $k = \delta$. (A similar concept holds when x is not exactly sparse.)

¨ Theoretically ideal … but now we have a combinatorial optimization problem, and the convex relaxation provably fails.

Page 38: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Local Minima Example

¨ 1D signal is convolved with a 1D rectangular kernel

¨ MAP estimation using the ℓ0 norm, implemented with an IRLS minimization technique.

Provable failure because of convergence to local minima
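For context, the sketch below (my own illustration) shows the kind of IRLS x-update typically used for an ℓ0-like penalty, here the smooth surrogate $\sum_i \log(x_i^2 + \epsilon)$; in the blind setting of this example it would be alternated with a kernel update, and it inherits exactly the local-minima issue noted above.

```python
# IRLS for min (1/lam)*||y - Phi x||^2 + sum_i log(x_i^2 + eps):
# each step solves a weighted ridge problem with weights 1/(x_i^2 + eps).
import numpy as np

def irls_l0(Phi, y, lam=1e-3, eps=1e-6, n_iter=30):
    x = Phi.T @ y                          # simple initialization
    for _ in range(n_iter):
        w = 1.0 / (x**2 + eps)             # majorize-minimize weights
        x = np.linalg.solve(Phi.T @ Phi / lam + np.diag(w), Phi.T @ y / lam)
    return x
```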

Page 39: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Motivation for Alternative Estimators

¨ With the ℓ0 norm we get stuck in local minima.

¨ With natural image statistics (or the ℓ1 norm) we favor the degenerate, blurry solution.

¨ But perhaps natural image statistics can still be valuable if we use an estimator that is sensitive to the entire posterior distribution (not just its mode).

Page 40: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Latent Variable Bayesian Formulation

¨ Assumptions:

$p(x) = \prod_i p(x_i), \qquad p(x_i) = \max_{\gamma_i \ge 0}\; N(x_i;\, 0, \gamma_i)\, \exp\!\left(-\tfrac{1}{2} f(\gamma_i)\right)$

$p(k)$: uniform over a constraint set (say $\|k\|_1 = 1$, $k \ge 0$)

$p(y \mid x, k):\; y = k \otimes x + n, \;\; n \sim N(0, \lambda I)$

¨ Following the same process as in the general case, we have:

$\min_{x,\,k}\; \frac{1}{\lambda}\,\|y - k \otimes x\|_2^2 + g_{VB}(x, k, \lambda), \qquad g_{VB}(x, k, \lambda) \triangleq \sum_i \min_{\gamma_i \ge 0}\left[\frac{x_i^2}{\gamma_i} + \log\!\left(\frac{\lambda}{\|k\|_2^2} + \gamma_i\right) + f(\gamma_i)\right].$

Page 41: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Choosing an Image Prior to Use

¨ Choosing p(x) is equivalent to choosing the function f embedded in $g_{VB}$.

¨ Natural image statistics seem like the obvious choice [Fergus et al., 2006; Levin et al., 2009].

¨ Let $f_{\mathrm{nat}}$ denote the f function associated with such a prior (it can be computed using tools from convex analysis [Palmer et al., 2006]).

(Di)Lemma:

$g_{VB}(x, k, \lambda) = \sum_i \inf_{\gamma_i \ge 0}\left[\frac{x_i^2}{\gamma_i} + \log\!\left(\frac{\lambda}{\|k\|_2^2} + \gamma_i\right) + f_{\mathrm{nat}}(\gamma_i)\right]$

is less concave in |x| than the original image prior [Wipf and Zhang, 2013].

¨ So the implicit VB image penalty actually favors the blurry solution even more than the original natural image statistics!

Page 42: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Practical Strategy

¨ Analyze the reformulated cost function independently of its Bayesian origins.

¨ The best prior (or equivalently f) can then be selected based on properties directly beneficial to deblurring.

¨ This is just like the Lasso: we do not use such an ℓ1 model because we believe the data actually come from a Laplacian distribution.

Theorem. When $f(\gamma_i) = b$ (a constant), $g_{VB}(x, k, \lambda)$ has the closed form $g_{VB}(x, k, \lambda) = \sum_i g_{VB}(x_i, \rho)$ with

$g_{VB}(x_i, \rho) = \frac{2\,|x_i|}{|x_i| + \sqrt{x_i^2 + 4\rho}} + \log\left(2\rho + x_i^2 + |x_i|\sqrt{x_i^2 + 4\rho}\right), \qquad \rho \triangleq \frac{\lambda}{\|k\|_2^2}.$
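Since the theorem gives $g_{VB}$ in closed form, its shape is easy to inspect numerically; the short sketch below (my own illustration) evaluates it for two values of $\rho$ to show the concavity change discussed on the following slides.

```python
# Closed-form VB penalty g_VB(x, rho) with rho = lam / ||k||_2^2.
import numpy as np

def g_vb(x, rho):
    ax = np.abs(x)
    root = np.sqrt(x**2 + 4.0 * rho)
    return 2.0 * ax / (ax + root) + np.log(2.0 * rho + x**2 + ax * root)

z = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
for rho in (0.01, 1.0):
    # small rho: sharply concave (l0-like); large rho: nearly linear (l1-like)
    print(rho, np.round(g_vb(z, rho), 3))
```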

Page 43: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Sparsity-Promoting Properties

If and only if f is constant, then $g_{VB}$ satisfies the following:

¨ Sparsity: Jointly concave, non-decreasing function of $|x_i|$ for all i.

¨ Scale-invariance: The constraint set on k does not affect the solution.

¨ Limiting cases:

If $\lambda / \|k\|_2^2 \to 0$, then $g_{VB}(x, k)$ becomes a scaled version of $\|x\|_0$.

If $\lambda / \|k\|_2^2 \to \infty$, then $g_{VB}(x, k)$ becomes a scaled version of $\|x\|_1$.

¨ General case:

If $\lambda_a / \|k_a\|_2^2 \le \lambda_b / \|k_b\|_2^2$, then $g_{VB}(x, k_a, \lambda_a)$ is concave relative to $g_{VB}(x, k_b, \lambda_b)$.

[Wipf and Zhang, 2013]

Page 44: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Why Does This Help?

¨ $g_{VB}$ is a scale-invariant sparsity penalty that interpolates between the ℓ1 and ℓ0 norms.

¨ More concave (sparse) if:
  ¨ λ is small (low noise, modeling error)
  ¨ the norm of k is big (meaning the kernel is sparse)
  ¨ These are the easy cases.

¨ Less concave if:
  ¨ λ is big (large noise or kernel errors near the beginning of estimation)
  ¨ the norm of k is small (kernel is diffuse, before fine-scale details are resolved)

[Figure: relative sparsity curve, penalty value as a function of z, for two parameter settings (0.01 and 1).]

This shape modulation allows VB to avoid local minima initially while automatically introducing additional non-convexity to resolve fine details as estimation progresses.

Page 45: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Local Minima Example Revisited

¨ 1D signal is convolved with a 1D rectangular kernel

¨ MAP using ℓ0 norm versus VB with adaptive shape

Page 46: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Remarks

¨ The original Bayesian model, with f constant, results from the image prior (a Jeffreys prior)

$p(x) \propto \prod_i \frac{1}{|x_i|}.$

¨ This prior does not resemble natural image statistics at all!

¨ Ultimately, the type of estimator may completely determine which prior should be chosen.

¨ Thus we cannot use the true statistics to justify the validity of our model.

Page 47: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Variational Bayesian Approach

¨ Instead of MAP:

$\max_{x,\,k}\; p(x, k \mid y)$

¨ Solve:

$\max_k\; p(k \mid y) = \max_k \int p(x, k \mid y)\; dx$

¨ Here we are first averaging over all possible sharp images, and natural image statistics now play a vital role.

Lemma: Under mild conditions, in the limit of large images, maximizing p(k|y) will recover the true blur kernel k if p(x) reflects the true statistics.

[Levin et al., 2011]

Page 48: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Approximate Inference

¨ The integral required for computing p(k|y) is intractable.

¨ Variational Bayes (VB) provides a convenient family of upper bounds for maximizing p(k|y) approximately.

¨ Technique can be applied whenever p(x) is expressible in a particular variational form.

Page 49: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Maximizing Free Energy Bound

¨ Assume p(k) is flat within the constraint set, so we want to solve:

$\max_k\; p(y \mid k)$

¨ Useful bound [Bishop, 2006]:

$\log p(y \mid k) \;\ge\; F(q, k) \triangleq \int\!\!\int q(x, \gamma)\, \log \frac{p(y, x, \gamma \mid k)}{q(x, \gamma)}\; dx\, d\gamma,$

with equality iff $q(x, \gamma) = p(x, \gamma \mid y, k)$.

¨ Maximization strategy (equivalent to the EM algorithm):

$\max_{q(x,\gamma),\, k}\; F(q, k)$

¨ Unfortunately, the updates are still not tractable.

Page 50: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Practical Algorithm

¨ New looser bound:

$\log p(y \mid k) \;\ge\; F(q, k) \triangleq \int\!\!\int \prod_i q(x_i)\, q(\gamma_i)\; \log \frac{p(y, x, \gamma \mid k)}{\prod_i q(x_i)\, q(\gamma_i)}\; dx\, d\gamma$

¨ Iteratively solve:

$\max_{q,\,k}\; F(q, k) \quad \text{s.t.} \quad q(x, \gamma) = \prod_i q(x_i)\, q(\gamma_i)$

¨ Efficient, closed-form updates are now possible because the factorization decouples intractable terms.

[Palmer et al., 2006; Levin et al., 2011]

Page 51: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Questions

¨ The above VB has been motivated as a way of approximating the marginal likelihood p(y|k).

¨ However, several things remain unclear:

¨ What is the nature of this approximation, and how good is it?

¨ Are natural image statistics a good choice for p(x) when using VB?

¨ How is the underlying cost function intrinsically different from MAP?

¨ A reformulation of VB can help here …

Page 52: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Equivalence

Solving the VB problem

$\max_{q,\,k}\; F(q, k) \quad \text{s.t.} \quad q(x, \gamma) = \prod_i q(x_i)\, q(\gamma_i)$

is equivalent to solving the MAP-like problem

$\min_{x,\,k}\; \frac{1}{\lambda}\,\|y - k \otimes x\|_2^2 + g_{VB}(x, k, \lambda),$

where

$g_{VB}(x, k, \lambda) \triangleq \sum_i \inf_{\gamma_i \ge 0}\left[\frac{x_i^2}{\gamma_i} + \log\!\left(\frac{\lambda}{\|k\|_2^2} + \gamma_i\right) + f(\gamma_i)\right]$

and f is a function that depends only on p(x).   [Wipf and Zhang, 2013]

Page 53: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Remarks

¨ VB (via averaging out x) looks just like standard penalized regression (MAP), but with a non-standard image penalty $g_{VB}$ whose shape depends on both the noise variance λ and the kernel norm $\|k\|_2$.

¨ Ultimately, it is this unique dependency which contributes to VB’s success.

Page 54: Lecture IV: A  Bayesian Viewpoint on  Sparse Models
Page 55: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Blind Deblurring Results

Levin et al. dataset [CVPR, 2009]

¨ 4 images of size 255 × 255 and 8 different empirically measured ground-truth blur kernels, giving 32 blurry images in total.

[Figure: the four test images x1–x4 and the eight ground-truth blur kernels K1–K8.]

Page 56: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Comparison of VB Methods

Note: VB-Levin and VB-Fergus are based on natural image statistics [Levin et al., 2011; Fergus et al., 2006]; VB-Jeffreys is based on the theoretically motivated image prior.

Page 57: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Comparison with MAP Methods

Note: MAP methods [Shan et al., 2008; Cho and Lee, 2009; Xu and Jia, 2010] rely on carefully-defined structure-selection heuristics to locate salient edges, etc., in order to avoid the no-blur (delta) solution. VB requires no such added complexity.

Page 58: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Extensions

Can easily adapt the VB model to more general scenarios:

1. Non-uniform convolution models

2. Multiple images for simultaneous denoising and deblurring

Blurry image is a superposition of translated and rotated sharp images

[Figure: blurry input and noisy input.]

[Yuan, et al., SIGGRAPH, 2007]

Page 59: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Non-Uniform Real-World Deblurring

Blurry Whyte et al. Zhang and Wipf

O. Whyte et al., Non-uniform deblurring for shaken images, CVPR, 2010.

Page 60: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Non-Uniform Real-World Deblurring

Blurry Gupta et al. Zhang and Wipf

A. Gupta et al., Single image deblurring using motion density functions, ECCV, 2010.

Page 61: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Non-Uniform Real-World Deblurring

Blurry Joshi et al. Zhang and Wipf

N. Joshi et al., Image deblurring using inertial measurement sensors, SIGGRAPH, 2010.

Page 62: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Non-Uniform Real-World Deblurring

Blurry Hirsch et al. Zhang and Wipf

M. Hirsch et al., Fast removal of non-uniform camera shake, ICCV, 2011.

Page 63: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

Test images from: J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. J. Comput. Physics, 228(14):5057–5071, 2009.

Blurry I

Page 64: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image


Test images from: J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. J. Comput. Physics, 228(14):5057–5071, 2009.

Blurry II

Page 65: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. J. Comput. Physics, 228(14):5057–5071, 2009.

Cai et al.

Page 66: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

F. Sroubek and P. Milanfar. Robust multichannel blind deconvolution via fast alternating minimization. IEEE Trans. on Image Processing, 21(4):1687–1700, 2012.

Sroubek et al.

Page 67: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

Zhang et al.

H. Zhang, D.P. Wipf and Y. Zhang, Multi-Image Blind Deblurring Using a Coupled Adaptive Sparse Prior, CVPR, 2013.

Page 68: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

Zhang et al. | Cai et al. | Sroubek et al.

Page 69: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Dual Motion Blind Deblurring Real-world Image

Zhang et al. | Cai et al. | Sroubek et al.

Page 70: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Take-away Messages

¨ In a wide range of applications, convex relaxations are extremely effective and efficient.

¨ However, there remain interesting cases where non-convexity still plays a critical role.

¨ Bayesian methodology provides one source of inspiration for useful non-convex algorithms.

¨ These algorithms can then often be independently justified without reliance on the original Bayesian statistical assumptions.

Page 71: Lecture IV: A  Bayesian Viewpoint on  Sparse Models

Thank you, questions?

References

• D. Wipf and H. Zhang, “Revisiting Bayesian Blind Deconvolution,” arXiv:1305.2362, 2013.

• D. Wipf, “Sparse Estimation Algorithms that Compensate for Coherent Dictionaries,” MSRA Tech Report, 2013.

• D. Wipf, B. Rao, and S. Nagarajan, “Latent Variable Bayesian Models for Promoting Sparsity,” IEEE Trans. Information Theory, 2011.

• A. Levin, Y. Weiss, F. Durand, and W.T. Freeman, “Understanding and Evaluating Blind Deconvolution Algorithms,” Computer Vision and Pattern Recognition (CVPR), 2009.