
Page 1

Online Dictionary Learning for Sparse Coding

Julien Mairal¹  Francis Bach¹  Jean Ponce²  Guillermo Sapiro³

¹INRIA–Willow project  ²Ecole Normale Supérieure  ³University of Minnesota

ICML, Montréal, June 2009

Page 2

What this talk is about

- Learning dictionaries (basis sets) for sparse coding efficiently.
- Solving a large-scale matrix factorization problem.
- Making some large-scale image processing problems tractable.
- Proposing an algorithm which extends to NMF, sparse PCA, ...

Page 3

1 The Dictionary Learning Problem

2 Online Dictionary Learning

3 Extensions

Page 4

1 The Dictionary Learning Problem

2 Online Dictionary Learning

3 Extensions

Page 5

The Dictionary Learning Problem

$$\underbrace{\mathbf{y}}_{\text{measurements}} = \underbrace{\mathbf{x}_{\text{orig}}}_{\text{original image}} + \underbrace{\mathbf{w}}_{\text{noise}}$$

Page 6

The Dictionary Learning Problem
[Elad & Aharon ('06)]

Solving the denoising problem:
- Extract all overlapping 8 × 8 patches x_i.
- Solve a matrix factorization problem:

  $$\min_{\alpha_i,\, \mathbf{D} \in \mathcal{C}} \sum_{i=1}^{n} \underbrace{\frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\|\alpha_i\|_1}_{\text{sparsity}},$$

  with n > 100,000.
- Average the reconstruction of each patch.
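Concretely, this pipeline can be sketched in a few lines of Python. The sketch below uses scikit-learn's dictionary learning tools rather than the authors' code; the patch size, number of atoms, and penalty value are illustrative choices.

```python
# A minimal sketch of the patch-based denoising pipeline described above,
# using scikit-learn (illustrative parameters, not the authors' implementation).
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

def denoise(noisy, patch_size=(8, 8), n_atoms=256, lam=0.1):
    # Extract all overlapping patches and flatten them into vectors x_i.
    patches = extract_patches_2d(noisy, patch_size)
    X = patches.reshape(len(patches), -1)
    means = X.mean(axis=1, keepdims=True)
    X = X - means                        # sparse-code the centered patches

    # Learn D by (approximately) minimizing
    # sum_i 0.5*||x_i - D a_i||^2 + lam*||a_i||_1 over D in C.
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=lam).fit(X)

    # Sparse-code every patch against the learned dictionary.
    alphas = sparse_encode(X, dico.components_, algorithm='lasso_lars', alpha=lam)
    X_hat = alphas @ dico.components_ + means

    # Average the overlapping reconstructions back into one image.
    return reconstruct_from_patches_2d(X_hat.reshape(patches.shape), noisy.shape)
```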

Page 7

The Dictionary Learning Problem
[Mairal, Bach, Ponce, Sapiro & Zisserman ('09)]

Denoising result

Page 8

The Dictionary Learning Problem
[Mairal, Sapiro & Elad ('08)]

Image completion example

Page 9

The Dictionary Learning Problem
What does D look like?

Page 10

The Dictionary Learning Problem

$$\min_{\substack{\alpha \in \mathbb{R}^{k \times n} \\ \mathbf{D} \in \mathcal{C}}} \sum_{i=1}^{n} \frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1,$$

$$\mathcal{C} \triangleq \{\mathbf{D} \in \mathbb{R}^{m \times k} \ \text{s.t.} \ \forall j = 1, \ldots, k, \ \|\mathbf{d}_j\|_2 \leq 1\}.$$

Classical optimization alternates between D and α.

Good results, but very slow!
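For reference, here is a minimal numpy sketch of this batch alternation. The Lasso solver choice, initialization, and iteration count are assumptions of the sketch, not the exact procedure behind the slides.

```python
# A minimal sketch of classical batch dictionary learning: alternate between
# sparse coding (D fixed) and a projected block-coordinate dictionary update
# (alpha fixed). Illustrative, not the authors' implementation.
import numpy as np
from sklearn.linear_model import Lasso

def batch_dictionary_learning(X, k, lam=0.1, n_iter=20):
    """X: m x n matrix whose columns are the signals x_i."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)               # project columns onto C
    # sklearn's Lasso scales the quadratic term by 1/n_samples, hence lam/m.
    lasso = Lasso(alpha=lam / m, fit_intercept=False)
    for _ in range(n_iter):
        # Step 1: with D fixed, solve n independent lasso problems for alpha.
        alpha = lasso.fit(D, X).coef_.T           # k x n
        # Step 2: with alpha fixed, update D column by column,
        # projecting each atom back onto the unit l2 ball.
        A = alpha @ alpha.T                       # k x k
        B = X @ alpha.T                           # m x k
        for j in range(k):
            if A[j, j] > 1e-12:
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```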

Page 11

1 The Dictionary Learning Problem

2 Online Dictionary Learning

3 Extensions

Page 12

Online Dictionary Learning

Classical formulation of dictionary learning:

$$\min_{\mathbf{D} \in \mathcal{C}} f_n(\mathbf{D}) = \min_{\mathbf{D} \in \mathcal{C}} \frac{1}{n} \sum_{i=1}^{n} l(\mathbf{x}_i, \mathbf{D}),$$

where

$$l(\mathbf{x}, \mathbf{D}) \triangleq \min_{\alpha \in \mathbb{R}^k} \frac{1}{2}\|\mathbf{x} - \mathbf{D}\alpha\|_2^2 + \lambda\|\alpha\|_1.$$
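In code, evaluating l(x, D) for a single signal amounts to one lasso solve. The helper below is an illustrative sketch using scikit-learn's sparse_encode; the penalty scaling passed to it is this sketch's assumption.

```python
# Evaluating the loss l(x, D) with one lasso solve (illustrative sketch).
import numpy as np
from sklearn.decomposition import sparse_encode

def loss(x, D, lam=0.1):
    """l(x, D) = min_a 0.5 * ||x - D a||_2^2 + lam * ||a||_1."""
    # sparse_encode expects row signals and a (k x m) dictionary of row atoms,
    # while D here stores atoms as columns, hence the transpose.
    a = sparse_encode(x[None, :], D.T, algorithm='lasso_lars', alpha=lam)[0]
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))
```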

Page 13

Online Dictionary Learning

Which formulation are we interested in?

$$\min_{\mathbf{D} \in \mathcal{C}} \left[ f(\mathbf{D}) \triangleq \mathbb{E}_{\mathbf{x}}[l(\mathbf{x}, \mathbf{D})] \approx \lim_{n \to +\infty} \frac{1}{n} \sum_{i=1}^{n} l(\mathbf{x}_i, \mathbf{D}) \right]$$

Page 14

Online Dictionary Learning

Online learning can
- handle potentially infinite datasets,
- adapt to dynamic training sets,
- be dramatically faster than batch algorithms [Bottou & Bousquet ('08)].

Page 15

Online Dictionary Learning
Proposed approach

1: for t = 1, ..., T do
2:   Draw x_t
3:   Sparse coding:
       $$\alpha_t \leftarrow \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^k} \frac{1}{2}\|\mathbf{x}_t - \mathbf{D}_{t-1}\alpha\|_2^2 + \lambda\|\alpha\|_1,$$
4:   Dictionary learning:
       $$\mathbf{D}_t \leftarrow \operatorname*{arg\,min}_{\mathbf{D} \in \mathcal{C}} \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1 \right),$$
5: end for
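A key point of the method is that step 4 never needs to revisit past samples: it only requires the sufficient statistics A_t = Σ_i α_i α_iᵀ and B_t = Σ_i x_i α_iᵀ. Below is a minimal Python sketch of the loop, with scikit-learn's Lasso standing in for the sparse coding step and one pass of block-coordinate descent per sample; this is a simplification of the full method, not the authors' code.

```python
# A minimal sketch of the online loop above. The sufficient statistics
# A and B summarize all past (x_i, alpha_i) pairs, so the dictionary
# update has constant memory cost regardless of t.
import numpy as np
from sklearn.linear_model import Lasso

def online_dictionary_learning(samples, D0, lam=0.1):
    """samples: iterable of m-dimensional signals; D0: initial m x k dictionary."""
    m, k = D0.shape
    D = D0.copy()
    A = np.zeros((k, k))                    # accumulates alpha_t alpha_t^T
    B = np.zeros((m, k))                    # accumulates x_t alpha_t^T
    lasso = Lasso(alpha=lam / m, fit_intercept=False)
    for x in samples:                       # 2: draw x_t
        alpha = lasso.fit(D, x).coef_       # 3: sparse coding against D_{t-1}
        A += np.outer(alpha, alpha)
        B += np.outer(x, alpha)
        for j in range(k):                  # 4: block-coordinate dictionary update
            if A[j, j] > 1e-12:
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```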

Page 16

Online Dictionary Learning
Proposed approach

Implementation details:
- Use LARS for the sparse coding step,
- Use a block-coordinate approach for the dictionary update, with warm restarts,
- Use mini-batches.
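These implementation choices are available off the shelf: scikit-learn's MiniBatchDictionaryLearning follows this online scheme, including mini-batches and LARS-based sparse coding. The parameter values below are illustrative, and X is assumed to be an n × m array of training patches.

```python
# Illustrative usage of scikit-learn's implementation of online
# dictionary learning (X is an assumed n x m array of training patches).
from sklearn.decomposition import MiniBatchDictionaryLearning

dico = MiniBatchDictionaryLearning(
    n_components=256,            # k, the number of dictionary atoms
    alpha=1.0,                   # lambda, the l1 penalty
    batch_size=256,              # mini-batch size
    fit_algorithm='lars',        # LARS for the sparse coding step
)
D = dico.fit(X).components_      # learned dictionary, one atom per row
```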

Page 17

Online Dictionary Learning
Proposed approach

Which guarantees do we have? Under a few reasonable assumptions,
- we build a surrogate function $\hat{f}_t$ of the expected cost $f$ verifying
  $$\lim_{t \to +\infty} \hat{f}_t(\mathbf{D}_t) - f(\mathbf{D}_t) = 0,$$
- $\mathbf{D}_t$ is asymptotically close to a stationary point.

Page 18

Online Dictionary Learning
Experimental results, batch vs online

m = 8 × 8, k = 256

Page 19

Online Dictionary Learning
Experimental results, batch vs online

m = 12 × 12 × 3, k = 512

Page 20

Online Dictionary Learning
Experimental results, batch vs online

m = 16 × 16, k = 1024

Page 21

Online Dictionary Learning
Experimental results, ODL vs SGD

m = 8 × 8, k = 256

Page 22

Online Dictionary Learning
Experimental results, ODL vs SGD

m = 12 × 12 × 3, k = 512

Page 23

Online Dictionary Learning
Experimental results, ODL vs SGD

m = 16 × 16, k = 1024

Page 24

Online Dictionary Learning
Inpainting a 12-Mpixel photograph

Page 25

Online Dictionary Learning
Inpainting a 12-Mpixel photograph

Page 26

Online Dictionary Learning
Inpainting a 12-Mpixel photograph

Page 27

Online Dictionary Learning
Inpainting a 12-Mpixel photograph

Page 28

1 The Dictionary Learning Problem

2 Online Dictionary Learning

3 Extensions

Page 29

Extension to NMF and sparse PCA

NMF extension:

$$\min_{\substack{\alpha \in \mathbb{R}^{k \times n} \\ \mathbf{D} \in \mathcal{C}}} \sum_{i=1}^{n} \frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2 \quad \text{s.t.} \quad \alpha_i \geq 0, \ \mathbf{D} \geq 0.$$

SPCA extension:

$$\min_{\substack{\alpha \in \mathbb{R}^{k \times n} \\ \mathbf{D} \in \mathcal{C}'}} \sum_{i=1}^{n} \frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1,$$

$$\mathcal{C}' \triangleq \{\mathbf{D} \in \mathbb{R}^{m \times k} \ \text{s.t.} \ \forall j, \ \|\mathbf{d}_j\|_2^2 + \gamma\|\mathbf{d}_j\|_1 \leq 1\}.$$
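Both variants reuse the same online machinery; only the constraints on α and D change. scikit-learn exposes matching options, sketched below with illustrative parameter values (positivity constraints require the coordinate-descent solver rather than LARS).

```python
# Illustrative configurations for the two extensions in scikit-learn.
from sklearn.decomposition import MiniBatchDictionaryLearning, MiniBatchSparsePCA

# NMF-style variant: nonnegative codes alpha_i and nonnegative atoms d_j.
nmf_like = MiniBatchDictionaryLearning(
    n_components=49, alpha=0.5,
    fit_algorithm='cd', transform_algorithm='lasso_cd',
    positive_code=True, positive_dict=True)

# Sparse PCA variant: the l1 penalty moves onto the atoms, so D itself is sparse.
spca = MiniBatchSparsePCA(n_components=49, alpha=0.5)
```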

Page 30

Extension to NMF and sparse PCA
Faces: Extended Yale Database B

(a) PCA (b) NNMF (c) DL

Page 31

Extension to NMF and sparse PCA
Faces: Extended Yale Database B

(d) SPCA, τ = 70% (e) SPCA, τ = 30% (f) SPCA, τ = 10%

Page 32

Extension to NMF and sparse PCA
Natural Patches

(a) PCA (b) NNMF (c) DL

Page 33

Extension to NMF and sparse PCA
Natural Patches

(d) SPCA, τ = 70% (e) SPCA, τ = 30% (f) SPCA, τ = 10%

Page 34

Conclusion

Take-home message
- Online techniques are well suited to the dictionary learning problem.
- Our method makes some large-scale image processing tasks tractable ...
- ... and extends to various matrix factorization problems.