ECCV2010: Feature learning for image classification, part 2

Andrew Ng
Image classification by sparse coding


TRANSCRIPT

Page 1: ECCV2010: feature learning for image classification, part 2

Andrew Ng

Image classification

by sparse coding

Page 2: ECCV2010: feature learning for image classification, part 2


Feature learning problem

• Given a 14x14 image patch x, can represent it using 196 real numbers.

• Problem: Can we learn a better representation for this?

Page 3: ECCV2010: feature learning for image classification, part 2


Unsupervised feature learning

Given a set of images, learn a better way to represent images than raw pixels.

Page 4: ECCV2010: feature learning for image classification, part 2


First stage of visual processing in brain: V1

[Figure: schematic of a simple cell; an actual simple cell]

[Images from DeAngelis, Ohzawa & Freeman, 1995]

“Gabor functions.”

The first stage of visual processing in the brain (V1) does “edge detection.”
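For reference (this formula is my addition, not on the slide), simple-cell receptive fields like these are commonly modeled with a 2-D Gabor function, i.e. a Gaussian envelope times an oriented sinusoid:

g(x, y) = exp( −(x′² + γ²·y′²) / (2σ²) ) · cos( 2π·x′/λ + ψ ),   where x′ = x·cosθ + y·sinθ and y′ = −x·sinθ + y·cosθ.

This is consistent with the localized, oriented, edge-like bases that sparse coding learns a few slides later.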

Page 5: ECCV2010: feature learning for image classification, part 2


Learning an image representation

Sparse coding (Olshausen & Field,1996)

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Learn: Dictionary of bases φ1, …, φk (also in R^{n×n}), so that each input x can be approximately decomposed as:

x ≈ Σj aj φj,   s.t. the aj's are mostly zero ("sparse")

Use this to represent a 14x14 image patch succinctly, e.g. as [a7 = 0.8, a36 = 0.3, a41 = 0.5]. I.e., it indicates which "basic edges" make up the image.
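As a toy illustration of that decomposition, here is a minimal NumPy sketch. The dictionary is random and the coefficient values are simply the ones from the slide, so only the shapes and the x ≈ Σj aj φj reconstruction are the point:

```python
import numpy as np

k = 64                                  # number of bases (as on the next slide)
rng = np.random.default_rng(0)
phi = rng.standard_normal((k, 14, 14))  # stand-in for learned 14x14 bases phi_j

a = np.zeros(k)                         # sparse code: mostly zeros
a[7], a[36], a[41] = 0.8, 0.3, 0.5      # the slide's non-zero coefficients (0-based here)

x_hat = np.tensordot(a, phi, axes=1)    # x_hat = sum_j a_j * phi_j, a 14x14 patch
print(x_hat.shape, "reconstructed from", np.count_nonzero(a), "of", k, "bases")
```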

Page 6: ECCV2010: feature learning for image classification, part 2


Sparse coding illustration

Natural images → Learned bases (φ1, …, φ64): "Edges"


x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …] (feature representation)

Test example

Compact & easily interpretable

Page 7: ECCV2010: feature learning for image classification, part 2


More examples

x ≈ 0.6 * φ15 + 0.8 * φ28 + 0.4 * φ37
Represent as: [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]

x ≈ 1.3 * φ5 + 0.9 * φ18 + 0.3 * φ29
Represent as: [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]

• Method hypothesizes that edge-like patches are the most “basic” elements of a scene, and represents an image in terms of the edges that appear in it.

• Use this to obtain a more compact, higher-level representation of the scene than raw pixels.

Page 8: ECCV2010: feature learning for image classification, part 2

[Evan Smith & Mike Lewicki, 2006]

Digression: Sparse coding applied to audio

Page 9: ECCV2010: feature learning for image classification, part 2


Digression: Sparse coding applied to audio

[Evan Smith & Mike Lewicki, 2006]

Page 10: ECCV2010: feature learning for image classification, part 2


Sparse coding details

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Objective:  minimize over the φj's and aj(i)'s   Σi || x(i) − Σj aj(i) φj ||²  +  λ Σi Σj |aj(i)|

(the L1 sparsity term, λ Σ |aj(i)|, causes most aj's to be 0)

Alternating minimization: Alternately minimize with respect to the φj's (easy) and the aj(i)'s (harder).
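To make the alternation concrete, here is a rough sketch (my own, not the tutorial's code): patches are columns of X, bases are columns of Phi, the a-step uses an off-the-shelf lasso solver rather than the feature-sign search introduced below, and the φ-step is an unconstrained least-squares fit followed by renormalization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coding(X, k, lam=0.1, n_iter=20, seed=0):
    """Alternately solve for codes A (L1-penalized) and bases Phi (least squares).

    X   : (d, m) data matrix, one flattened patch per column (d = n*n pixels)
    k   : number of bases
    lam : weight of the L1 sparsity term
    """
    d, m = X.shape
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((d, k))
    Phi /= np.linalg.norm(Phi, axis=0)              # unit-norm bases

    A = np.zeros((k, m))
    for _ in range(n_iter):
        # (1) Fix Phi, solve for codes: an L1-regularized regression per patch
        #     (the "harder" step; alpha is rescaled to match the objective above).
        lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
        for i in range(m):
            lasso.fit(Phi, X[:, i])
            A[:, i] = lasso.coef_
        # (2) Fix A, solve for Phi: plain least squares (the "easy" step).
        Phi = X @ np.linalg.pinv(A)
        Phi /= np.linalg.norm(Phi, axis=0) + 1e-12
    return Phi, A
```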

Page 11: ECCV2010: feature learning for image classification, part 2


Solving for bases

Early versions of sparse coding were used to learn about this many bases:

32 learned bases

How to scale this algorithm up?

Page 12: ECCV2010: feature learning for image classification, part 2


Sparse coding details

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Objective:  minimize over the φj's and aj(i)'s   Σi || x(i) − Σj aj(i) φj ||²  +  λ Σi Σj |aj(i)|   (the second term is the L1 sparsity term)

Alternating minimization: Alternately minimize with respect to the φj's (easy) and the aj(i)'s (harder).

Page 13: ECCV2010: feature learning for image classification, part 2


Feature-sign search (solve for the ai's)

Goal: Minimize the objective with respect to the ai's.

• Simplified example: minimize over the ai's   || x − Σi ai φi ||² + λ Σi |ai|

• Suppose I tell you the sign (+, −, or 0) of each ai. Then |ai| = sign(ai) · ai, and the problem simplifies to:

  minimize over the ai's   || x − Σi ai φi ||² + λ Σi sign(ai) · ai

• This is a quadratic function of the ai's. It can be solved efficiently in closed form.

• Algorithm:
  • Repeatedly guess the sign (+, − or 0) of each of the ai's.
  • Solve for the ai's in closed form. Refine the guess for the signs.
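A sketch of that closed-form sub-step, assuming the active set and its signs are the current guess (the full feature-sign search of Lee et al. also checks optimality conditions and line-searches between the old and new solutions, which is omitted here):

```python
import numpy as np

def solve_given_signs(x, Phi_active, s, lam):
    """Minimize ||x - Phi_active @ a||^2 + lam * sum_j |a_j| over the active a_j's,
    treating |a_j| as s_j * a_j, where s_j in {+1, -1} is the guessed sign.
    Setting the gradient to zero gives a linear system, hence a closed form."""
    G = Phi_active.T @ Phi_active              # Gram matrix of the active bases
    rhs = Phi_active.T @ x - (lam / 2.0) * s
    return np.linalg.solve(G, rhs)
```

If any returned coefficient's sign disagrees with its guessed sign, the algorithm refines the sign guess (and the active set) and repeats, as the slide's last bullet says.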

Page 14: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

Current guess: a1 = 0, a2 = 0   [figure: the point (a1, a2) = (0, 0)]

Page 15: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

Current guess: a1 = 0, a2 = 0   [figure: still at the origin]

Page 16: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

Current guess: a1 = 0, a2 = 0   [figure: a2 marked as the newly active coordinate]

Page 17: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

2: Update a2 (closed form)

Current guess: a1 = 0; a2 set to its closed-form value   [figure: guess moves along the a2 axis]

Page 18: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

3: Activate a1 with "+" sign. Active set = {a1, a2}

Current guess: a1 = 0; a2 at its value from step 2   [figure: a1 marked as newly active]

Page 19: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

3: Activate a1 with "+" sign. Active set = {a1, a2}

4: Update a1 & a2 (closed form)

Current guess: a1 and a2 both at their closed-form values   [figure: guess moves to the new (a1, a2) point]

Page 20: ECCV2010: feature learning for image classification, part 2


Before feature-sign search

32 learned bases

Page 21: ECCV2010: feature learning for image classification, part 2


With feature-sign search

Page 22: ECCV2010: feature learning for image classification, part 2


Recap of sparse coding for feature learning

Training time
Input: Images x(1), x(2), …, x(m) (each in R^{n×n}).
Learn: Dictionary of bases φ1, …, φk (also in R^{n×n}).

Test time
Input: Novel image x (in R^{n×n}) and the previously learned φi's.
Output: Representation [a1, …, ak] of image x.

x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63

Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
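At test time this is just an L1-regularized regression of the novel patch onto the fixed dictionary. A sketch using scikit-learn's sparse_encode (the dictionary and patch below are random stand-ins, and the alpha value is arbitrary):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

k, d = 64, 14 * 14
rng = np.random.default_rng(1)
Phi = rng.standard_normal((k, d))   # stand-in for the learned phi_i's, one flattened basis per row
x = rng.standard_normal((1, d))     # stand-in for the novel 14x14 patch, flattened

a = sparse_encode(x, Phi, algorithm="lasso_lars", alpha=0.5)   # shape (1, k)
print("non-zero coefficients:", np.count_nonzero(a), "out of", k)
```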

Page 23: ECCV2010: feature learning for image classification, part 2


Sparse coding recap

x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63  →  [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

Much better than pixel representation. But still not competitive with SIFT, etc.

Three ways to make it competitive:
• Combine this with SIFT.
• Advanced versions of sparse coding (LCC).
• Deep learning.

Page 24: ECCV2010: feature learning for image classification, part 2


Combining sparse coding with SIFT

Input: SIFT descriptors x(1), x(2), …, x(m) (each in R^128), rather than raw image patches.

Learn: Dictionary of bases φ1, …, φk (also in R^128).

Test time: Given a novel SIFT descriptor x (in R^128), represent it as [a1, …, ak].

Page 25: ECCV2010: feature learning for image classification, part 2


Putting it together

• Relates to the histogram view of features; here, sparse coding is applied on top of SIFT features.

Suppose you've already learned bases φ1, …, φk. Here's how you represent an image:

[Figure: each local descriptor x(1), x(2), x(3), … is encoded as a sparse code a(1), a(2), a(3), …; these codes form the feature representation passed to the learning algorithm]

E.g., 73-75% on Caltech 101 (Yang et al., 2009; Boureau et al., 2009)
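A toy sketch of that pipeline (my own simplification of the Yang et al. style system: the max-pooling choice, the alpha value, and the function name are assumptions, and the real method also pools over a spatial pyramid):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def encode_image(sift_descriptors, Phi, alpha=0.3):
    """sift_descriptors: (n_patches, 128) local SIFT descriptors from one image.
    Phi: (k, 128) dictionary of bases learned over SIFT descriptors.
    Returns a k-dimensional image-level feature for a linear classifier."""
    codes = sparse_encode(sift_descriptors, Phi, algorithm="lasso_lars", alpha=alpha)
    return codes.max(axis=0)   # pool the sparse codes (max pooling) over the image
```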

Page 26: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

K-means: learn centroids (Centroid 1, Centroid 2, Centroid 3).

Represent each input by its single closest centroid (a "hard", one-of-k assignment).

Page 27: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

K-means (Centroid 1, Centroid 2, Centroid 3):
Represent each input by its single closest centroid.

Sparse coding (Basis 1, Basis 2, Basis 3):
Represent each input as a sparse combination of several bases.

Intuition: “Soft” version of k-means (membership in multiple clusters).

Page 28: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

Rule of thumb: whenever you use k-means to build a dictionary, replacing it with sparse coding will often work better.
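A small sketch of the contrast on toy data (everything here is invented for illustration; k-means yields a one-hot "hard" code, while sparse coding spreads the same point over several dictionary elements):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))          # toy 2-D data points

# K-means: each point is represented by its single closest centroid (one-hot code).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hard_code = np.eye(3)[km.predict(X[:1])]   # e.g. [[0., 1., 0.]]

# Sparse coding: each point is a sparse combination of several bases ("soft" code).
Phi = km.cluster_centers_                  # reuse the centroids as a tiny dictionary
soft_code = sparse_encode(X[:1], Phi, algorithm="lasso_lars", alpha=0.1)

print("k-means code:", hard_code[0])
print("sparse code :", np.round(soft_code[0], 2))
```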