ECCV2010: Feature learning for image classification, part 2

Andrew Ng
Image classification by sparse coding


TRANSCRIPT

Page 1: ECCV2010: feature learning for image classification, part 2

Andrew Ng

Image classification

by sparse coding

Page 2: ECCV2010: feature learning for image classification, part 2


Feature learning problem

• Given a 14x14 image patch x, can represent it using 196 real numbers.

• Problem: Can we learn a better representation for this?

Page 3: ECCV2010: feature learning for image classification, part 2


Unsupervised feature learning

Given a set of images, learn a better way to represent images than raw pixels.

Page 4: ECCV2010: feature learning for image classification, part 2


First stage of visual processing in brain: V1

[Figure: schematic of a simple cell; an actual simple cell]

[Images from DeAngelis, Ohzawa & Freeman, 1995]

“Gabor functions.”

The first stage of visual processing in the brain (V1) does “edge detection.”
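For reference (this formula is my addition, not on the slide), simple-cell receptive fields like these are commonly modeled with a 2-D Gabor function, i.e. a Gaussian envelope times an oriented sinusoid:

g(x, y) = exp( −(x′² + γ²·y′²) / (2σ²) ) · cos( 2π·x′/λ + ψ ),   where x′ = x·cosθ + y·sinθ and y′ = −x·sinθ + y·cosθ.

This is consistent with the localized, oriented, edge-like bases that sparse coding learns a few slides later.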

Page 5: ECCV2010: feature learning for image classification, part 2


Learning an image representation

Sparse coding (Olshausen & Field,1996)

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Learn: Dictionary of bases φ1, …, φk (also in R^{n×n}), so that each input x can be approximately decomposed as:

x ≈ Σj aj φj,   s.t. the aj's are mostly zero ("sparse")

Use this to represent a 14x14 image patch succinctly, e.g. as [a7 = 0.8, a36 = 0.3, a41 = 0.5]. I.e., it indicates which "basic edges" make up the image.
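As a toy illustration of that decomposition, here is a minimal NumPy sketch. The dictionary is random and the coefficient values are simply the ones from the slide, so only the shapes and the x ≈ Σj aj φj reconstruction are the point:

```python
import numpy as np

k = 64                                  # number of bases (as on the next slide)
rng = np.random.default_rng(0)
phi = rng.standard_normal((k, 14, 14))  # stand-in for learned 14x14 bases phi_j

a = np.zeros(k)                         # sparse code: mostly zeros
a[7], a[36], a[41] = 0.8, 0.3, 0.5      # the slide's non-zero coefficients (0-based here)

x_hat = np.tensordot(a, phi, axes=1)    # x_hat = sum_j a_j * phi_j, a 14x14 patch
print(x_hat.shape, "reconstructed from", np.count_nonzero(a), "of", k, "bases")
```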

Page 6: ECCV2010: feature learning for image classification, part 2


Sparse coding illustration

Natural images → Learned bases (φ1, …, φ64): "Edges"


x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …] (feature representation)

Test example

Compact & easily interpretable

Page 7: ECCV2010: feature learning for image classification, part 2


More examples

x ≈ 0.6 * φ15 + 0.8 * φ28 + 0.4 * φ37
Represent as: [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]

x ≈ 1.3 * φ5 + 0.9 * φ18 + 0.3 * φ29
Represent as: [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]

• Method hypothesizes that edge-like patches are the most “basic” elements of a scene, and represents an image in terms of the edges that appear in it.

• Use this to obtain a more compact, higher-level representation of the scene than raw pixels.

Page 8: ECCV2010: feature learning for image classification, part 2

[Evan Smith & Mike Lewicki, 2006]

Digression: Sparse coding applied to audio

Page 9: ECCV2010: feature learning for image classification, part 2


Digression: Sparse coding applied to audio

[Evan Smith & Mike Lewicki, 2006]

Page 10: ECCV2010: feature learning for image classification, part 2


Sparse coding details

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Objective:  minimize over the φj's and aj(i)'s   Σi || x(i) − Σj aj(i) φj ||²  +  λ Σi Σj |aj(i)|

(the L1 sparsity term, λ Σ |aj(i)|, causes most aj's to be 0)

Alternating minimization: Alternately minimize with respect to the φj's (easy) and the aj(i)'s (harder).
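To make the alternation concrete, here is a rough sketch (my own, not the tutorial's code): patches are columns of X, bases are columns of Phi, the a-step uses an off-the-shelf lasso solver rather than the feature-sign search introduced below, and the φ-step is an unconstrained least-squares fit followed by renormalization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coding(X, k, lam=0.1, n_iter=20, seed=0):
    """Alternately solve for codes A (L1-penalized) and bases Phi (least squares).

    X   : (d, m) data matrix, one flattened patch per column (d = n*n pixels)
    k   : number of bases
    lam : weight of the L1 sparsity term
    """
    d, m = X.shape
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((d, k))
    Phi /= np.linalg.norm(Phi, axis=0)              # unit-norm bases

    A = np.zeros((k, m))
    for _ in range(n_iter):
        # (1) Fix Phi, solve for codes: an L1-regularized regression per patch
        #     (the "harder" step; alpha is rescaled to match the objective above).
        lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
        for i in range(m):
            lasso.fit(Phi, X[:, i])
            A[:, i] = lasso.coef_
        # (2) Fix A, solve for Phi: plain least squares (the "easy" step).
        Phi = X @ np.linalg.pinv(A)
        Phi /= np.linalg.norm(Phi, axis=0) + 1e-12
    return Phi, A
```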

Page 11: ECCV2010: feature learning for image classification, part 2


Solving for bases

Early versions of sparse coding were used to learn about this many bases:

32 learned bases

How to scale this algorithm up?

Page 12: ECCV2010: feature learning for image classification, part 2


Sparse coding details

Input: Images x(1), x(2), …, x(m) (each in R^{n×n})

Objective:  minimize over the φj's and aj(i)'s   Σi || x(i) − Σj aj(i) φj ||²  +  λ Σi Σj |aj(i)|   (the second term is the L1 sparsity term)

Alternating minimization: Alternately minimize with respect to the φj's (easy) and the aj(i)'s (harder).

Page 13: ECCV2010: feature learning for image classification, part 2


Feature-sign search (solve for the ai's)

Goal: Minimize the objective with respect to the ai's.

• Simplified example: minimize over the ai's   || x − Σi ai φi ||² + λ Σi |ai|

• Suppose I tell you the sign (+, −, or 0) of each ai. Then |ai| = sign(ai) · ai, and the problem simplifies to:

  minimize over the ai's   || x − Σi ai φi ||² + λ Σi sign(ai) · ai

• This is a quadratic function of the ai's. It can be solved efficiently in closed form.

• Algorithm:
  • Repeatedly guess the sign (+, − or 0) of each of the ai's.
  • Solve for the ai's in closed form. Refine the guess for the signs.
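A sketch of that closed-form sub-step, assuming the active set and its signs are the current guess (the full feature-sign search of Lee et al. also checks optimality conditions and line-searches between the old and new solutions, which is omitted here):

```python
import numpy as np

def solve_given_signs(x, Phi_active, s, lam):
    """Minimize ||x - Phi_active @ a||^2 + lam * sum_j |a_j| over the active a_j's,
    treating |a_j| as s_j * a_j, where s_j in {+1, -1} is the guessed sign.
    Setting the gradient to zero gives a linear system, hence a closed form."""
    G = Phi_active.T @ Phi_active              # Gram matrix of the active bases
    rhs = Phi_active.T @ x - (lam / 2.0) * s
    return np.linalg.solve(G, rhs)
```

If any returned coefficient's sign disagrees with its guessed sign, the algorithm refines the sign guess (and the active set) and repeats, as the slide's last bullet says.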

Page 14: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

Current guess: a1 = 0, a2 = 0   [figure: the point (a1, a2) = (0, 0)]

Page 15: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

Current guess: a1 = 0, a2 = 0   [figure: still at the origin]

Page 16: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

Current guess: a1 = 0, a2 = 0   [figure: a2 marked as the newly active coordinate]

Page 17: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

1: Activate a2 with "+" sign. Active set = {a2}

2: Update a2 (closed form)

Current guess: a1 = 0; a2 set to its closed-form value   [figure: guess moves along the a2 axis]

Page 18: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

3: Activate a1 with "+" sign. Active set = {a1, a2}

Current guess: a1 = 0; a2 at its value from step 2   [figure: a1 marked as newly active]

Page 19: ECCV2010: feature learning for image classification, part 2


The feature-sign search algorithm: Visualization

Starting from zero (default)

3: Activate a1 with "+" sign. Active set = {a1, a2}

4: Update a1 & a2 (closed form)

Current guess: a1 and a2 both at their closed-form values   [figure: guess moves to the new (a1, a2) point]

Page 20: ECCV2010: feature learning for image classification, part 2


Before feature-sign search

32 learned bases

Page 21: ECCV2010: feature learning for image classification, part 2


With feature-sign search

Page 22: ECCV2010: feature learning for image classification, part 2


Recap of sparse coding for feature learning

Training time
Input: Images x(1), x(2), …, x(m) (each in R^{n×n}).
Learn: Dictionary of bases φ1, …, φk (also in R^{n×n}).

Test time
Input: Novel image x (in R^{n×n}) and the previously learned φi's.
Output: Representation [a1, …, ak] of image x.

x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63

Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
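At test time this is just an L1-regularized regression of the novel patch onto the fixed dictionary. A sketch using scikit-learn's sparse_encode (the dictionary and patch below are random stand-ins, and the alpha value is arbitrary):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

k, d = 64, 14 * 14
rng = np.random.default_rng(1)
Phi = rng.standard_normal((k, d))   # stand-in for the learned phi_i's, one flattened basis per row
x = rng.standard_normal((1, d))     # stand-in for the novel 14x14 patch, flattened

a = sparse_encode(x, Phi, algorithm="lasso_lars", alpha=0.5)   # shape (1, k)
print("non-zero coefficients:", np.count_nonzero(a), "out of", k)
```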

Page 23: ECCV2010: feature learning for image classification, part 2


Sparse coding recap

x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63  →  [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

Much better than pixel representation. But still not competitive with SIFT, etc.

Three ways to make it competitive:
• Combine this with SIFT.
• Advanced versions of sparse coding (LCC).
• Deep learning.

Page 24: ECCV2010: feature learning for image classification, part 2


Combining sparse coding with SIFT

Input: SIFT descriptors x(1), x(2), …, x(m) (each in R^128), rather than raw image patches.

Learn: Dictionary of bases φ1, …, φk (also in R^128).

Test time: Given a novel SIFT descriptor x (in R^128), represent it as [a1, …, ak].

Page 25: ECCV2010: feature learning for image classification, part 2


Putting it together

• Relates to the histogram view of features; here, sparse coding is applied on top of SIFT features.

Suppose you've already learned bases φ1, …, φk. Here's how you represent an image:

[Figure: each local descriptor x(1), x(2), x(3), … is encoded as a sparse code a(1), a(2), a(3), …; these codes form the feature representation passed to the learning algorithm]

E.g., 73-75% on Caltech 101 (Yang et al., 2009; Boureau et al., 2009)
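A toy sketch of that pipeline (my own simplification of the Yang et al. style system: the max-pooling choice, the alpha value, and the function name are assumptions, and the real method also pools over a spatial pyramid):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def encode_image(sift_descriptors, Phi, alpha=0.3):
    """sift_descriptors: (n_patches, 128) local SIFT descriptors from one image.
    Phi: (k, 128) dictionary of bases learned over SIFT descriptors.
    Returns a k-dimensional image-level feature for a linear classifier."""
    codes = sparse_encode(sift_descriptors, Phi, algorithm="lasso_lars", alpha=alpha)
    return codes.max(axis=0)   # pool the sparse codes (max pooling) over the image
```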

Page 26: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

K-means: learn centroids (Centroid 1, Centroid 2, Centroid 3).

Represent each input by its single closest centroid (a "hard", one-of-k assignment).

Page 27: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

K-means (Centroid 1, Centroid 2, Centroid 3):
Represent each input by its single closest centroid.

Sparse coding (Basis 1, Basis 2, Basis 3):
Represent each input as a sparse combination of several bases.

Intuition: “Soft” version of k-means (membership in multiple clusters).

Page 28: ECCV2010: feature learning for image classification, part 2


K-means vs. sparse coding

Rule of thumb: whenever you use k-means to build a dictionary, replacing it with sparse coding will often work better.
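A small sketch of the contrast on toy data (everything here is invented for illustration; k-means yields a one-hot "hard" code, while sparse coding spreads the same point over several dictionary elements):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))          # toy 2-D data points

# K-means: each point is represented by its single closest centroid (one-hot code).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hard_code = np.eye(3)[km.predict(X[:1])]   # e.g. [[0., 1., 0.]]

# Sparse coding: each point is a sparse combination of several bases ("soft" code).
Phi = km.cluster_centers_                  # reuse the centroids as a tiny dictionary
soft_code = sparse_encode(X[:1], Phi, algorithm="lasso_lars", alpha=0.1)

print("k-means code:", hard_code[0])
print("sparse code :", np.round(soft_code[0], 2))
```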