TRANSCRIPT
Image classification by sparse coding
Andrew Ng
Feature learning problem
• Given a 14x14 image patch x, we can represent it using 196 real numbers (its pixel values).
• Problem: Can we learn a better representation for this?
Unsupervised feature learning
Given a set of images, learn a better way to represent images than raw pixels.
First stage of visual processing in the brain: V1
[Figure: schematic of a simple cell vs. an actual simple cell receptive field, i.e., “Gabor functions.”]
[Images from DeAngelis, Ohzawa & Freeman, 1995]
The first stage of visual processing in the brain (V1) does “edge detection.”
Learning an image representation
Sparse coding (Olshausen & Field, 1996)
Input: Images x(1), x(2), …, x(m) (each in R^(n x n))
Learn: Dictionary of bases φ_1, …, φ_k (also in R^(n x n)), so that each input x can be approximately decomposed as
    x ≈ Σ_j a_j φ_j    s.t. the a_j’s are mostly zero (“sparse”)
Use this to represent a 14x14 image patch succinctly, e.g. as [a_7 = 0.8, a_36 = 0.3, a_41 = 0.5]. I.e., this indicates which “basic edges” make up the image.
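As a tiny illustration of this representation, the numpy sketch below builds the sparse coefficient vector from the slide’s example and reconstructs the patch as a weighted sum of bases. The φ’s here are random placeholders, not learned edges:

```python
# Tiny illustration of the sparse representation. The phi's below are random
# placeholders standing in for learned edge bases; the indices follow the
# slide's example (a_7 = 0.8, a_36 = 0.3, a_41 = 0.5).
import numpy as np

n, k = 196, 64                      # 14x14 patch -> 196 pixels; 64 bases
Phi = np.random.randn(n, k)         # columns are phi_1, ..., phi_64

a = np.zeros(k)                     # sparse coefficient vector [a_1, ..., a_64]
a[[6, 35, 40]] = [0.8, 0.3, 0.5]    # a_7, a_36, a_41 (0-indexed positions)

x_hat = Phi @ a                     # reconstruction: x ~= sum_j a_j * phi_j
print(f"{np.count_nonzero(a)} of {k} coefficients are nonzero")
```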
Sparse coding illustration
Natural Images; Learned bases (φ_1, …, φ_64): “Edges”
[Figure: sample natural images and the 64 learned edge-like bases]
Test example:
    x ≈ 0.8 · φ_36 + 0.3 · φ_42 + 0.5 · φ_63
    [a_1, …, a_64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …] (feature representation)
Compact & easily interpretable.
More examples
x ≈ 0.6 · φ_15 + 0.8 · φ_28 + 0.4 · φ_37
Represent as: [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]

x ≈ 1.3 · φ_5 + 0.9 · φ_18 + 0.3 · φ_29
Represent as: [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]
• The method hypothesizes that edge-like patches are the most “basic” elements of a scene, and represents an image in terms of the edges that appear in it.
• Use this to obtain a more compact, higher-level representation of the scene than pixels.
Digression: Sparse coding applied to audio
[Evan Smith & Mike Lewicki, 2006]
Sparse coding details
Input: Images x(1), x(2), …, x(m) (each in R^(n x n))
Minimize over the φ_j’s and the a_j(i)’s:
    Σ_i || x(i) - Σ_j a_j(i) φ_j ||^2 + λ Σ_i Σ_j |a_j(i)|
    (reconstruction error)              (L1 sparsity term; causes most a_j’s to be 0)
Alternating minimization: Alternately minimize with respect to the φ_j’s (easy) and the a_j’s (harder).
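For concreteness, here is a minimal numpy sketch of this alternating scheme, under stated assumptions: it solves the a-step with ISTA (iterative soft-thresholding) rather than the feature-sign search described below, and the φ-step by least squares. The function name sparse_coding and the hyperparameters are illustrative, not from the slides:

```python
# Sketch of sparse coding by alternating minimization. Assumptions: numpy only,
# ISTA for the a-step (not feature-sign search), least squares for the phi-step.
import numpy as np

def sparse_coding(X, k, lam=0.1, outer_iters=20, ista_iters=50):
    """X: (n, m) matrix of m vectorized patches. Returns (Phi, A) with X ~= Phi @ A."""
    n, m = X.shape
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((n, k))
    Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm bases phi_1..phi_k
    A = np.zeros((k, m))
    for _ in range(outer_iters):
        # a-step: minimize 0.5*||X - Phi A||^2 + lam*sum|A| over A (ISTA).
        L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the gradient
        for _ in range(ista_iters):
            A -= Phi.T @ (Phi @ A - X) / L           # gradient step on the quadratic
            A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft-threshold
        # phi-step: minimize ||X - Phi A||^2 over Phi (closed-form least squares).
        Phi = X @ A.T @ np.linalg.pinv(A @ A.T)
        Phi /= np.maximum(np.linalg.norm(Phi, axis=0), 1e-8)   # renormalize columns
    return Phi, A
```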
Solving for bases
Early versions of sparse coding were only able to learn about this many bases:
[Figure: 32 learned bases]
How to scale this algorithm up?
Feature-sign search (solve for a_i’s)
Goal: Minimize the objective with respect to the a_i’s.
• Simplified example: suppose I tell you the sign (+, - or 0) of each of the a_i’s.
• The problem then simplifies: each |a_j| becomes ±a_j (or drops out), so the objective
    || x - Σ_j a_j φ_j ||^2 + λ Σ_j s_j a_j    (s_j ∈ {+1, -1, 0} the given signs)
• This is a quadratic function of the a_i’s. It can be solved efficiently in closed form.
• Algorithm:
  • Repeatedly guess the sign (+, - or 0) of each of the a_i’s.
  • Solve for the a_i’s in closed form. Refine the guess for the signs.
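To make the closed-form step concrete, here is a small numpy sketch of the quadratic solve once the signs are guessed. The helper name solve_given_signs is hypothetical, and this omits the bookkeeping of the full feature-sign search (active-set updates, sign refinement):

```python
# Closed-form solve for the codes when the sign of each a_j is given (a sketch,
# not the full feature-sign search; names here are illustrative assumptions).
import numpy as np

def solve_given_signs(Phi, x, s, lam):
    """Phi: (n, k) bases; x: (n,) input; s: (k,) entries in {+1, -1, 0}; lam: L1 weight.
    Minimizes ||x - Phi a||^2 + lam * sum_j s_j * a_j over the active coordinates."""
    active = np.flatnonzero(s)              # coordinates with a nonzero guessed sign
    a = np.zeros(len(s))
    if active.size == 0:
        return a
    P = Phi[:, active]
    # Setting the gradient to zero: 2 P^T P a = 2 P^T x - lam * s_active
    rhs = P.T @ x - 0.5 * lam * s[active]
    a[active] = np.linalg.solve(P.T @ P, rhs)   # assumes P^T P is invertible
    return a
```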
The feature-sign search algorithm: Visualization
[Figure: steps of the search in the (a_1, a_2) plane]
Starting from zero (default). Current guess: a_1 = 0, a_2 = 0.
1: Activate a_2 with “+” sign. Active set = {a_2}.
2: Update a_2 (closed form).
3: Activate a_1 with “+” sign. Active set = {a_1, a_2}.
4: Update a_1 & a_2 (closed form).
Before feature-sign search: [Figure: 32 learned bases]
With feature-sign search: [Figure: learned bases]
Recap of sparse coding for feature learning
Training time:
  Input: Images x(1), x(2), …, x(m) (each in R^(n x n)).
  Learn: Dictionary of bases φ_1, …, φ_k (also in R^(n x n)).
Test time:
  Input: Novel image x (in R^(n x n)) and the previously learned φ_i’s.
  Output: Representation [a_1, …, a_k] of image x.
  x ≈ 0.8 · φ_36 + 0.3 · φ_42 + 0.5 · φ_63
  Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
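One way to reproduce this train/test recipe, assuming vectorized patches and scikit-learn’s dictionary-learning API (a sketch, not the original implementation):

```python
# Train/test recipe with scikit-learn (a sketch; random data stands in for
# real 14x14 patches, and the hyperparameters are illustrative).
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

X_train = np.random.randn(1000, 196)        # e.g., 1000 vectorized 14x14 patches

# Training time: learn a dictionary of k bases (rows of dico.components_).
dico = DictionaryLearning(n_components=64, alpha=1.0, max_iter=50)
dico.fit(X_train)

# Test time: given a novel patch x, solve for its sparse code a.
x_new = np.random.randn(1, 196)
a = sparse_encode(x_new, dico.components_, algorithm='lasso_lars', alpha=1.0)
print(a.shape)                              # (1, 64): mostly-zero coefficients
```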
Sparse coding recap
x ≈ 0.8 · φ_36 + 0.3 · φ_42 + 0.5 · φ_63
[0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
Much better than the pixel representation. But still not competitive with SIFT, etc.
Three ways to make it competitive:
• Combine this with SIFT.
• Advanced versions of sparse coding (LCC).
• Deep learning.
Combining sparse coding with SIFT
Input: SIFT descriptors x(1), x(2), …, x(m) (each in R^128), instead of raw image patches (each in R^(n x n)).
Learn: Dictionary of bases φ_1, …, φ_k (also in R^128).
Test time: Given a novel SIFT descriptor x (in R^128), represent it as [a_1, …, a_k].
Putting it together
• Relate this to the histogram view: it amounts to sparse coding on top of SIFT features.
Suppose you’ve already learned bases φ_1, …, φ_k. Here’s how you represent an image:
[Diagram: inputs x(1), x(2), x(3), … are each mapped to feature representations a(1), a(2), a(3), …, which are fed to a learning algorithm]
E.g., 73-75% accuracy on Caltech 101 (Yang et al., 2009; Boureau et al., 2009).
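A hedged sketch of this pipeline: encode each SIFT descriptor against the learned dictionary, then pool the codes into one image-level feature. Max pooling is an assumption here (one common choice, e.g. in Yang et al., 2009), and the function name image_feature is illustrative:

```python
# Sketch: sparse coding on top of SIFT, pooled to an image-level feature.
# Illustrative names; max pooling is one common choice, not stated on the slide.
import numpy as np
from sklearn.decomposition import sparse_encode

def image_feature(sift_descriptors, dictionary, alpha=1.0):
    """sift_descriptors: (num_patches, 128); dictionary: (k, 128) learned bases.
    Returns a single k-dimensional feature vector for the whole image."""
    codes = sparse_encode(sift_descriptors, dictionary,
                          algorithm='lasso_lars', alpha=alpha)  # (num_patches, k)
    return codes.max(axis=0)        # max-pool each coefficient over the patches
```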
K-means vs. sparse coding
K-means (Centroid 1, Centroid 2, Centroid 3):
  Represent each input by its single closest centroid (hard assignment to one cluster).
Sparse coding (Basis 1, Basis 2, Basis 3):
  Represent each input as a sparse combination of several bases.
Intuition: sparse coding is a “soft” version of k-means (membership in multiple clusters).
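The contrast fits in a few lines of numpy: a k-means code is one-hot over centroids, while a sparse code spreads weight over several bases. The names and the ISTA-based encoder below are illustrative assumptions, not the slides’ method:

```python
# Toy contrast between a k-means code and a sparse code for the same point
# (a sketch; 'centroids'/'bases' and the lasso weight are illustrative).
import numpy as np

def kmeans_code(x, centroids):
    """Hard assignment: a one-hot vector marking the single closest centroid."""
    code = np.zeros(len(centroids))
    code[np.argmin([np.linalg.norm(x - c) for c in centroids])] = 1.0
    return code

def sparse_code(x, bases, lam=0.1, iters=200):
    """Soft membership: L1-regularized weights over several bases (simple ISTA)."""
    Phi = np.column_stack(bases)                 # (n, k) matrix of bases
    a = np.zeros(Phi.shape[1])
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the gradient
    for _ in range(iters):
        a -= Phi.T @ (Phi @ a - x) / L           # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
    return a
```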
Rule of thumb: Whenever you use k-means to get a dictionary, replacing it with sparse coding will often work better.