What is the Best Multi-Stage Architecture for Object Recognition? Ruiwen Wu [1] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.(Cited by 396 till 2014.11.12)

What is the Best Multi-Stage Architecture for Object Recognition?

Ruiwen Wu

[1] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.(Cited by 396 till 2014.11.12)

Usual architecture of the neural networks

Each part of the neural networks

The concept of unsupervised learning

Experiment

Contribution of this paper

Papers about Neural Networks

[Bar chart: value per year]

2010: 64,000
2011: 75,000
2012: 78,000
2013: 80,000
2014: 78,900

Papers about Unsupervised Pre-Training

[Bar chart: value per year]

2010: 110
2011: 160
2012: 256
2013: 252
2014: 552

Citations every year

[Bar chart: citations per year]

2009: 4
2010: 32
2011: 62
2012: 64
2013: 115
2014: 113

Deep Learning Methods

Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features [2]

Neural networks with many hidden layers

Graphical models with many levels of hidden variables

Other methods

[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?

Usual architecture of neural networks

Non-linear Operation: Quantization, Winner-take-all, Sparsification, Normalization, S-function

Pooling Operation: Max, average, histogramming operator

Classifier: Neural Networks (NN), k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR)

Questions to address

This paper addresses three questions:

How do the non-linearities that follow the filter banks influence the recognition accuracy?

Does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hardwired filters?

Is there any advantage to using an architecture with two stages of feature extraction, rather than one?

Experiments Architecture

To address these three questions, the authors experimented with various combinations of architectures:

One stage or two stages of feature extraction

Different types of non-linearities

Different types of filters

Different filter learning methods (random, unsupervised and supervised)

Test datasets: Caltech-101, NORB, and MNIST

Model Architecture

Filter Bank Layer (FCSG)

Local Contrast Normalization Layer (N)

Pooling and Subsampling Layer (PA or PM)

Filter Bank Layer (FCSG)

The module computes:

$$y_j = g_j \tanh\Big(\sum_i k_{ij} * x_i\Big)$$

where * is the convolution operator, tanh is the hyperbolic tangent non-linearity, and g_j is a trainable scalar coefficient.

Output size: if each input map is n1 x n2 and each kernel is l1 x l2, then each output map y_j is (n1-l1+1) x (n2-l2+1).

The kernels can be learned either by supervised training or by unsupervised pre-training.
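
As a rough illustration of the filter bank computation, the sketch below evaluates y_j = g_j tanh(sum_i k_ij * x_i) on plain Python lists and shows the (n1-l1+1) x (n2-l2+1) output size. The function names `conv2d_valid` and `fcsg`, and the list-of-lists data layout, are assumptions made for this example and do not come from the paper.

```python
import math

def conv2d_valid(x, k):
    """'Valid' 2-D convolution of an n1 x n2 map x with an l1 x l2 kernel k."""
    n1, n2 = len(x), len(x[0])
    l1, l2 = len(k), len(k[0])
    out = [[0.0] * (n2 - l2 + 1) for _ in range(n1 - l1 + 1)]
    for r in range(n1 - l1 + 1):
        for c in range(n2 - l2 + 1):
            # A true convolution flips the kernel in both directions.
            out[r][c] = sum(
                k[l1 - 1 - p][l2 - 1 - q] * x[r + p][c + q]
                for p in range(l1) for q in range(l2)
            )
    return out

def fcsg(inputs, kernels, gains):
    """Compute y_j = g_j * tanh(sum_i k_ij * x_i) for every output map j.

    inputs  : list of input feature maps x_i
    kernels : kernels[i][j] links input map i to output map j
    gains   : gains[j] is the scalar gain g_j
    """
    outputs = []
    for j, g in enumerate(gains):
        total = None
        for i, x in enumerate(inputs):
            conv = conv2d_valid(x, kernels[i][j])
            if total is None:
                total = conv
            else:
                total = [[a + b for a, b in zip(ra, rb)]
                         for ra, rb in zip(total, conv)]
        outputs.append([[g * math.tanh(v) for v in row] for row in total])
    return outputs

# One 5 x 5 input map and one 3 x 3 kernel give one 3 x 3 output map.
x = [[1.0] * 5 for _ in range(5)]
k = [[0.1] * 3 for _ in range(3)]
y = fcsg([x], [[k]], [2.0])
print(len(y[0]), "x", len(y[0][0]))
```

The nested loops are slow but keep the indexing explicit; an array library would normally replace them.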

Local Contrast Normalization Layer (N)

$$v_{ijk} = x_{ijk} - \sum_{ipq} w_{pq}\, x_{i,j+p,k+q}$$

$$y_{ijk} = \frac{v_{ijk}}{\max(c, \sigma_{jk})}$$

$$\sigma_{jk} = \Big(\sum_{ipq} w_{pq}\, v_{i,j+p,k+q}^{2}\Big)^{1/2}$$

c is mean(σ_jk)

I do not quite understand this part

w_pq is a Gaussian weighting window
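
To make the two normalization steps concrete, here is a minimal sketch in plain Python. It assumes feature maps stored as x[i][j][k], a window centred on each pixel, zero padding at the borders, and weights scaled so that they sum to 1 over i, p and q. The window size, the Gaussian width, the `eps` guard, and all function names are hypothetical choices for this example.

```python
import math

def gaussian_window(size, width, n_maps):
    """Gaussian weights w_pq, scaled so the sum over i, p, q equals 1."""
    half = size // 2
    w = [[math.exp(-((p - half) ** 2 + (q - half) ** 2) / (2.0 * width ** 2))
          for q in range(size)] for p in range(size)]
    total = n_maps * sum(sum(row) for row in w)
    return [[value / total for value in row] for row in w]

def weighted_sum(maps, w):
    """s_jk = sum_{i,p,q} w_pq * maps[i][j+p][k+q], window centred on (j, k).

    Positions outside the maps contribute zero (zero padding, an assumption).
    """
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    half = len(w) // 2
    # Sum over the feature maps first, since w_pq does not depend on i.
    summed = [[sum(m[j][k] for m in maps) for k in range(n_cols)]
              for j in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_rows):
        for k in range(n_cols):
            for p, w_row in enumerate(w):
                for q, weight in enumerate(w_row):
                    jj, kk = j + p - half, k + q - half
                    if 0 <= jj < n_rows and 0 <= kk < n_cols:
                        out[j][k] += weight * summed[jj][kk]
    return out

def local_contrast_normalize(x, size=3, width=1.0, eps=1e-12):
    """Subtractive then divisive normalization of feature maps x[i][j][k]."""
    w = gaussian_window(size, width, len(x))
    local_mean = weighted_sum(x, w)
    # v_ijk = x_ijk - sum_{ipq} w_pq x_{i,j+p,k+q}
    v = [[[val - mu for val, mu in zip(row, mu_row)]
          for row, mu_row in zip(m, local_mean)] for m in x]
    # sigma_jk = sqrt(sum_{ipq} w_pq v_{i,j+p,k+q}^2)
    v_squared = [[[val * val for val in row] for row in m] for m in v]
    sigma_jk = [[math.sqrt(s) for s in row]
                for row in weighted_sum(v_squared, w)]
    # c = mean(sigma_jk) over all positions
    c = sum(sum(row) for row in sigma_jk) / (len(sigma_jk) * len(sigma_jk[0]))
    # y_ijk = v_ijk / max(c, sigma_jk); eps only guards against dividing by zero.
    return [[[val / max(c, s, eps) for val, s in zip(row, s_row)]
             for row, s_row in zip(m, sigma_jk)] for m in v]
```

In this sketch the constant c acts as a floor on the divisor: wherever the local deviation σ_jk falls below its image-wide mean, the response is divided by c instead, so nearly flat regions are not amplified.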

Local Contrast Normalization Layer (N)

The result of this module:

[Figure: two image panels showing the result of the normalization module]

It seems like this module is doing edge extraction.

Pooling and Subsampling Layer (PA or PM)

For each small area:

$$y_{ijk} = \sum_{pq} w_{pq}\, x_{i,j+p,k+q}$$

where w_pq is a uniform weighting window for average pooling (PA); for max pooling (PM), the weighted sum is replaced by the maximum over the window.

Each output feature map is then subsampled spatially by a factor S horizontally and vertically.
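
The pooling and subsampling step can be sketched as follows. This is a simplified illustration: the function name `pool`, its `window` and `stride` parameters (the stride plays the role of the factor S), and the choice to skip windows that do not fit inside the map are assumptions of this example.

```python
def pool(x, window, stride, mode="average"):
    """Pool one feature map over window x window areas, stepping by stride.

    mode="average" uses a uniform weighting window w_pq = 1 / window**2 (PA);
    mode="max" takes the maximum over the window instead (PM).
    """
    n_rows, n_cols = len(x), len(x[0])
    out = []
    for j in range(0, n_rows - window + 1, stride):
        row = []
        for k in range(0, n_cols - window + 1, stride):
            patch = [x[j + p][k + q]
                     for p in range(window) for q in range(window)]
            if mode == "average":
                row.append(sum(patch) / len(patch))
            elif mode == "max":
                row.append(max(patch))
            else:
                raise ValueError("mode must be 'average' or 'max'")
        out.append(row)
    return out

x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
print(pool(x, window=2, stride=2, mode="average"))  # [[3.5, 5.5], [11.5, 13.5]]
print(pool(x, window=2, stride=2, mode="max"))      # [[6.0, 8.0], [14.0, 16.0]]
```

Setting the stride smaller than the window gives overlapping pooling areas; setting them equal gives a non-overlapping tiling.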

Combine Modules

The modules can be combined into three types of architecture:

FCSG ---- PA

FCSG ---- N ---- PA

FCSG ---- PM

Training Protocol

Random Features, Supervised Classifier - R and RR

Unsupervised Features, Supervised Classifier - U and UU

Random Features, Global Supervised Refinement - R+ and R+R+

Unsupervised Features, Global Supervised Refinement - U+ and U+U+

Unsupervised Training of Filter Banks

For a given input X and a matrix W whose columns are the dictionary elements, the feature vector Z* is obtained by minimizing the following energy function:

$$E_{OF}(X, Z, W) = \|X - WZ\|_2^2 + \lambda \|Z\|_1$$

$$Z^* = \arg\min_Z E_{OF}(X, Z, W)$$

where λ is a sparsity hyper-parameter.

For any input X, one needs to run a rather expensive optimization algorithm to find Z*. To alleviate this problem, the PSD method is introduced.
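
To show why finding Z* is costly, the sketch below minimizes E_OF = ||X - WZ||² + λ||Z||₁ with ISTA (iterative shrinkage-thresholding). ISTA is a stand-in chosen for this example; the slides do not say which optimizer was used. The function names and the step-size bound are likewise assumptions.

```python
import math

def matvec(M, v):
    """Matrix-vector product with M stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def energy_of(X, Z, W, lam):
    """E_OF(X, Z, W) = ||X - W Z||_2^2 + lam * ||Z||_1."""
    residual = [x - r for x, r in zip(X, matvec(W, Z))]
    return sum(r * r for r in residual) + lam * sum(abs(z) for z in Z)

def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal operator of t * |v|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(X, W, lam, n_iter=200):
    """Approximate Z* = argmin_Z E_OF(X, Z, W) by ISTA (assumed optimizer).

    W must contain at least one non-zero entry, otherwise the step size
    below is undefined.
    """
    # Upper bound on the Lipschitz constant of the gradient of ||X - WZ||^2,
    # using the squared Frobenius norm instead of the largest eigenvalue.
    lipschitz = 2.0 * sum(w * w for row in W for w in row)
    W_t = [list(col) for col in zip(*W)]
    Z = [0.0] * len(W[0])
    for _ in range(n_iter):
        residual = [r - x for r, x in zip(matvec(W, Z), X)]
        grad = [2.0 * g for g in matvec(W_t, residual)]
        Z = [soft_threshold(z - g / lipschitz, lam / lipschitz)
             for z, g in zip(Z, grad)]
    return Z

# With an identity dictionary the minimizer is X soft-thresholded by lam / 2.
X = [3.0, 0.2]
W = [[1.0, 0.0], [0.0, 1.0]]
Z_star = ista(X, W, lam=1.0)
print(Z_star, energy_of(X, Z_star, W, 1.0))
```

Every new input X needs its own run of this loop, which is the cost a feed-forward predictor such as PSD is meant to avoid.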

Predictive Sparse Decomposition (PSD) [3]

$$E_{PSD}(X, Z, W, K) = \|X - WZ\|_2^2 + \lambda \|Z\|_1 + \|Z - C(X, K)\|_2^2$$

$$Z^* = \arg\min_Z E_{PSD}(X, Z, W, K)$$

$$C(X, K) = C(X, G, S, D) = G \tanh(SX + D), \qquad K = \{G, S, D\}$$

where $S \in \mathbb{R}^{m \times n}$ is a filter matrix, $D \in \mathbb{R}^{m}$ is a vector of biases, and G is a diagonal matrix of gain coefficients.

[3] Kavukcuoglu, Koray, Marc'Aurelio Ranzato, and Yann LeCun. "Fast inference in sparse coding algorithms with applications to object recognition." arXiv preprint arXiv:1010.3467 (2010). (cited by 94)
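
A minimal sketch of the PSD predictor and energy follows, assuming G is diagonal and stored as a list of gains. The function names are hypothetical, and the prediction term is given unit weight here, which is also an assumption.

```python
import math

def encoder(X, G, S, D):
    """Feed-forward predictor C(X, K) = G tanh(S X + D), with diagonal G."""
    return [g * math.tanh(sum(s * x for s, x in zip(row, X)) + d)
            for g, row, d in zip(G, S, D)]

def energy_psd(X, Z, W, G, S, D, lam):
    """E_PSD = ||X - W Z||_2^2 + lam * ||Z||_1 + ||Z - C(X, K)||_2^2."""
    reconstruction = [sum(w * z for w, z in zip(row, Z)) for row in W]
    recon_err = sum((x - r) ** 2 for x, r in zip(X, reconstruction))
    sparsity = lam * sum(abs(z) for z in Z)
    prediction = encoder(X, G, S, D)
    pred_err = sum((z - c) ** 2 for z, c in zip(Z, prediction))
    return recon_err + sparsity + pred_err

# Toy example: identity dictionary and filters, unit gains, zero biases.
X = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
G, D = [1.0, 1.0], [0.0, 0.0]
Z = encoder(X, G, identity, D)   # one feed-forward pass, no optimization loop
print(Z, energy_psd(X, Z, identity, G, identity, D, lam=0.5))
```

The extra term pulls the optimal code Z* towards the encoder output during training, so that at test time C(X, K) alone can stand in for the expensive minimization.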

Result

Why does Unsupervised Pre-training Help Deep Discriminant Learning?[2]

References for the graph

[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?

[3] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

[4] Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 114–128.

[5] Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML’08) (pp. 1168–1175). New York, NY, USA: ACM.

[6] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[7] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

Non-convex function

In deep learning, the objective function is usually a highly non-convex function of the parameters, so there are potentially many distinct local minima in the model parameter space.

Purely supervised learning starts from a fixed or random initialization, so in many situations it converges to a local minimum rather than the global one.
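
The dependence on the starting point can be seen in a toy one-dimensional example. The function f below is a hypothetical non-convex objective chosen for illustration, not one taken from the cited papers; the learning rate and step count are likewise arbitrary.

```python
def f(w):
    """Toy non-convex objective with two minima (illustrative assumption)."""
    return w ** 4 - 3.0 * w ** 2 + w

def grad_f(w):
    """Derivative of f."""
    return 4.0 * w ** 3 - 6.0 * w + 1.0

def gradient_descent(w0, lr=0.01, n_steps=2000):
    """Plain gradient descent from the initial point w0."""
    w = w0
    for _ in range(n_steps):
        w -= lr * grad_f(w)
    return w

w_left = gradient_descent(-2.0)   # ends in the deeper minimum
w_right = gradient_descent(2.0)   # ends in the shallower local minimum
print(w_left, f(w_left))
print(w_right, f(w_right))
```

The two runs differ only in their initialization, yet they settle in different minima (roughly w ≈ -1.30 and w ≈ 1.13) with different objective values, which is the situation the slide describes.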

Local Minima

Random Initialization

Unsupervised Pre-training

Reason

There are a few reasonable hypotheses why pre-training might work.

One possibility is that unsupervised pre-training acts as a kind of regularizer, putting the parameter values in the appropriate range for discriminant training.

Another possibility is that pre-training initializes the model to a point in parameter space that somehow renders the optimization process more effective, in the sense of achieving a lower minimum of the empirical cost function.

Conclusion

Understanding and improving deep architectures remains a challenge.

This work helps with such understanding via extensive simulations and puts forward and confirms a hypothesis explaining the mechanisms behind the effect of unsupervised pre-training for the final discriminant learning task.

Future work should clarify this hypothesis.

References

[1] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?" Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009. (Cited by 396 till 2014.11.12)

[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?

[3] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

[4] Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 114–128.

[5] Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML’08) (pp. 1168–1175). New York, NY, USA: ACM.

[6] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[7] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

Thank You!