CSE 802, Spring 2017
Deep Learning
Inci M. Baytas
Michigan State University
February 13-15, 2017
Deep Learning in Computer Vision
Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014
Deep Learning in Computer Vision
Microsoft Deep Learning Semantic Image Segmentation
Deep Learning in Computer Vision
NeuralTalk and Walk: object recognition and text description of the image while walking.
Other Applications of Deep Learning
● Natural Language Processing (NLP)
● Speech recognition and machine translation
Why Should We Be Impressed?
● Automated vision (e.g., object recognition) is challenging: different viewpoints, scales, occlusions, illumination, …
● Robotics (e.g., autonomous driving) in real-life environments (constantly changing, new tasks without guidance, unexpected factors) is challenging.
● NLP (e.g., understanding human conversations) is an extremely complex task: noise, context, partial sentences, different accents, …
Why Is Deep Learning So Popular Now?
• Better hardware
• Bigger data
• Regularization methods (dropout)
• Variety of optimization methods
  • SGD
  • Adagrad
  • Adadelta
  • ADAM
  • RMSProp
Criticism and Limitations of Deep Networks
• Large amount of data required for training
• High-performance computing is a necessity
• Non-optimal: training is non-convex, with no guarantee of a global optimum
• Task-specific
• Lack of theoretical understanding
Common Deep Network Types
● Feedforward networks
● Convolutional neural networks
● Recurrent neural networks
Components of Deep Learning
Loss functions (see the sketch after the activation list)
● Squared loss: $(y - f(x))^2$
● Logistic loss: $\log(1 + e^{-y f(x)})$
● Hinge loss: $(1 - y f(x))_+$
● Squared hinge loss: $(1 - y f(x))_+^2$
Non-linear activation functions
● Linear
● Tanh
● Sigmoid
● Softmax
● ReLU
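A minimal NumPy sketch of the listed losses (for labels y in {-1, +1} and model score f(x)) and activations; this is illustrative code, not from the slides:

    import numpy as np

    def squared_loss(y, f):        return (y - f) ** 2
    def logistic_loss(y, f):       return np.log(1.0 + np.exp(-y * f))
    def hinge_loss(y, f):          return np.maximum(0.0, 1.0 - y * f)
    def squared_hinge_loss(y, f):  return np.maximum(0.0, 1.0 - y * f) ** 2

    def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))
    def relu(x):     return np.maximum(0.0, x)
    def softmax(x):                        # numerically stable
        e = np.exp(x - np.max(x))
        return e / e.sum()
    # tanh is np.tanh; "linear" is the identity function.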
Components of Deep Learning
Optimizers
● Gradient Descent
● Adagrad (Adaptive Gradient Algorithm)
● Adadelta (An Adaptive Learning Rate Method)
● ADAM (Adaptive Moment Estimation)
● RMSProp

Regularization Methods (see the sketch after this list)
● L2 norm
● L1 norm
● Dataset augmentation
● Noise robustness
● Early stopping
● Dropout [12]
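A hedged tf.keras sketch (assumes TensorFlow 2.x, which postdates these slides) combining one of the listed optimizers with an L2 penalty, dropout, and early stopping:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            256, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2-norm penalty
        tf.keras.layers.Dropout(0.5),                            # Dropout [12]
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    # Any of the listed optimizers works here: SGD(), Adagrad(), Adadelta(),
    # Adam(), RMSprop().
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy")
    # Early stopping is a callback passed to model.fit(...).
    early = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)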
Components of Deep Learning
Number of iterations
● Too few iterations: may underfit
● More iterations: use a stopping criterion

Step size
● Very large step size: may miss the optimal point
● Very small step size: takes longer to converge

Parameter Initialization (see the sketch after this list)
● Initializing with zeros
● Random initialization
● Xavier initialization
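A minimal NumPy sketch of the three initialization schemes for a fan_in x fan_out weight matrix; the sizes are assumptions, and the Xavier variant shown is the uniform one:

    import numpy as np

    fan_in, fan_out = 784, 256
    W_zeros  = np.zeros((fan_in, fan_out))              # zeros: every unit computes the same thing
    W_random = 0.01 * np.random.randn(fan_in, fan_out)  # small random Gaussian
    limit = np.sqrt(6.0 / (fan_in + fan_out))           # Xavier/Glorot uniform limit
    W_xavier = np.random.uniform(-limit, limit, (fan_in, fan_out))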
Components of Deep Learning
Batch size
● Bigger batch size: might require fewer iterations
● Smaller batch size: will need more iterations

Number of layers
● More layers (more depth): more non-linearity, more complexity, more parameters
● Too many layers might cause overfitting.

Number of hidden units (see the illustration after this list)
● More hidden units: more model complexity; can approximate a more complex classifier
● Too many parameters: overfitting, increased training time
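A quick illustration (with made-up layer sizes) of how parameter count grows with depth and width in a fully connected network:

    # Weights + biases for each consecutive pair of layer sizes.
    def mlp_param_count(layer_sizes):
        return sum((m + 1) * n for m, n in zip(layer_sizes, layer_sizes[1:]))

    print(mlp_param_count([784, 128, 10]))       # 101,770 parameters
    print(mlp_param_count([784, 512, 512, 10]))  # 669,706 parameters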
Convolutional Neural Networks
• Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers [1].
Convolution:
• A linear operator
• Cross-correlation with a flipped kernel (see the demo after this list)
• Convolution in the spatial domain corresponds to multiplication in the frequency domain
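A quick NumPy/SciPy check of "convolution = cross-correlation with a flipped kernel"; SciPy is used only for the reference operators, and the data is random:

    import numpy as np
    from scipy.signal import convolve2d, correlate2d

    img = np.random.rand(8, 8)
    ker = np.random.rand(3, 3)
    conv = convolve2d(img, ker, mode="valid")
    corr = correlate2d(img, ker[::-1, ::-1], mode="valid")  # flip kernel, then correlate
    print(np.allclose(conv, corr))  # True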
• Feedforward networks that can extract topological features from images.
• Can provide invariance to geometric distortions such as translation, scaling, and rotation.
• Hierarchical and robust feature extraction was done before CNNs; CNNs are data-driven:
  • Parameters of the filters are learned from the data instead of being predefined.
  • At each iteration, the parameters are updated to minimize the loss.
Convolutional Neural Networks (CNNs)
Convolution Layer
• Local (sparse) connectivity
  • Reduces memory requirements
  • Fewer operations
• Parameter sharing
  • Same kernel used at every position of the input (see the parameter-count sketch below)
• How to choose the filter size?
  • Receptive field
• Equivariance property
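A back-of-the-envelope comparison (input size assumed) of why local connectivity and parameter sharing reduce memory:

    # One layer mapping a 32x32 grayscale image to a 32x32 feature map.
    h = w = 32
    dense_params = (h * w) * (h * w)  # fully connected: 1,048,576 weights
    conv_params  = 3 * 3              # one shared 3x3 kernel: 9 weights
    print(dense_params, conv_params)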
Pooling Layer (Subsampling)
• Convolution stage: several convolutions in parallel produce a set of linear activations
• Followed by a non-linear activation
• Then the pooling layer:
  • Invariance to small translations (see the sketch after this list)
  • Dealing with variable-size inputs
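A minimal max-pooling sketch (2x2, stride 2) showing that the pooled output of this toy input is unchanged by a one-pixel shift; the values are made up:

    import numpy as np

    def max_pool_2x2(x):
        h, w = x.shape
        return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.array([[0, 9, 0, 0],
                  [0, 0, 0, 9],
                  [1, 0, 2, 0],
                  [0, 0, 0, 0]], dtype=float)
    print(max_pool_2x2(x))                      # [[9. 9.] [1. 2.]]
    print(max_pool_2x2(np.roll(x, 1, axis=1)))  # same pooled maxima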
Fully-Connected Layer
• Maps the latent representation of the input to the output
• Output:
  • One-hot representation of the class label
  • Predicted response
• Appropriate activation function, e.g., softmax for classification (see the sketch below).
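A small NumPy sketch of a fully connected output layer with softmax; W, b, and the latent vector z are made-up placeholders:

    import numpy as np

    C, d = 10, 128                       # classes, latent dimension
    z = np.random.randn(d)               # latent representation from earlier layers
    W, b = np.random.randn(C, d), np.zeros(C)

    logits = W @ z + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # class probabilities summing to 1
    print(probs.argmax(), probs.sum())   # predicted class label, 1.0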
Feature Extraction with CNNs
Some Example CNN Architectures
LeNet-5 [2]
Some Example CNN Architectures
AlexNet (5 convolutional + 3 fully connected layers) [4]
VGG-16 [3]
Some Example CNN Architectures
GoogLeNet (22 layers)
Tricks to Improve CNN Performance
• Data augmentation (a NumPy sketch follows this list)
  • Flipping (commonly used in face recognition)
  • Translation
  • Rotation
  • Stretching
  • Normalizing, whitening (less redundancy)
  • Cropping and alignment (especially for faces)
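A minimal NumPy sketch of the listed augmentations on one HxWx3 image; the crop size, shift, and rotation angle are arbitrary illustrative choices:

    import numpy as np

    img = np.random.rand(64, 64, 3)
    flipped    = img[:, ::-1, :]                          # horizontal flip
    translated = np.roll(img, shift=4, axis=1)            # 4-pixel translation
    rotated    = np.rot90(img, k=1, axes=(0, 1))          # 90-degree rotation
    cropped    = img[8:56, 8:56, :]                       # central 48x48 crop
    normalized = (img - img.mean()) / (img.std() + 1e-8)  # per-image normalization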
Project
• You will implement the 11-layer CNN architecture proposed in [6] to extract features.
Project
• You can use a deep learning library to implement the network.
  • The library will take care of convolution, pooling, dropout, and backpropagation.
• You need to define the cost function and the activation functions.
  • The activation function of the output layer is softmax, since it is a classification problem.
• You can use TensorFlow (a minimal scaffold is sketched below).
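A hedged scaffold only (assumes TensorFlow 2.x / tf.keras, and the input size is a guess): it shows the required pieces, but it is NOT the 11-layer architecture of [6]; take the actual layer configuration from the paper:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(100, 100, 3)),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10575, activation="softmax"),  # one unit per CASIA subject
    ])
    # Cost function: cross-entropy pairs naturally with the softmax output layer.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])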
HPCC
• Data and the evaluation protocol are on HPCC:
  • /mnt/research/CSE_802_SPR_17
• To connect to HPCC: ssh [email protected], then your MSU email password
• To run small examples, use developer mode: ssh dev-intel14
• Try to log in to HPCC and check the course research space.
• Try to use a Python IDE (e.g., PyCharm). Debug your code and understand how TensorFlow works (if you are not familiar with a deep learning library).
CASIA Dataset (Cropped Images)
• The database contains 494,414 images.
• 10,575 subjects in total.
• We provide cropped and original images under /mnt/research/CSE_802_SPR_17.
Test Data and Evaluation Protocol
● Final evaluation on the Labeled Faces in the Wild (LFW) database [7]: 13,233 images, 5,749 subjects.
● Evaluation protocol: the BLUFR protocol [8]; found under /mnt/research/CSE_802_SPR_17.
References
1. http://www.deeplearningbook.org/
2. http://yann.lecun.com/exdb/lenet/
3. https://www.cs.toronto.edu/~frossard/post/vgg16/
4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
5. http://pubs.sciepub.com/ajme/2/7/9/
6. Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li, "Learning Face Representation from Scratch", arXiv:1411.7923v1 [cs.CV], 2014.
7. http://vis-www.cs.umass.edu/lfw/
8. http://www.cbsr.ia.ac.cn/users/scliao/projects/blufr/
9. http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
10. https://www.nist.gov/programs-projects/face-recognition-grand-challenge-frgc
11. Shengcai Liao, Zhen Lei, Dong Yi, and Stan Z. Li, "A Benchmark Study of Large-scale Unconstrained Face Recognition", IAPR/IEEE International Joint Conference on Biometrics, Sep. 29 - Oct. 2, Clearwater, Florida, USA, 2014.
12. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15 (2014) 1929-1958.