
Page 1:

CSE 802, Spring 2017
Deep Learning
Inci M. Baytas
Michigan State University
February 13-15, 2017

Page 2:

Deep Learning in Computer Vision

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

Page 3:

Deep Learning in Computer Vision

Microsoft Deep Learning Semantic Image Segmentation

Page 4:

Deep Learning in Computer Vision

NeuralTalk and Walk: recognition and text description of the image while walking.

Page 5:

Deep Learning in Robotics

Self-driving cars

Page 6:

Deep Learning in Robotics

Deep sensorimotor learning

Page 7:

Other Applications of Deep Learning
● Natural Language Processing (NLP)
● Speech recognition and machine translation

Why Should We Be Impressed?
● Automated vision (e.g., object recognition) is challenging: different viewpoints, scales, occlusions, illumination, ...
● Robotics (e.g., autonomous driving) in real-life environments (constantly changing, new tasks without guidance, unexpected factors) is challenging.
● NLP (e.g., understanding human conversations) is an extremely complex task: noise, context, partial sentences, different accents, ...

Page 8:

Why Is Deep Learning So Popular Now?
• Better hardware
• Bigger data
• Regularization methods (dropout)
• Variety of optimization methods
  • SGD
  • Adagrad
  • Adadelta
  • ADAM
  • RMSProp

Page 9:

Criticism and Limitations of Deep Networks

• Large amount of data required for training
• High-performance computing a necessity
• Non-optimal method
• Task specific
• Lack of theoretical understanding

Page 10:

Common Deep Network Types
● Feedforward networks
● Convolutional neural networks
● Recurrent neural networks

Page 11:

Components of Deep Learning

Loss functions (NumPy sketch below)
● Squared loss: (y − f(x))²
● Logistic loss: log(1 + e^(−y·f(x)))
● Hinge loss: max(0, 1 − y·f(x))
● Squared hinge loss: max(0, 1 − y·f(x))²

Non-linear activation functions
● Linear
● Tanh
● Sigmoid
● Softmax
● ReLU
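These losses and activations map directly to a few lines of code. The following NumPy sketch is added for illustration and is not part of the original slides; it assumes labels y ∈ {−1, +1} and a model score f = f(x), and the function names are illustrative only.

    import numpy as np

    # Losses for a single example with label y in {-1, +1} and score f = f(x).
    def squared_loss(y, f):        return (y - f) ** 2
    def logistic_loss(y, f):       return np.log1p(np.exp(-y * f))
    def hinge_loss(y, f):          return max(0.0, 1.0 - y * f)
    def squared_hinge_loss(y, f):  return max(0.0, 1.0 - y * f) ** 2

    # Common activations, applied element-wise to a pre-activation vector z.
    def linear(z):  return z
    def tanh(z):    return np.tanh(z)
    def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
    def relu(z):    return np.maximum(0.0, z)
    def softmax(z):
        e = np.exp(z - np.max(z))   # shift by the max for numerical stability
        return e / e.sum()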

Page 12:

(figure-only slide)

Page 13:

Components of Deep Learning

Optimizers
● Gradient Descent
● Adagrad (Adaptive Gradient Algorithm)
● Adadelta (An Adaptive Learning Rate Method)
● ADAM (Adaptive Moment Estimation)
● RMSProp

Regularization Methods
● L2 norm
● L1 norm
● Dataset augmentation
● Noise robustness
● Early stopping
● Dropout [12]
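As a concrete example of one optimizer from the list above, here is a minimal NumPy sketch of the Adagrad update (illustrative only, not from the slides): each parameter gets its own effective step size, which shrinks as its squared gradients accumulate.

    import numpy as np

    def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
        # Accumulate squared gradients, then scale the step per parameter.
        accum = accum + grad ** 2
        theta = theta - lr * grad / (np.sqrt(accum) + eps)
        return theta, accum

    # Usage: start with accum = np.zeros_like(theta) and call once per gradient step.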

Page 14:

Components of Deep Learning

Number of iterations
● Too few iterations: may underfit
● More iterations: use a stopping criterion

Step size
● Very large step size: may overshoot the optimum
● Very small step size: takes longer to converge

Parameter Initialization
● Initializing with zeros
● Random initialization
● Xavier initialization (sketch below)
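Of these, Xavier initialization is the least self-explanatory. A minimal NumPy sketch of the uniform variant (added for illustration, not taken from the slides):

    import numpy as np

    def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
        # Glorot/Xavier uniform: the scale is chosen from the layer's fan-in and
        # fan-out so activation and gradient variances stay roughly constant.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    W = xavier_uniform(784, 256)   # e.g., a 784 -> 256 hidden layer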

Page 15:

Components of Deep Learning

Batch size
● Bigger batch size: might require fewer iterations (see the arithmetic sketch after this list)
● Smaller batch size: will need more iterations

Number of layers
● More layers (more depth): more non-linearity, more complexity, more parameters
● Too many layers might cause overfitting.

Number of hidden units
● More hidden units per layer: more model complexity, can approximate a more complex classifier
● Too many parameters: overfitting, increased training time
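To make the batch-size trade-off concrete, a tiny illustrative calculation (the dataset size is taken from the CASIA slide later in the deck; the batch sizes are arbitrary):

    import math

    n_examples = 494414                      # CASIA-WebFace training images
    for batch_size in (64, 256, 1024):
        iters_per_epoch = math.ceil(n_examples / batch_size)
        print(batch_size, iters_per_epoch)   # fewer updates per epoch as batches grow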

Page 16:

Convolutional Neural Networks

• Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers [1].

Convolution:
• A linear operator
• Cross-correlation with a flipped kernel (sketch below)
• Convolution in the spatial domain corresponds to multiplication in the frequency domain.
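A minimal NumPy sketch (not from the slides) showing that convolution is cross-correlation with the kernel flipped in both axes; this is why deep learning libraries can implement their "convolution" layers as plain cross-correlation.

    import numpy as np

    def cross_correlate2d(image, kernel):
        # Valid-mode 2D cross-correlation: slide the kernel without flipping it.
        kh, kw = kernel.shape
        out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25.0).reshape(5, 5)
    kernel = np.array([[1.0, 0.0, -1.0],
                       [2.0, 0.0, -2.0],
                       [1.0, 0.0, -1.0]])

    # True convolution = cross-correlation with the kernel flipped in both axes.
    convolved = cross_correlate2d(image, np.flip(kernel))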

Page 17:

Convolutional Neural Networks (CNNs)

• Feedforward networks that can extract topological features from images.
• Can provide invariance to geometric distortions such as translation, scaling, and rotation.
• Hierarchical and robust feature extraction existed before CNNs; CNNs are data-driven.
• Filter parameters are learned from the data instead of being predefined.
• At each iteration, the parameters are updated to minimize the loss.

Page 18:

Convolution Layer
• Local (sparse) connectivity
  • Reduces memory requirements
  • Fewer operations
• Parameter sharing
  • Same kernel used at every position of the input (parameter-count sketch below)
• How to choose the filter size?
  • Receptive field
• Equivariance property
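To see why sparse connectivity and parameter sharing matter, a back-of-the-envelope comparison (illustrative sizes, not from the slides):

    # Mapping a 100x100 single-channel image to a 100x100 feature map:
    dense_params = (100 * 100) * (100 * 100)   # fully connected: 100,000,000 weights
    conv_params = 5 * 5                        # one shared 5x5 kernel: 25 weights (plus a bias)
    print(dense_params, conv_params)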

Page 19:

Pooling Layer (Subsampling)

• Convolution stage: several convolutions in parallel produce a set of linear activations
• Followed by a non-linear activation
• Then the pooling layer:
  • Invariance to small translations (sketch below)
  • Dealing with variable-size inputs
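A tiny NumPy sketch of max pooling (illustrative, not from the slides): each feature is shifted by one position but stays inside its pooling window, so the pooled output is unchanged; this is the small-translation invariance mentioned above.

    import numpy as np

    def max_pool_1d(x, width=2):
        # Non-overlapping max pooling over windows of the given width.
        return x.reshape(-1, width).max(axis=1)

    a = np.array([0.0, 9.0, 0.0, 0.0, 5.0, 0.0])
    b = np.array([9.0, 0.0, 0.0, 0.0, 0.0, 5.0])   # same activations, each shifted by one position

    print(max_pool_1d(a))   # [9. 0. 5.]
    print(max_pool_1d(b))   # [9. 0. 5.] -> identical after pooling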

Page 20:

Fully-Connected Layer

• Maps the latent representation of the input to the output
• Output:
  • One-hot representation of the class label
  • Predicted response
• Appropriate activation function, e.g., softmax for classification (sketch below)
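A minimal NumPy sketch of a fully-connected layer with a softmax output (illustrative names and sizes, not from the slides):

    import numpy as np

    def fully_connected_softmax(x, W, b):
        # Affine map from the latent feature vector to class scores, then softmax.
        z = W @ x + b
        e = np.exp(z - z.max())          # subtract the max for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)
    x = rng.normal(size=128)             # features from the last conv/pooling block
    W, b = rng.normal(size=(10, 128)), np.zeros(10)
    probs = fully_connected_softmax(x, W, b)   # 10 class probabilities summing to 1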

Page 21:

Feature Extraction with CNNs

Page 22:

Some Example CNN Architectures

LeNet-5 [2]

Page 23:

Some Example CNN Architectures

AlexNet [4] (5 convolutional layers)

Page 24:

Some Example CNN Architectures

VGG-16 [3]

Page 25:

GoogLeNet (22 layers)

Page 26:

Tricks to Improve CNN Performance

• Data augmentation (sketch below)
  • Flipping (commonly used for faces)
  • Translation
  • Rotation
  • Stretching
• Normalizing, whitening (less redundancy)
• Cropping and alignment (especially for faces)
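A rough NumPy sketch of two of these augmentations, flipping and a small translation (illustrative only; np.roll wraps pixels around the edge, so a real pipeline would pad or crop instead):

    import numpy as np

    rng = np.random.default_rng(0)

    def augment(image):
        # Label-preserving augmentations for a (H, W) grayscale image array.
        out = image
        if rng.random() < 0.5:
            out = np.fliplr(out)              # horizontal flip
        shift = int(rng.integers(-2, 3))
        out = np.roll(out, shift, axis=1)     # small horizontal translation (wraps around)
        return out

    augmented = augment(rng.random((112, 112)))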

Page 27:

Project
• You will implement the 11-layer CNN architecture proposed in [6] to extract features.

Page 28:

Project
• You can use a deep learning library to implement the network.
• The library will take care of convolution, pooling, dropout, and backpropagation.
• You need to define the cost function and the activation functions.
• The activation function of the output layer is softmax, since it is a classification problem.
• You can use TensorFlow (a hedged sketch follows).
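As a starting point, here is a minimal TensorFlow/Keras sketch with the ingredients listed above: convolution, pooling, dropout, a softmax output layer, and a cross-entropy cost. The layer sizes, input shape, and optimizer are placeholders, not the 11-layer architecture of [6], which the project must follow.

    import tensorflow as tf

    num_classes = 10575   # placeholder: one class per CASIA-WebFace subject

    # Placeholder architecture -- NOT the 11-layer network of [6], only the ingredients.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(100, 100, 1)),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),   # softmax output layer
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",            # cross-entropy cost
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, batch_size=128, epochs=10)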

Page 29:

HPCC
• Data and evaluation protocol are on HPCC:
  /mnt/research/CSE_802_SPR_17
• To connect to HPCC: ssh [email protected] and your MSU email password.
• To run small examples, use developer mode: ssh dev-intel14
• Try to log in to HPCC and check the course research space.
• Try to use a Python IDE (e.g., PyCharm). Debug your code and understand how TensorFlow works (if you are not familiar with a deep learning library).

Page 30:

CASIA Dataset (Cropped Images)
• The database contains 494,414 images.
• 10,575 subjects in total.
• We provide cropped and original images under /mnt/research/CSE_802_SPR_17

Page 31:

Test Data and Evaluation Protocol

● Final evaluation on Labeled Faces in the Wild (LFW) database [7] with 13,233 images, 5,749 subjects.

● Evaluation protocol:
  ○ BLUFR protocol [8]; find it under /mnt/research/CSE_802_SPR_17

Page 32:

References
1. http://www.deeplearningbook.org/
2. http://yann.lecun.com/exdb/lenet/
3. https://www.cs.toronto.edu/~frossard/post/vgg16/
4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
5. http://pubs.sciepub.com/ajme/2/7/9/
6. Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li, "Learning Face Representation from Scratch", arXiv:1411.7923v1 [cs.CV], 2014.
7. http://vis-www.cs.umass.edu/lfw/
8. http://www.cbsr.ia.ac.cn/users/scliao/projects/blufr/
9. http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
10. https://www.nist.gov/programs-projects/face-recognition-grand-challenge-frgc
11. Shengcai Liao, Zhen Lei, Dong Yi, and Stan Z. Li, "A Benchmark Study of Large-scale Unconstrained Face Recognition", IAPR/IEEE International Joint Conference on Biometrics, Sep. 29 - Oct. 2, Clearwater, Florida, USA, 2014.
12. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15 (2014) 1929-1958.