An Introduction to Deep Learning
By Rahil Mahdian – December 12-13, 2017
Outline
• Machine Learning
• Learning Strategies
• Neural Network Learning
• Deep Learning
• Feed Forward Network
• Problems of Deep Learning
• AutoEncoders
• Restricted Boltzmann Machines
• Convolutional Neural Networks
• Recurrent Neural Networks (RNNs, LSTM)
• Deep Learning Applications
Scope of Machine Learning
Nando de Freitas, Oxford
When to Apply Machine Learning
Nando de Freitas, Oxford
Machine Learning Pioneering (C. Shannon, 1961)
Learning Types
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
Machine Learning vs Deep Learning
Perceptron – Single-Neuron Element (Rosenblatt, 1958)
e.g., a sigmoidal activation function
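Below is a minimal sketch of the single-neuron element with a sigmoidal activation, as named on the slide; the input, weight, and bias values are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Rosenblatt-style unit: weighted sum of inputs plus bias,
    # passed through the sigmoidal activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, 0.5])      # illustrative inputs
w = np.array([0.8, -0.4])     # illustrative weights
print(neuron(x, w, b=0.1))    # a value in (0, 1)
```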
Neural Networks (MLP)
Training Schemes (SGD, Batch, Mini-Batch)
(Figure: update behavior of SGD vs. Batch vs. Mini-Batch)
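As a sketch of the three schemes, the loop below fits an illustrative linear least-squares model; the `batch_size` argument alone switches between SGD (size 1), full batch (size of the dataset), and mini-batch (anything in between). The function names and data are assumptions for illustration.

```python
import numpy as np

def grad(w, X, y):
    # Gradient of the mean-squared error for a linear model y ≈ X @ w
    return 2.0 * X.T @ (X @ w - y) / len(y)

def train(X, y, batch_size, lr=0.1, epochs=100):
    # batch_size = 1          -> SGD
    # batch_size = len(y)     -> full batch
    # 1 < batch_size < len(y) -> mini-batch
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = np.random.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * grad(w, X[idx], y[idx])
    return w

X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
print(train(X, y, batch_size=10))  # approaches [1, -2, 0.5]
```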
MLP - Function Approximation
Deep Motivation – Different Layers of Abstraction
Deep Neural Networks
Feed Forward Neural Networks
Training DNNs - Backpropagation
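A minimal sketch of one backpropagation update for a two-layer MLP with a sigmoid hidden layer and squared-error loss; the shapes and learning rate are illustrative assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, lr=0.1):
    # Forward pass: sigmoid hidden layer, linear output layer
    h = sigmoid(W1 @ x)
    y_hat = W2 @ h
    # Backward pass: apply the chain rule layer by layer
    d_out = y_hat - y                      # gradient of 0.5*||y_hat - y||^2
    dW2 = np.outer(d_out, h)
    d_h = (W2.T @ d_out) * h * (1.0 - h)   # error pushed through the sigmoid
    dW1 = np.outer(d_h, x)
    return W1 - lr * dW1, W2 - lr * dW2

W1, W2 = 0.1 * np.random.randn(5, 3), 0.1 * np.random.randn(2, 5)
W1, W2 = backprop_step(np.random.randn(3), np.zeros(2), W1, W2)
```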
Deep NN - Training Problem
Backpropagation encounters three main difficulties when training deep neural networks:
• Vanishing gradient: the output error signal fails to reach the nodes farther back (see the sketch below)
• Overfitting
• Computational load
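A numerical sketch of the vanishing-gradient effect: the sigmoid's derivative is at most 0.25, so each extra layer multiplies the backpropagated error by a small factor. The depth and weight value below are illustrative.

```python
import numpy as np

# Multiply the error signal through a 10-layer chain of sigmoid units,
# assuming a fixed illustrative weight of 0.5 on every connection.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
grad = 1.0
for _ in range(10):
    s = sigmoid(np.random.randn())  # activation at this layer
    grad *= 0.5 * s * (1.0 - s)     # chain rule: weight * sigmoid slope
print(grad)  # on the order of 1e-10: the error signal has vanished
```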
Vanishing Gradient Solutions
Overfitting – Generalization Problem
Bishop
Model Complexity
Data Matters
Occam's razor (William of Ockham, c. 1287–1347): among competing hypotheses, the one with the fewest assumptions should be selected. In the related concept of overfitting, excessively complex models are affected by statistical noise (a problem also known as the bias-variance trade-off), whereas simpler models may capture the underlying structure better and may thus have better predictive performance.

Hoeffding's inequality makes this quantitative: to keep the empirical error rate within ε of the true error rate with failure rate at most δ over a hypothesis class of size M, a sufficient number of samples is

N ≥ (1 / (2ε²)) · ln(2M / δ)

On a 32-bit floating-point computer, a model with d parameters can encode at most M = 2^(32·d) distinct hypotheses, which ties the sample requirement to the number of model parameters.
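A worked example of the sample-complexity bound above; all numbers (d, ε, δ) are illustrative assumptions.

```python
import math

# d parameters at 32 bits each -> M = 2**(32*d) hypotheses.
d, eps, delta = 10, 0.05, 0.05
ln_M = 32 * d * math.log(2)                       # ln(2**(32*d)), in log space
N = (ln_M + math.log(2.0 / delta)) / (2 * eps**2)
print(math.ceil(N))                               # roughly 45,000 samples suffice
```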
Overfitting Solutions – Dropout & Regularization
Dropout: randomly drop units during training. Rule of thumb: drop 50% of the hidden-layer units and 25% of the input-layer units.
Regularization: add a norm of the weights to the cost function (l1-norm, l2-norm).
Data augmentation is also a way to avoid overfitting; e.g., adding noise, translating data, etc.
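A minimal sketch of the first two ideas: inverted dropout with the rule-of-thumb rates, and an l2 penalty added to the cost. The function names and regularization strength are illustrative assumptions.

```python
import numpy as np

def dropout(h, rate, training=True):
    # Inverted dropout: zero a fraction `rate` of the units and rescale
    # the survivors so the expected activation is unchanged at test time.
    if not training:
        return h
    mask = (np.random.rand(*h.shape) >= rate) / (1.0 - rate)
    return h * mask

def regularized_cost(data_loss, weights, lam=1e-4):
    # l2 regularization: add the squared norm of the weights to the cost
    return data_loss + lam * sum(np.sum(w ** 2) for w in weights)

x = np.random.rand(8)
h = dropout(x, rate=0.25)      # rule of thumb: 0.25 for the input layer
print(dropout(h, rate=0.5))    # and 0.5 for hidden layers
```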
AutoEncoder – Nonlinear Dimensionality Reduction
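A minimal sketch of the autoencoder idea with illustrative sizes: a 784-dimensional input is squeezed through a 32-unit nonlinear bottleneck and reconstructed. Training (not shown) would minimize the reconstruction error.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.01, size=(32, 784))   # encoder weights
W_dec = rng.normal(scale=0.01, size=(784, 32))   # decoder weights

def encode(x):
    return relu(W_enc @ x)     # the code: a nonlinear reduced representation

def decode(code):
    return W_dec @ code        # reconstruction of the input

x = rng.random(784)
x_hat = decode(encode(x))      # trained to minimize ||x - x_hat||^2
```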
Restricted Boltzmann Machines (RBM)
Hugo Larochelle
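A sketch of one contrastive-divergence (CD-1) update for a binary RBM, the standard way RBMs are trained; biases are omitted and the shapes are illustrative, so this is not a full trainer.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, lr=0.1):
    # One CD-1 update: sample hidden units, reconstruct visibles,
    # then nudge W toward the data and away from the reconstruction.
    p_h0 = sigmoid(v0 @ W)                        # P(h = 1 | v0)
    h0 = (np.random.rand(*p_h0.shape) < p_h0) * 1.0
    p_v1 = sigmoid(h0 @ W.T)                      # reconstructed visibles
    p_h1 = sigmoid(p_v1 @ W)                      # hidden given reconstruction
    # Positive phase minus negative phase
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    return W

W = np.zeros((6, 4))                              # 6 visible, 4 hidden units
v = (np.random.rand(6) < 0.5) * 1.0
cd1_step(v, W)
```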
Unsupervised Pre-training – another solution
Convolutional Neural Network (LeCun et al. 1993, LeNet)
Convolutional NN – Architecture
Photo: Phil Kim
ConvNet – How It Works
Vincent Vanhoucke, Google
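The core operation is a small filter slid across the input. The sketch below implements a valid 2-D convolution (strictly, the cross-correlation that deep-learning frameworks compute); the edge-detecting filter is illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take dot products:
    # the core operation of a convolutional layer.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)          # vertical-edge filter
print(conv2d(np.random.rand(5, 5), edge).shape)  # (3, 3) feature map
```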
Convolutional NN – (LeCun, Fukushima)
AlexNet - 2012
ConvNet for Speech
CNN Structures
LSTM and RNNs – Sequential Data
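A sketch of a single LSTM time step, assuming the common formulation in which all four gate pre-activations are computed in one matrix product; the shapes and initial values are illustrative.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    # The forget (f), input (i), and output (o) gates decide what to
    # erase, write, and expose; the additive cell update is what lets
    # errors flow across many time steps.
    z = W @ np.concatenate([x, h]) + b     # all four gates in one product
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c = f * c + i * np.tanh(g)             # new cell state
    h = o * np.tanh(c)                     # new hidden state
    return h, c

H, X = 4, 3                                # illustrative sizes
W = 0.1 * np.random.randn(4 * H, X + H)
h, c = lstm_step(np.random.randn(X), np.zeros(H), np.zeros(H), W, np.zeros(4 * H))
```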
LSTM Training
RNNs & Multi-Hypothesis Tracking – Beam Search
Vincent Vanhoucke, Google
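A minimal beam-search sketch: keep only the `width` highest-scoring partial sequences at each step instead of tracking all hypotheses. The `step_fn` callback, which returns (token, log-probability) pairs for extending a sequence, is a hypothetical interface.

```python
def beam_search(step_fn, start, width=3, length=10):
    # Each beam entry pairs a cumulative log-probability with its tokens.
    beams = [(0.0, [start])]
    for _ in range(length):
        candidates = []
        for score, seq in beams:
            for token, logp in step_fn(seq):   # hypothetical model callback
                candidates.append((score + logp, seq + [token]))
        # Prune to the `width` best partial sequences
        beams = sorted(candidates, key=lambda t: t[0], reverse=True)[:width]
    return beams[0][1]                         # best full hypothesis
```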
Captioning & Translation - RNNs
Vincent Vanhoucke, Google
Deep Learning Applications: Computer Vision
DNN Application - Caption Generation
Wrap Up
• Machine Learning Influence
• Learning Neural Networks
• Deep Learning: Motivation, Problems, Solutions
• Unsupervised Neural Networks: AutoEncoders, Restricted Boltzmann Machines
• Deep Learning Training Solutions
• Feed Forward Neural Networks as MLPs
• Convolutional Neural Networks
• Recurrent Neural Networks for sequential data
• LSTMs as a gated extension of RNNs
• Applications of DNNs
Thanks for attending the talk.
Questions?