On Deep Learning An Overview João Pereira October 2019 PDEng Data Science Talks


Page 1: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep

On Deep Learning: An Overview

João Pereira

October 2019
PDEng Data Science Talks

Outline – Part 1

• Introduction to deep learning
• Types of learning
• Historical Perspective
• Neural networks
• Training Neural Networks
• Optimization Challenges
• Regularization strategies
• Generalization

João Pereira 2

15/10/2019

Outline – Part 2

• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
  • Vanilla RNNs
  • LSTMs and GRUs
• Sequence to Sequence Models (Seq2Seq)
• Attention Mechanisms
• Applications

João Pereira 3

22/10/2019


Part 1


Big Picture

[Figure: diagram with the labels Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning]

Learning from data


Inputs and Outputs

• Inputs
  • Words, sentences
  • Images, videos
  • Observations, time series
  • Voice
• Outputs
  • Translation
  • Class label

What is learning?

• Learning, or training, refers to estimating the parameters of a model.

[Figure: a dataset split into training, validation, and test sets]

Machine learning is about generalizing to unseen data.


What data is out there?

[Figure: unlabeled vs. labeled data, compared by availability and cost]

• Unlabeled data: abundant, accessible
• Labeled data: scarce, expensive

Data in the PDEng Data Science

[Figure: unlabeled vs. labeled data, compared by availability and cost]

• Unlabeled data: abundant, accessible
• Labeled data: scarce, expensive

Types of learning

• Supervised learning
• Unsupervised learning
• Reinforcement learning

LeCun’s Cake Analogy


Supervised Learning

• Given a dataset D of inputs x and labeled targets y, learn to predict y from x.

• Most successful paradigm in deep learning.

[Figure: a model maps input images to the labels Dog and Cat]

Unsupervised Learning

• Given only the inputs x, model the data and find patterns, structure, or clusters.

[Figure: a model applied to the inputs, with the labels Patterns, Structure, Clusters, and Generation]

Why Did Deep Learning Become So Popular?

Important breakthroughs in AI-class problems:

• Speech recognition
• Image classification
• Machine translation
• Chatbots
• Self-driving cars
• Playing games

Why Now?

Among other reasons:

• Fifty years of AI research in machine learning
• Explosion of convenient compute power (e.g., GPUs, TPUs)
• Lots of (labeled) data
• New software and tools
• Strong investment and interest from large organizations

Skepticism and AI Winters

Deep Learning Heroes

• Yann LeCun: New York University, Facebook AI Research
• Yoshua Bengio: Université de Montréal
• Geoffrey Hinton: University of Toronto, Google Brain

Turing Award

Why Deep Learning?

• Traditional Machine Learning: Input → Feature Extraction → Classification (Classifier) → Output (Duck / Not a Duck)
• Deep Learning: Input → Feature Extraction + Classification (Deep Neural Network) → Output (Duck / Not a Duck)

Why Deep Learning?

• Hand-crafting features is time-consuming and not scalable.
• We can learn the underlying features from data, automatically.

[Figure: features learned by a deep neural network. Zeiler & Fergus, 2013. Example by Yann LeCun.]

Deep learning is about representation learning!

• Learning good features for downstream tasks (regression, classification, …)
• Hand-crafting vs. learning features

Historical Perspective

• 1957: Perceptron (Rosenblatt)
• 1969: Minsky & Papert, AI Winter begins
• 1986: Backpropagation
• 1989: CNNs
• 1995: SVMs
• 1997: LSTMs
• 2006: AEs
• 2012: ImageNet Contest
• 2014: VAEs, GANs

Perceptron
Structural building block of deep learning

The Perceptron

[Figure: Inputs → Weights → Sum → Non-linearity → Output]

The output (prediction) is the non-linearity applied to the bias plus the weighted sum of the inputs.

Rosenblatt, 1958

The Perceptron

[Figure: Inputs → Weights → Sum → Non-linearity → Output]

In matrix form:
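One common way to write the perceptron in vector form is ŷ = g(w₀ + xᵀw). The sketch below assumes that form, with a sigmoid as the non-linearity g. The symbols, the function name `perceptron`, and the example numbers are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, w0, g=sigmoid):
    """Forward pass: bias plus weighted sum of the inputs, then the non-linearity."""
    z = w0 + sum(xi * wi for xi, wi in zip(x, w))
    return g(z)

# Hypothetical inputs and weights, chosen only for illustration.
y_hat = perceptron(x=[1.0, 2.0], w=[0.5, -0.25], w0=0.0)
print(y_hat)  # the weighted sum is 0.5 - 0.5 + 0.0 = 0, so the sigmoid returns 0.5
```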

Activation Functions

• Sigmoid
• Hyperbolic Tangent
• Rectified Linear Unit
• Softmax

Activation Functions

Recap… Activation function

Activation Functions

• The goal of the activation functions is to introduce non-linearities into the network.
• There are many of these functions.
• They are all non-linear.

Activation Functions

• Rectified Linear Unit (ReLU): tf.keras.activations.relu(x)
• Sigmoid: tf.keras.activations.sigmoid(x)
• Hyperbolic Tangent: tf.keras.activations.tanh(x)
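As a rough illustration of what these three functions compute, here is a plain-Python sketch for scalar inputs. It is not the tf.keras implementation, only the standard textbook definitions.

```python
import math

def relu(z):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: output lies in (-1, 1)."""
    return math.tanh(z)

# Evaluate each activation on a few sample points.
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), sigmoid(z), tanh(z))
```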


Activation Functions: Softmax

[Figure: Neural Network → Output → Softmax → Probabilities, compared with the Targets (Cat, Dog)]

tf.keras.activations.softmax(x)
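A minimal sketch of softmax in plain Python, assuming the standard definition (exponentiate, then normalize). Subtracting the maximum score first is a common numerical-stability trick. The two scores for "Cat" and "Dog" are made up for illustration.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network outputs for the classes (Cat, Dog).
probs = softmax([2.0, 1.0])
print(probs)  # the larger score receives the larger probability
```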


Building Neural Networks with Perceptrons


Single-Layer Neural Network

[Figure: Input Layer → Hidden Layer → Output Layer]

Properties

Neural networks are universal function approximators

Universal Approximation Theorem

A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of ℝⁿ, under mild assumptions on the activation function.

Cybenko, 1989


In a nutshell: they can approximate many functions to any desired precision.

Training Neural Networks

• Loss Functions
• Gradient Descent
• Stochastic Gradient Descent

Example: Wine Quality

[Figure: wines plotted by alcohol and volatile acidity, labeled Good or Not Good]

Example: Wine Quality

• How to quantify prediction error?

[Figure: the prediction compared with the target, Good (1) or Not Good (0)]

Loss

The loss measures the cost incurred by wrong predictions, comparing each prediction with its target.

Loss

This is the empirical loss: the loss between prediction and target, averaged over the dataset. Also known as:

• Objective function
• Cost function
• Empirical risk

Binary Cross Entropy Loss

• Used in classification problems with models that output a probability between 0 and 1.

tf.keras.losses.BinaryCrossentropy()
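For intuition, here is a plain-Python sketch of binary cross entropy using its standard definition, averaged over the examples. The clipping constant `eps` and the example predictions are assumptions made for illustration. This is not the tf.keras implementation.

```python
import math

def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)

targets = [1, 0, 1]                        # e.g. Good (1) / Not Good (0)
good_model = binary_cross_entropy(targets, [0.9, 0.1, 0.8])
bad_model = binary_cross_entropy(targets, [0.1, 0.9, 0.2])
print(good_model, bad_model)               # confident wrong predictions cost more
```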


Mean Squared Error Loss

• Used in regression problems with models that output continuous real numbers.

tf.keras.losses.MeanSquaredError()

[Figure: prediction vs. target for wine quality, in %]
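The mean squared error itself is simple enough to write by hand. The sketch below uses the standard definition, with made-up wine-quality percentages as targets and predictions.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)

# Hypothetical wine-quality scores in percent.
targets = [60.0, 80.0]
predictions = [70.0, 80.0]
mse = mean_squared_error(targets, predictions)
print(mse)  # (10**2 + 0**2) / 2 = 50.0
```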


Unsupervised Deep Learning


Autoencoders

• The most widely used model for representation learning.
• Trained with unsupervised learning.

[Figure: Encoder → Latent space → Decoder]

Autoencoders

Reducing the Dimensionality of Data with Neural Networks, Hinton & Salakhutdinov, 2006

An autoencoder with linear activations, trained with a squared-error loss, learns the same subspace as principal component analysis!

Unsupervised Representation Learning

Tip #1: Always “look” at your data before designing your model!

• Mean & covariance analysis
• PCA (check eigenvalue decay)
• t-SNE visualization

Based on Marc’Aurelio Ranzato’s (Facebook AI Research) tutorial on Unsupervised Deep Learning, NeurIPS’18

Learned features may be useful for downstream tasks (e.g., classification, regression, …).

Representation learning using unsupervised learning

Unsupervised Representation Learning

Tip #2: PCA and K-Means are very often a strong baseline.


Coffee Break
16:14 + 15 min

Optimizing the Loss

We want to find the neural network weights that achieve the lowest loss over the training examples.

Based on the first class of MIT 6.S191 by Alexander Amini

1. Random initialization
2. Compute gradient
3. Take a small step in the opposite direction of the gradient

“It's like walking in the mountains in a fog and following the direction of steepest descent to reach the village in the valley”

Yann LeCun

Gradient Descent

This procedure is called gradient descent.

1. Initialize weights randomly.
2. Repeat until convergence:
   2.1. Compute gradient.
   2.2. Update weights.
3. Return weights.
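The steps above can be sketched on a toy problem. The example assumes a one-parameter loss J(w) = (w - 3)², whose gradient is known in closed form, and replaces "until convergence" with a fixed number of steps. The loss, the learning rate, and the step count are illustrative choices, not from the slides.

```python
import random

def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(learning_rate=0.1, steps=100, seed=0):
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)            # 1. initialize the weight randomly
    for _ in range(steps):                # 2. repeat (fixed budget instead of a convergence test)
        g = gradient(w)                   # 2.1 compute gradient
        w = w - learning_rate * g         # 2.2 step against the gradient
    return w                              # 3. return the weight

w_star = gradient_descent()
print(w_star, loss(w_star))               # w_star ends up very close to 3
```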


Stochastic Gradient Descent

1. Initialize weights randomly.
2. Repeat until convergence:
   2.1. Choose 1 data point randomly.
   2.2. Compute gradient.
   2.3. Update weights.
3. Return weights.

tf.keras.optimizers.SGD()

Stochastic Gradient Descent

1. Initialize weights randomly.
2. Repeat until convergence:
   2.1. Choose a mini-batch of M samples randomly.
   2.2. Compute gradient.
   2.3. Update weights.
3. Return weights.

tf.keras.optimizers.SGD()
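A small sketch of mini-batch SGD, assuming a one-weight linear model ŷ = w·x fitted with squared error to synthetic data generated from y = 2x. The data, batch size, learning rate, and step count are all illustrative assumptions.

```python
import random

def minibatch_sgd(data, batch_size=4, learning_rate=0.05, steps=500, seed=0):
    """Fit y ≈ w * x with squared error, using a random mini-batch per step."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)                       # initialize the weight randomly
    for _ in range(steps):
        batch = rng.sample(data, batch_size)         # choose M samples at random
        # Gradient of the mean squared error over the mini-batch only.
        g = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        w -= learning_rate * g                       # update the weight
    return w

# Synthetic, noise-free data from y = 2x (an assumption for this sketch).
data = [(i / 10.0, 2.0 * i / 10.0) for i in range(1, 21)]
w_hat = minibatch_sgd(data)
print(w_hat)  # should land close to 2
```

Setting `batch_size=1` recovers the single-data-point variant, and using the whole dataset as one batch recovers plain gradient descent.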


Stochastic Gradient Descent

Many variants of SGD:

• Adam: tf.keras.optimizers.Adam()
• AdaGrad: tf.keras.optimizers.Adagrad()
• AdaDelta: tf.keras.optimizers.Adadelta()
• RMSProp: tf.keras.optimizers.RMSprop()

Mini-batch training is faster (parallel computation on GPUs).

Backpropagation

How to compute the gradient?

• Backpropagation employs the chain rule to derive the gradient.
• There are tools that compute these gradients automatically.
• Check Hinton’s course for details.
• Very good interactive explanation of backpropagation: https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/

In real-life deep learning...

Training neural networks is hard!

Learning Rate

How to choose the learning rate?

Learning Rate

• Small learning rates make learning slow and can leave the optimizer stuck in local minima.
• Large learning rates overshoot, are unstable, and can diverge.

Tips:
• Try many learning rates and choose the best.
• Use an adaptive learning rate that depends on:
  • How large the gradient is…
  • How fast learning occurs…
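The effect of the step size is easy to see on a toy loss. The sketch below assumes J(w) = w², with gradient 2w, and runs 20 gradient-descent steps from w = 1 with three hand-picked learning rates. This convex example shows slow progress and divergence, but not local minima.

```python
def run_gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize J(w) = w**2 starting from w = 1 and return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2.0 * w      # the gradient of w**2 is 2w
    return w

too_small = run_gradient_descent(0.001)   # barely moves away from 1
reasonable = run_gradient_descent(0.1)    # approaches the minimum at 0
too_large = run_gradient_descent(1.1)     # overshoots more on every step

print(too_small, reasonable, too_large)
```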


Optimization Challenges

• Overfitting and Underfitting
• Bias-Variance Trade-Off

Bias-Variance Trade-off

[Figure: error vs. model complexity, with curves for bias², variance, train error, and test error; low complexity corresponds to underfitting and high complexity to overfitting]

Regularization

• Dropout
• Early Stopping

Why do we need regularization?

• To make our models generalize better to unseen data.
• We are going to look at two strategies:
  • Dropout
  • Early stopping

Dropout


Randomly set activations to zero during training. Typically around 50% of them are dropped.

tf.keras.layers.Dropout(p)
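A plain-Python sketch of the idea, assuming the common "inverted dropout" formulation: each activation is dropped with probability `p`, and the survivors are scaled by 1/(1 - p) so the expected value is unchanged. At inference time the activations pass through untouched. The function name and its arguments are illustrative.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)          # no dropout at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                    # seeded for reproducibility
out = dropout([1.0] * 10, p=0.5, rng=rng)
print(out)                                # every entry is either 0.0 or 2.0
```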


Early Stopping

• Stop training early, before the model starts to overfit.

[Figure: train and test loss vs. training steps, moving from the underfitting region to the overfitting region]
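One common way to implement this is to monitor the loss on held-out data and stop once it has not improved for a given number of steps (the "patience"). The sketch below assumes that rule and uses a made-up list of validation losses. The function name is hypothetical. In Keras, the `tf.keras.callbacks.EarlyStopping` callback serves a similar purpose.

```python
def early_stopping_step(val_losses, patience=2):
    """Return the index of the step with the best loss seen before stopping."""
    best_step, best_loss = 0, float("inf")
    for step, current in enumerate(val_losses):
        if current < best_loss:
            best_step, best_loss = step, current   # new best: remember this step
        elif step - best_step >= patience:
            break                                  # no improvement for `patience` steps
    return best_step

# Hypothetical held-out losses: they fall, then rise as the model overfits.
val_losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
best = early_stopping_step(val_losses)
print(best)  # 2: the loss was lowest at step 2, and training stops at step 4
```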


Generalization in Deep Learning


Understanding Deep Learning Requires Rethinking Generalization, Zhang et al., 2017

Large models overfit very little ???


Reconciling modern machine-learning practice and the classical bias–variance trade-off, Belkin et al., 2019

The bias-variance trade-off was just the first episode of the series!

Bias-Variance Trade-off

[Figure: training and test error as a function of the number of parameters, with the overfitting region marked]

Part 2

• Convolutional Neural Networks
• Recurrent Neural Networks

Motivation

• In feed-forward neural networks, we assume data is independent and identically distributed in space or in time.
• In many applications, data has spatial or temporal dependencies:
  • Spatial: Convolutional Neural Networks
  • Temporal: Recurrent Neural Networks

Convolutional Neural Networks


Images

• An image is a matrix of numbers.

[Figure: a 200 × 200 grid of pixel values between 0 and 255; 200 × 200 = 40k pixels]

[Figure: three 200 × 200 grids of values between 0 and 255, one per color channel (R, G, B); 200 × 200 × 3 = 120k values]

Tasks in Computer Vision

• Regression: output is a continuous value.
• Classification: output is a class.

Applications: classification, segmentation, localization, detection.

Examples

Slide from Stanford CS231N, 2017, Lecture 11

Why do we need convolutional neural networks?


Fully connected layer on a 200 × 200 image (40k pixels): requires 1.6 billion parameters.

Locally connected layer, where each unit sees only a 10 × 10 patch of the 200 × 200 image (40k pixels): requires 4 million parameters.
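The two parameter counts can be reproduced with simple arithmetic. The sketch assumes the hidden layer has 40,000 units, which is what makes the totals match. That number is an assumption, since the transcript does not state it explicitly, and biases are ignored.

```python
pixels = 200 * 200            # 40,000 input pixels
hidden_units = 40_000         # assumed number of hidden units

# Fully connected: every hidden unit is connected to every pixel.
fully_connected = pixels * hidden_units

# Locally connected: every hidden unit sees only its own 10 x 10 patch.
patch = 10 * 10
locally_connected = hidden_units * patch

print(fully_connected)        # 1600000000 (1.6 billion)
print(locally_connected)      # 4000000 (4 million)
```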


Convolutional Layer Idea


Use multiple filters to extract different kinds of features.

Spatially share the same parameters across different locations.

Intuition: important features can happen anywhere in the image.



What is a convolution?

Input image:
2 4 3 5 6 8
3 8 5 7 5 1
1 2 1 5 6 7
2 4 6 2 3 4
3 5 7 2 5 6
6 8 9 7 9 8

Filter (also called kernel or feature detector):
1 0 1
0 0 1
1 0 1

The result is the feature map (also called activation map or convolved feature). Its first entry comes from the top-left 3 × 3 patch of the input image:
2×1 + 4×0 + 3×1 + 3×0 + 8×0 + 5×1 + 1×1 + 2×0 + 1×1 = 12
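The sliding-window computation can be written in a few lines of plain Python. Like most deep learning libraries, this sketch does not flip the filter, so strictly speaking it computes a cross-correlation. The function name `convolve2d` is illustrative.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [
    [2, 4, 3, 5, 6, 8],
    [3, 8, 5, 7, 5, 1],
    [1, 2, 1, 5, 6, 7],
    [2, 4, 6, 2, 3, 4],
    [3, 5, 7, 2, 5, 6],
    [6, 8, 9, 7, 9, 8],
]
kernel = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map[0][0])  # 12, the value worked out by hand for the top-left patch
```

A 6 × 6 image and a 3 × 3 filter give a 4 × 4 feature map.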


Convolutional Layer

• Weights of the filters are learned during training.
• We need to define:
  • Number of filters
  • Filter dimensions
  • Padding
  • Stride

Padding

• Used to avoid losing pixels around the border of the image.

Input, padded with zeros:
0 0 0 0 0
0 0 1 2 0
0 3 4 5 0
0 6 7 8 0
0 0 0 0 0

Kernel:
0 1
2 3

Output:
0 3 8 4
9 19 25 10
21 37 43 16
6 7 8 0
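The padding example can be checked with a short plain-Python sketch. The helper names `pad_zeros` and `correlate2d` are illustrative. The 3 × 3 input and 2 × 2 kernel are the ones from the slide.

```python
def pad_zeros(image, pad=1):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    rows = [[0] * pad + list(row) + [0] * pad for row in image]
    border = [[0] * width for _ in range(pad)]
    return border + rows + [[0] * width for _ in range(pad)]

def correlate2d(image, kernel):
    """Slide the kernel over the image with stride 1 and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
kernel = [[0, 1], [2, 3]]

without_padding = correlate2d(image, kernel)           # 2 x 2 output
with_padding = correlate2d(pad_zeros(image), kernel)   # 4 x 4 output
print(without_padding)
print(with_padding)
```

Without padding, the output shrinks to 2 × 2. With one ring of zeros, the border pixels contribute as well and the output is 4 × 4.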


Stride

• The stride is the number of rows and columns traversed per slide.

0 0 0 0 0
0 0 1 2 0
0 3 4 5 0
0 6 7 8 0
0 0 0 0 0

⨀

0 1
2 3

=

0 8
6 8

Vertical: stride 3
Horizontal: stride 2
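The strided example can be checked with a short sketch. The function name `conv2d_strided` and its parameter names are assumptions made for illustration. The output size along each axis is floor((input size - kernel size) / stride) + 1.

```python
import numpy as np

def conv2d_strided(x, kernel, stride_h, stride_w):
    """Cross-correlate x with kernel, moving stride_h rows and
    stride_w columns between consecutive window positions."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride_h + 1
    out_w = (x.shape[1] - kw) // stride_w + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride_h, j * stride_w
            out[i, j] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

# Zero-padded input and kernel from the slide.
x_padded = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 3, 4, 5, 0],
    [0, 6, 7, 8, 0],
    [0, 0, 0, 0, 0],
])
kernel = np.array([
    [0, 1],
    [2, 3],
])

out = conv2d_strided(x_padded, kernel, stride_h=3, stride_w=2)
print(out.tolist())  # [[0, 8], [6, 8]]
```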


In a nutshell...

Feed-forward neural networks applied to images:
• The number of weights scales quadratically with the input size.

Solution: Use a convolutional layer
• Connect each hidden unit to a small patch of the input.
• Share weights across space.

A network built with this structure is called a Convolutional Neural Network (CNN).

LeCun et al., 1998, Gradient-based learning applied to document recognition
Based on Marc'Aurelio Ranzato's slides, Facebook AI Research, Image Classification with Deep Learning
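A rough weight count makes the scaling argument concrete. This is an illustrative sketch under stated assumptions, not figures from the slides: grayscale images, a fully connected hidden layer with as many units as input pixels, a convolutional layer with 16 filters of size 3x3, and biases ignored.

```python
def dense_weights(n_inputs, n_hidden):
    """Weights in a fully connected layer: one per input-unit pair."""
    return n_inputs * n_hidden

def conv_weights(kernel_h, kernel_w, in_channels, n_filters):
    """Weights in a convolutional layer: independent of the image size
    because each filter is shared across all spatial positions."""
    return kernel_h * kernel_w * in_channels * n_filters

for side in (32, 64, 128):
    n_pixels = side * side
    print(
        f"{side}x{side} image:",
        f"dense = {dense_weights(n_pixels, n_pixels):,} weights,",
        f"conv = {conv_weights(3, 3, 1, 16):,} weights",
    )
```

Doubling the image side multiplies the number of pixels by 4 and the dense weight count by 16, while the convolutional weight count stays fixed.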


Non-Linearity

• Introduce non-linearities after convolutional layers.
• The rectified linear unit (ReLU) works very well.

tf.keras.activations.relu(x)

Replaces negative values with zero.
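For readers without TensorFlow at hand, the same operation can be sketched in NumPy. The helper `relu` below is written for illustration and is not the Keras implementation.

```python
import numpy as np

def relu(x):
    """Elementwise max(x, 0): negative entries become zero,
    non-negative entries pass through unchanged."""
    return np.maximum(x, 0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```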


Pooling Layer

• Pooling promotes robustness to the specific location of the features.
• Pooling operations: Max, Avg, L2-pooling, ...

Example: max pooling with 3x3 filters and stride 3

2 4 3 5 6 3
3 8 5 7 5 1
1 2 1 5 6 7
2 4 6 2 3 4
3 5 2 2 5 6
6 4 5 7 9 8

=

8 7
6 9

The dimension is reduced from 6x6 to 2x2.
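Max pooling over the 6x6 example input can be sketched as below. The function name `max_pool2d` is an assumption for illustration. With the input as printed, the lower-left 3x3 block contains two 6s, so its maximum is 6.

```python
import numpy as np

def max_pool2d(x, size, stride):
    """Take the maximum over each size x size window, moving
    `stride` rows/columns between consecutive windows."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.array([
    [2, 4, 3, 5, 6, 3],
    [3, 8, 5, 7, 5, 1],
    [1, 2, 1, 5, 6, 7],
    [2, 4, 6, 2, 3, 4],
    [3, 5, 2, 2, 5, 6],
    [6, 4, 5, 7, 9, 8],
])

pooled = max_pool2d(x, size=3, stride=3)
print(pooled.tolist())  # [[8, 7], [6, 9]]
```

Because only the maximum of each window is kept, shifting a strong activation by a pixel or two inside a window leaves the output unchanged.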


Representation Learning with CNNs

Conv. Layer 1: Low-Level Features
Conv. Layer 2: Medium-Level Features
Conv. Layer 3: High-Level Features


Progress in CNN Architectures

AlexNet, 2012: Krizhevsky, Sutskever, and Hinton
VGG, 2014: Simonyan and Zisserman
GoogLeNet, 2014: Szegedy et al.
ResNet, 2015: He et al.


Deep learning is in production


Recurrent Neural Networks


Why do we need recurrent neural networks?

In many applications, data has a sequential nature:
• Sensor time series
• Natural speech
• Words in sentences
• DNA
• Videos

What if data is not i.i.d. in time?

Example: My name is João


Recap

• In feed-forward neural networks:


Recurrent Neural Network

Figure: Feed-forward Neural Network vs. Recurrent Neural Network

Bengio et al., 1994


Recurrent Neural Network

RNNs capture the temporal dependencies of the data.

• Real-valued hidden state
• Feedback connection
• Parameters shared across timesteps

The activation function is typically a sigmoid or tanh.
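The three properties above can be sketched in a few lines of NumPy. The slide's own equation did not survive extraction, so this assumes the standard vanilla RNN recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b); the names `rnn_forward`, `W_x`, `W_h`, and `b` are illustrative, and the weights are random rather than trained.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return the final hidden state.

    The same W_x, W_h, and b are reused at every timestep
    (parameter sharing); h feeds back into the next step.
    """
    h = np.zeros(W_h.shape[0])            # real-valued hidden state
    for x_t in xs:                        # one step per sequence element
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

# Sequences of different lengths map to hidden states of the same size.
short_seq = rng.normal(size=(5, input_dim))
long_seq = rng.normal(size=(12, input_dim))
h_short = rnn_forward(short_seq, W_x, W_h, b)
h_long = rnn_forward(long_seq, W_x, W_h, b)
print(h_short.shape, h_long.shape)  # (4,) (4,)
```

The final hidden state has a fixed size regardless of the sequence length, which is what allows it to serve as a summary of the whole sequence.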


RNN Unrolled


Bengio et al., 1994


Notes

• The final hidden state is a vector representation of the entire sequence.
• It summarizes sequences into fixed-length vectors.
• Input sequences can have arbitrary length.
• The final hidden state can be used as a feature vector for classification or regression tasks.


Long Short-Term Memory Networks

Hochreiter & Schmidhuber, 1997

• Proposed to mitigate the vanishing gradient problem.
• Adds a memory cell and three gates (input, forget, and output).
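A single LSTM step can be sketched as follows. This assumes the commonly used formulation with input, forget, and output gates plus a candidate cell update; the slide does not give equations, so the stacked weight layout and the names `lstm_step`, `W`, `U`, and `b` are illustrative choices, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked parameters for the
    input gate, forget gate, output gate, and candidate cell, in that order.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell content
    c = f * c_prev + i * g       # additive cell update
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(6, D)):   # a sequence of 6 inputs
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive update of the cell state is what gives gradients a more direct path through time than the repeated squashing in a vanilla RNN.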


Sequence to Sequence Models (Seq2Seq)


Sequence to Sequence Models

• In some applications, mapping sequences to sequences is useful.

Example: machine translation

My name is João → Mijn naam is João

• A sequence to sequence model is an encoder-decoder model.


Sequence to Sequence Models

My name is João → Encoder → Learned representation → Decoder → Mijn naam is João


Attention


Why do we need attention?

Problem: Fixed-size representation z is the bottleneck of seq2seq.
Question: Is there a better way to pass the information from the encoder to the decoder?
Solution: Attention Mechanism

Bahdanau, Cho, and Bengio, Neural Machine Translation by Jointly Learning to Align and Translate
Luong, Pham, and Manning, Effective Approaches to Attention-based Neural Machine Translation


Attention Mechanism


My name is João

Context Vector
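One way to compute a context vector is sketched below. It assumes dot-product scoring between the current decoder state and each encoder hidden state, which is one of several scoring functions in the attention literature and is not necessarily the variant drawn on the slide; the function and variable names are illustrative, and the states are random placeholders.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Return the context vector and the attention weights.

    encoder_states: (T, H), one hidden state per input token.
    decoder_state:  (H,), the current decoder hidden state.
    """
    scores = encoder_states @ decoder_state   # (T,) similarity per token
    weights = softmax(scores)                 # (T,) non-negative, sums to 1
    context = weights @ encoder_states        # (H,) weighted average
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(4, 3))   # e.g. "My", "name", "is", "João"
decoder_state = rng.normal(size=3)

context, weights = attention_context(encoder_states, decoder_state)
print(weights.round(3), context.shape)
```

Unlike a single fixed-size representation, the context vector is recomputed at every decoding step, so the decoder can focus on different input tokens for each output token.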


Sequence to Sequence + Attention

My name is João → Encoder → Context Vector → Decoder → Mijn naam is João


Application Examples

Google's Neural Machine Translation System


Application Examples

Question Answering


What was missing?


• Probabilistic deep learning models
• Deep generative models: variational autoencoders and generative adversarial networks


Questions?


Thank you for your attention!
