Deep Learning: Theory, History, State of the Art & Practical Tools
TRANSCRIPT
by Ilya Kuzovkin [email protected]
Machine Learning Estonia 2016
http://neuro.cs.ut.ee
Where it has started
How it evolved
What is the state now
How can you use it
How it learns
Where it has started
Where it has started: Artificial Neuron
McCulloch and Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity”, 1943
Where it has started: Perceptron
Frank Rosenblatt, 1957
“[The Perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”
THE NEW YORK TIMES
Where it has started: Sigmoid & Backpropagation
Measure how small changes in weights affect the output
Can apply NNs to regression
Multilayer neural networks, etc.
(Werbos) Rumelhart, Hinton, Williams, (1974) 1986
“Learning representations by back-propagating errors” (Nature)
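The sigmoid is what makes "small changes in weights cause small changes in output" measurable: it is smooth, and its derivative has a convenient closed form. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid, the quantity backpropagation needs."""
    s = sigmoid(x)
    return s * (1.0 - s)
```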
Where it has started: Why the DL revolution did not happen in 1986
• Not enough data (datasets were 1000 times too small)
• Computers were too slow (1,000,000 times)
• Not enough attention to network initialization
• Wrong non-linearity
FROM A TALK BY GEOFFREY HINTON
How it learns
How it learns: Backpropagation
http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
1. The Forward Pass — Calculating the total error
[Slide figure: a 2-2-2 network where each unit computes NET, the weighted sum of its inputs plus a bias, and OUT, the sigmoid of NET]
Compute NET and OUT for h1 (OUT = 0.593), then repeat for h2 = 0.596, o1 = 0.751, o2 = 0.773.
We have o1 and o2, so we can compute the total error against the targets 0.01 and 0.99.
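A minimal sketch of this forward pass in Python. The inputs and targets are from the slide; the weights and biases appear only in the slide figure, so they are taken from Mazur's post, which the slide follows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Inputs and targets from the example
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99

# Weights and biases as given in Mazur's post
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60

# Forward pass: NET is the weighted sum, OUT is sigmoid(NET)
h1 = sigmoid(w1 * i1 + w2 * i2 + b1)      # ~0.593
h2 = sigmoid(w3 * i1 + w4 * i2 + b1)      # ~0.596
o1 = sigmoid(w5 * h1 + w6 * h2 + b2)      # ~0.751
o2 = sigmoid(w7 * h1 + w8 * h2 + b2)      # ~0.773

# Total error: squared errors, halved so the derivative is cleaner
E_total = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2  # ~0.298371109
print(h1, h2, o1, o2, E_total)
```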
2. The Backwards Pass — Updating weights
For each weight, use the chain rule to compute how the total error changes as that weight changes, then apply the gradient descent update rule:
w_new = w_old - η * ∂E_total/∂w, where η is the learning rate.
• Repeat for w6, w7, w8
• In an analogous way for w1, w2, w3, w4
• Calculate the total error again: it was 0.298371109, now it is 0.291027924
• Repeat 10,000 times and the error drops to 0.000035085
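Continuing the sketch above, the update for one output-layer weight, w5, using the chain rule and the learning rate η = 0.5 from Mazur's post:

```python
eta = 0.5  # learning rate used in Mazur's post

# Chain rule for dE/dw5: how E changes with o1, how o1 changes with
# its net input, and how that net input changes with w5
dE_do1   = o1 - t1              # derivative of 0.5 * (t1 - o1)**2
do1_dnet = o1 * (1.0 - o1)      # sigmoid derivative at the output
dnet_dw5 = h1                   # since net_o1 = w5*h1 + w6*h2 + b2
dE_dw5 = dE_do1 * do1_dnet * dnet_dw5   # ~0.0822

# Gradient descent update rule
w5_new = w5 - eta * dE_dw5      # ~0.3589
print(w5_new)
```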
How it learns: Optimization methods
[Slide figures: animations comparing gradient descent variants on loss surfaces]
Alec Radford, “Introduction to Deep Learning with Python”
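The animations contrast update rules, not architectures. As a hedged illustration of the kind of rules being compared, here are plain SGD and SGD with momentum; the function names and the momentum coefficient are illustrative, not from the slide:

```python
def sgd_step(w, grad, lr=0.01):
    """Plain gradient descent: step directly against the gradient."""
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """Momentum: keep a velocity that accumulates past gradients,
    damping oscillations and speeding progress along ravines."""
    v = mu * v - lr * grad
    return w + v, v
```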
How it evolved
How it evolved: 1-layer NN
[Slide figure: INPUT units connected directly to OUTPUT units]
92.5% on the MNIST test set
Alec Radford, “Introduction to Deep Learning with Python”
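A 1-layer network is just softmax regression: inputs wired straight to outputs. A minimal sketch in Keras (modern layer names; the optimizer choice is illustrative), with sizes matching MNIST's 784 pixels and 10 classes:

```python
from keras.models import Sequential
from keras.layers import Dense

# 784 pixels in, 10 class probabilities out: no hidden layer at all
model = Sequential()
model.add(Dense(10, activation='softmax', input_shape=(784,)))

model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10)  # x_train: (n, 784), y_train: one-hot
```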
How it evolved: One hidden layer
98.2% on the MNIST test set
[Slide figure: activity of 100 of the 625 hidden neurons]
Alec Radford, “Introduction to Deep Learning with Python”
How it evolved: Overfitting
How it evolved: Dropout
During training, randomly “drop” units from the network so they cannot co-adapt; use the full network at test time.
Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
How it evolved: ReLU
The rectified linear unit, ReLU(x) = max(0, x), replaces the sigmoid as the activation function.
X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011
How it evolved: “Modern” ANN
• Several hidden layers
• ReLU activation units
• Dropout
99.0% on the MNIST test set
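Putting the three ingredients together, a hedged Keras sketch of such a “modern” MLP. The 625-unit width echoes the earlier hidden-layer slide; the dropout rate, second hidden layer, and optimizer are illustrative, not the network behind the 99.0% figure:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(625, activation='relu', input_shape=(784,)))  # hidden layer 1
model.add(Dropout(0.5))               # drop half the units during training
model.add(Dense(625, activation='relu'))                      # hidden layer 2
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))                    # 10 digit classes

model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
```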
How it evolved: Convolution
Prewitt edge detector kernel:
+1 +1 +1
 0  0  0
-1 -1 -1
Apply it to an image patch where a region of 40s sits above a region of 10s:
40 40 40
40 40 40
40 40 40
10 10 10
10 10 10
10 10 10
Slide the kernel down the patch, at each position multiplying elementwise and summing. The four positions give 0, 90, 90, 0: the response is zero where the patch is uniform and largest exactly at the edge between the 40s and the 10s.
The edge detector is a handcrafted feature detector.
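The worked example above in a few lines of NumPy, written as a manual sliding-window cross-correlation (which is what deep learning libraries actually compute under the name “convolution”):

```python
import numpy as np

# Prewitt horizontal-edge kernel from the slide
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

# 6x3 patch: a region of 40s sitting on top of a region of 10s
patch = np.array([[40, 40, 40],
                  [40, 40, 40],
                  [40, 40, 40],
                  [10, 10, 10],
                  [10, 10, 10],
                  [10, 10, 10]])

# Slide the kernel down the patch: elementwise product, then sum
out = [np.sum(patch[r:r + 3] * kernel) for r in range(4)]
print(out)  # [0, 90, 90, 0] -- strongest response at the edge
```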
How it evolved: Convolution
The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones.
99.50% on the MNIST test set (current best: 99.77%, by a committee of 35 convolutional nets)
http://yann.lecun.com/exdb/mnist/
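A hedged sketch of a small convolutional network for MNIST in Keras; the filter counts and kernel sizes are illustrative, not the architecture behind the 99.5% figure:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# Each Conv2D layer learns its own bank of feature detectors
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```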
How it evolved: More layers
C. Szegedy et al., “Going Deeper with Convolutions”, 2014
ILSVRC 2015 winner — 152 (!) layers
K. He et al., “Deep Residual Learning for Image Recognition”, 2015
How it evolved: Hyperparameters
• Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization
• Convolutional layers: size, stride, number of filters
• Optimization method: learning rate, other method-specific constants, …
How to explore this space (see the random search sketch below):
• Grid search :(
• Random search :/
• Bayesian optimization :)
• Informal parameter search :)
Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”
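A hedged sketch of random search over such a space. The parameter ranges are illustrative, and train_and_evaluate is a hypothetical stand-in for your real training loop:

```python
import random

def sample_hyperparameters():
    """Draw one random configuration from illustrative ranges."""
    return {
        'n_layers':      random.choice([1, 2, 3]),
        'n_units':       random.choice([128, 256, 512, 1024]),
        'activation':    random.choice(['relu', 'sigmoid', 'tanh']),
        'learning_rate': 10 ** random.uniform(-4, -1),  # log-uniform
        'dropout':       random.uniform(0.0, 0.5),
    }

def train_and_evaluate(params):
    """Hypothetical stand-in: train a model with these settings and
    return validation accuracy. Replace with a real training loop."""
    return random.random()

best_score, best_params = -float('inf'), None
for trial in range(50):
    params = sample_hyperparameters()
    score = train_and_evaluate(params)
    if score > best_score:
        best_score, best_params = score, params
print(best_score, best_params)
```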
How it evolved: Major Types of ANNs
feedforward, convolutional, recurrent, autoencoder
What is the state now
What is the state now: Computer vision
http://cs.stanford.edu/people/karpathy/deepimagesent/
Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015
What is the state now: Natural Language Processing
Speech recognition + translation; Facebook bAbI dataset: question answering
http://smerity.com/articles/2015/keras_qa.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
What is the state now: AI
DeepMind’s DQN
Sukhbaatar et al., “MazeBase: A Sandbox for Learning from Games”, 2015
What is the state now: Neuroscience
Güçlü and van Gerven, “Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream”, 2015
How can you use it
How can you use it: Pre-trained models
• Go to https://github.com/BVLC/caffe/wiki/Model-Zoo, pick a model, and use it in your application (a loading sketch follows below)
• Or use part of it as the starting point for your own model
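A hedged sketch of using a Model Zoo network through Caffe's Python interface. The file names and the input array are placeholders: each Model Zoo entry ships its own deploy prototxt, weights, and expected input shape:

```python
import caffe
import numpy as np

# Placeholder file names: substitute the ones from your chosen model
net = caffe.Net('deploy.prototxt',      # network architecture
                'weights.caffemodel',   # pre-trained parameters
                caffe.TEST)             # inference mode

# Placeholder input: shape must match the model's data blob
preprocessed_image = np.zeros((1, 3, 227, 227), dtype=np.float32)

# Feed the image and read out the class probabilities
net.blobs['data'].data[...] = preprocessed_image
out = net.forward()
probabilities = out['prob']
```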
How can you use it: Zoo of Frameworks
[Slide figure: framework logos grouped into “Low-level” and “High-level & Wrappers”]
• A Step by Step Backpropagation Example http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
• Online book by Michael Nielsen http://neuralnetworksanddeeplearning.com
• CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/