Deep Learning in Theano

Page 1: Deep Learning in theano

Deep Learning in Theano
Massimo Quadrana

PhD Student @ Politecnico di Milano
Research Intern @ Telefonica I+D

[email protected] @mxqdr
Original slides are available here: https://goo.gl/VLYsnR

Page 2: Deep Learning in theano

Before starting

OS: Linux / Mac OS (sorry Windows guys :) )

Required software:

Python 2.7.x, git, OpenBLAS

Optional software (for faster math and better packages/virtualenv support):

Anaconda (https://www.continuum.io/downloads)

Anaconda Intel MKL (free student licence) (https://www.continuum.io/anaconda-academic-subscriptions-available)

Page 3: Deep Learning in theano

Before starting

Open your terminal and create a new virtualenv

> virtualenv -p /usr/bin/python2.7 theano-env

Activate the virtualenv

> source theano-env/bin/activate

Install the Theano package with its dependencies

> pip install Theano

(To exit the virtualenv)

> deactivate

Page 4: Deep Learning in theano

Before starting

To check that your Theano environment is correctly configured, run the following:

python -c 'import theano'

It should complete without errors

Page 5: Deep Learning in theano

Before starting

Get the lab code here

> git clone https://github.com/mquad/DNN_Lab_UPF

Structure of the repo:

● exercise/: directory with the code for the lab (it won’t run)
● complete/: directory with the code completed with the missing parts (it should run :-) )
● notebooks/: some Jupyter notebooks to show you some cool stuff

If you spot any error or have a feature request, open a new issue. I’ll do my best to keep the repo up to date :-)

Page 6: Deep Learning in theano

Outline

Image classification
● Logistic Regression
● “Modern” Multi-layer NN
● Convolutional Neural Networks

Sequence Modeling

● Character Based RNN

Page 7: Deep Learning in theano

Theano intro

Open your editor and write the following. Save it as example_mul.py, then run it with python example_mul.py
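The slide’s code is not included in this transcript. A minimal sketch of what example_mul.py could look like (an assumed example, not necessarily the original): define symbolic inputs, build an expression, compile it into a function, and call it.

# example_mul.py -- a minimal guess at the slide's example
import theano
import theano.tensor as T

a = T.scalar('a')
b = T.scalar('b')

y = a * b                                        # symbolic expression
mul = theano.function(inputs=[a, b], outputs=y)  # compile it into a callable

print(mul(3, 4))                                 # prints 12.0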

Page 8: Deep Learning in theano

Theano intro

The official documentation:
http://deeplearning.net/software/theano/index.html

Page 9: Deep Learning in theano

MNIST Dataset

60,000 grayscale images (28 x 28 pixels each)

10 classes

[Diagram: input image (e.g. a handwritten "8") → Model (computation) → output class]

Page 10: Deep Learning in theano

Logistic Regression on MNIST

[Diagram: input image → T.dot(X, W) + b → softmax → probability for each of the ten classes Zero through Nine]

Page 11: Deep Learning in theano

Logistic Regression on MNIST

Open exercise/logreg_raw.py

Many parts have already been coded for you (library import, data import and split, evaluation)

Write the code for the Logistic Regression classifier

Page 12: Deep Learning in theano

LogReg: input vars and model parameters

Shared variables in Theano maintain their state across functions.

Use them to store your model’s parameters.

If executed on the GPU, shared variables are stored in GPU memory for faster access.
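As a concrete illustration (a sketch with assumed variable names, not necessarily the repo’s exact code), the inputs and parameters of the logistic regression could be declared like this:

import numpy as np
import theano
import theano.tensor as T

X = T.fmatrix('X')   # minibatch of flattened 28x28 images, shape (batch, 784)
y = T.ivector('y')   # integer class labels, shape (batch,)

# Shared variables keep their value across function calls (and live in GPU
# memory when a GPU is used), so they are the natural place for parameters.
W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')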

Page 13: Deep Learning in theano

LogReg: model and cost function

Softmax: generalization of the sigmoid to multiple classes

Predicted class: the class with maximum predicted probability
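Continuing the sketch from the previous page (assumed names), the model and the predicted class can be written as:

y_hat = T.nnet.softmax(T.dot(X, W) + b)   # (batch, 10) class probabilities
y_pred = T.argmax(y_hat, axis=1)          # class with maximum probability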

Page 14: Deep Learning in theano

LogReg: model and cost function

Cross-entropy loss

y is the one-hot encoding of the correct class of input x (y_i = 1 iff the class of x is i)

Here we keep y as an integer label and index into y_hat instead, to save computation.

Note: average the loss over the minibatch (the cost must be a scalar).
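Continuing the sketch: since y holds integer class indices rather than one-hot vectors, the cross-entropy can pick the log-probability of the correct class by indexing y_hat, then average over the minibatch so the cost is a scalar.

cost = -T.mean(T.log(y_hat)[T.arange(y.shape[0]), y])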

Page 15: Deep Learning in theano

LogReg: SGD

T.grad() performs automatic differentiation of the loss function

updates tells Theano how to update the model’s (shared) parameters (it can be a list of tuples, a dict, or an OrderedDict)
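A sketch of the corresponding updates (continuing the names above; plain SGD):

lr = 0.01
params = [W, b]
grads = T.grad(cost, wrt=params)                              # automatic differentiation
updates = [(p, p - lr * g) for p, g in zip(params, grads)]    # one pair per shared parameter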

Page 16: Deep Learning in theano

LogReg: Training, Loss and Predictions
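The slide shows the compiled training and prediction functions; continuing the sketch above, they could look like this:

train = theano.function(inputs=[X, y], outputs=cost,
                        updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred,
                          allow_input_downcast=True)

# Typical use: call train(x_batch, y_batch) for every minibatch and epoch,
# then compare predict(x_test) with the true labels to measure accuracy.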

Page 17: Deep Learning in theano

LogReg: Softmax

The exp function can easily overflow: subtract the maximum value of x before exponentiating to get more stable results (without any effect on correctness).
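A sketch of such a numerically stable softmax (not necessarily the slide’s exact code):

import theano.tensor as T

def softmax(x):
    # subtracting the row-wise maximum leaves the result unchanged,
    # but prevents exp from overflowing
    e_x = T.exp(x - x.max(axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)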

Page 18: Deep Learning in theano

LogReg: File logreg.py contains a cleaner version of the Logistic Regression classifier.

init(): defines model parameters

model(): defines our model

fit() and predict(): fit the model on training data and predict the class of new data

Page 19: Deep Learning in theano

Logistic Regression on MNIST

[Diagram: the same logistic regression pipeline as before: input image → T.dot(X, w) → softmax → probability for each of the ten classes Zero through Nine]

Test accuracy: ~92%

Page 20: Deep Learning in theano

“Modern” multi-layer network

h0 = relu(T.dot(X, Wh0) + b0)
h1 = relu(T.dot(h0, Wh1) + b1)
y = softmax(T.dot(h1, Wy) + by)

[Diagram: noise (or augmentation) is applied to the input, and noise to each hidden layer; the output is a probability for each of the ten classes Zero through Nine]

Page 21: Deep Learning in theano

“Modern” multi-layer network

Open and complete mlp.py. The missing parts are:
● init(): initialize the MLP parameters
● model(): define the model using dropouts
● dropout(): apply dropout to the input
● apply_momentum(): apply momentum over the given updates

Page 22: Deep Learning in theano

MLP: init()
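The slide’s init() code is not in this transcript. A common initialization scheme (a sketch with assumed layer sizes, not necessarily the repo’s) draws the weights from a small Gaussian:

import numpy as np
import theano
import theano.tensor as T

def init_weights(shape, scale=0.01):
    # small random values, stored as a shared variable in floatX precision
    values = scale * np.random.randn(*shape)
    return theano.shared(values.astype(theano.config.floatX))

# two hidden layers of 625 units (assumed sizes) plus the output layer
Wh0, b0 = init_weights((784, 625)), init_weights((625,))
Wh1, b1 = init_weights((625, 625)), init_weights((625,))
Wy, by = init_weights((625, 10)), init_weights((10,))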

Page 23: Deep Learning in theano

MLP: model()
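The completed model() is shown on the slide; a sketch that follows the formulas of the diagram two pages back, applying dropout to the input and to each hidden layer (the dropout() helper is sketched on the next page; the dropout rates are assumed values):

def model(X, Wh0, b0, Wh1, b1, Wy, by, p_drop_input=0.2, p_drop_hidden=0.5):
    X = dropout(X, p_drop_input)
    h0 = T.maximum(T.dot(X, Wh0) + b0, 0.)      # relu
    h0 = dropout(h0, p_drop_hidden)
    h1 = T.maximum(T.dot(h0, Wh1) + b1, 0.)     # relu
    h1 = dropout(h1, p_drop_hidden)
    return T.nnet.softmax(T.dot(h1, Wy) + by)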

Page 24: Deep Learning in theano

MLP: dropout()
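A sketch of inverted dropout in Theano (an assumed implementation, not necessarily identical to the repo’s):

import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams(seed=42)

def dropout(X, p_drop=0.):
    if p_drop > 0:
        keep = 1. - p_drop
        # binary mask, rescaled so the expected activation stays unchanged
        mask = srng.binomial(n=1, p=keep, size=X.shape,
                             dtype=theano.config.floatX)
        X = X * mask / keep
    return X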

Page 25: Deep Learning in theano

MLP: apply_momentum()
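The slide’s apply_momentum() is not in the transcript. A common formulation (a sketch of classical momentum, reusing the numpy/theano imports from the init() sketch above) adds one "velocity" shared variable per parameter and works on top of the plain SGD updates:

def apply_momentum(updates, momentum=0.9):
    momentum_updates = []
    for p, new_p in updates:
        v = theano.shared(np.zeros_like(p.get_value()))   # velocity, same shape as p
        step = new_p - p                                   # the plain SGD step (-lr * grad)
        new_v = momentum * v + step
        momentum_updates.append((v, new_v))
        momentum_updates.append((p, p + new_v))
    return momentum_updates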

Page 26: Deep Learning in theano

“Modern” multi-layer network

[Same diagram and formulas as on Page 20]

Test accuracy: ~98%

Page 27: Deep Learning in theano

Convolutional Neural Networks

[Figure: convolutional neural network, from deeplearning.net]

Page 28: Deep Learning in theano

CNNs in Theano

Open convnet.py and complete the following parts:
● get_conv_output_shape(): compute the output shape of the convolutional layer
● init(): complete the initialization of the convolutional filters
● model(): define the entire CNN model
● adagrad(): define the update rules for Adagrad
● rmsprop(): define the update rules for RMSProp (easy if you do adagrad first)

Page 29: Deep Learning in theano

Dealing with Convolutions

Inputs have 3 dimensions: width and height (spatial dimensions W), and depth

Convolutions are
● local in width and height (receptive field F)
● full in depth

Page 30: Deep Learning in theano

Dealing with Convolutions

Convolution hyper-parameters:
● depth: number of neurons connected to the same input region
● stride: spacing between depth columns in the spatial dimensions
● padding: how to treat borders (not covered in the examples)

The spatial size of the output volume is given by the formula

(W - F + 2P) / S + 1
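A quick sanity check of the formula for this lab’s network (a small sketch; the repo’s helper may differ): with 28x28 inputs, 5x5 filters, no padding and stride 1, the first convolution gives 24x24 feature maps; after a 2x2 max pool (12x12), the second convolution gives 8x8.

def conv_output_size(W, F, P=0, S=1):
    return (W - F + 2 * P) // S + 1

print(conv_output_size(28, 5))   # 24 (first conv)
print(conv_output_size(12, 5))   # 8 (second conv, after 2x2 max pooling)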

Page 31: Deep Learning in theano

Our CNN (variation of LeNet5)

INPUT, CONV(5,5)*, MAX POOL, CONV(5,5)*, MAX POOL, FC*

*The actual number of filters and Fully Connected layers is programmable

Page 32: Deep Learning in theano

CNNs: get_conv_output_volume()

We don’t consider padding for simplicity

Page 33: Deep Learning in theano

CNNs: init()

First CONV(5,5), MAX POOL

Page 34: Deep Learning in theano

CNNs: init()

Analogously for the second CONV(5,5), MAX POOL

Page 35: Deep Learning in theano

CNNs: model()
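The completed model() is shown on the slide. A sketch of one conv + max-pool stage in Theano (assumed names and shapes, not necessarily the repo’s code):

import theano.tensor as T
from theano.tensor.nnet import conv2d
from theano.tensor.signal.pool import pool_2d

def conv_pool_layer(X, W_conv, b_conv):
    # X: (batch, channels, height, width); W_conv: (n_filters, channels, 5, 5)
    conv_out = conv2d(X, W_conv) + b_conv.dimshuffle('x', 0, 'x', 'x')
    activ = T.maximum(conv_out, 0.)                       # relu
    return pool_2d(activ, (2, 2), ignore_border=True)     # 2x2 max pooling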

Page 36: Deep Learning in theano

CNNs: adagrad()
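The adagrad() code is on the slide; a minimal sketch of the Adagrad rule (accumulate squared gradients and scale the learning rate per parameter; assumed default hyperparameters):

import numpy as np
import theano
import theano.tensor as T

def adagrad(cost, params, lr=0.01, eps=1e-6):
    grads = T.grad(cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(np.zeros_like(p.get_value()))  # running sum of g^2
        new_acc = acc + g ** 2
        updates.append((acc, new_acc))
        updates.append((p, p - lr * g / T.sqrt(new_acc + eps)))
    return updates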

Page 37: Deep Learning in theano

CNNs: rmsprop()
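Continuing the sketch above: RMSProp replaces Adagrad’s running sum with an exponential moving average of the squared gradients.

def rmsprop(cost, params, lr=0.001, rho=0.9, eps=1e-6):
    grads = T.grad(cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(np.zeros_like(p.get_value()))  # moving average of g^2
        new_acc = rho * acc + (1 - rho) * g ** 2
        updates.append((acc, new_acc))
        updates.append((p, p - lr * g / T.sqrt(new_acc + eps)))
    return updates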

Page 38: Deep Learning in theano

Convolutional Neural Networks

[Figure: convolutional neural network, from deeplearning.net (same as Page 27)]

Test accuracy: 99.5%

Page 39: Deep Learning in theano

SGD/Adagrad/Rmsprop in training convnets

Page 40: Deep Learning in theano

Recurrent Neural Networks

Page 41: Deep Learning in theano

Recurrent Neural Networks

Open char_rnn/char_rnn_vanilla.py and complete the following:
● init(): define and initialize the parameters of the Vanilla RNN
● model(): compute the updates of the hidden states of the RNN
● model_sample(): compute the update of the hidden state of the RNN after only one step

Page 42: Deep Learning in theano

RNN: init()

Page 43: Deep Learning in theano

RNN: model()

theano.scan() defines symbolic loops in Theano.

It has 4 main arguments (plus several additional ones):
● fn: function to be applied at every iteration
● sequences: variables scan has to iterate over (iteration is done over the first dimension of each variable)
● outputs_info: initial state of the outputs computed recurrently
● non_sequences: list of additional arguments passed to fn

At each iteration, fn receives the parameters in the following order:

sequences (if any), outputs_info (if needed), non_sequences (if any)
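A minimal sketch of how theano.scan could drive a vanilla RNN over an input sequence (assumed names and shapes, not the repo’s exact code):

import theano
import theano.tensor as T

x_seq = T.matrix('x_seq')      # (seq_len, input_dim), one row per time step
h0 = T.vector('h0')            # initial hidden state
Wx = T.matrix('Wx')            # input-to-hidden weights
Wh = T.matrix('Wh')            # hidden-to-hidden weights

def step(x_t, h_prev, Wx, Wh):
    # arguments arrive in the order: sequences, outputs_info, non_sequences
    return T.tanh(T.dot(x_t, Wx) + T.dot(h_prev, Wh))

h_seq, updates = theano.scan(fn=step,
                             sequences=x_seq,
                             outputs_info=h0,
                             non_sequences=[Wx, Wh])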

Page 44: Deep Learning in theano

RNN: model()

Page 45: Deep Learning in theano

RNN: model_sample()

Page 46: Deep Learning in theano

RNN - LSTM

Page 47: Deep Learning in theano

RNN - LSTM

Under the complete/ folder you have the code for the LSTM version of char-rnn:
● char_rnn_lstm.py: standard LSTM
● char_rnn_lstm_fast.py: fast LSTM, makes better use of vectorized operations (>2x faster)
● sampler.py: to sample from your RNN

EXERCISE: They differ from the Vanilla RNN in their init(), model(), model_sample() and sampler() methods. Try to figure out how to go from one model to the other.

Page 48: Deep Learning in theano

Additional remarks

How to choose the optimal hyperparameters of my DNN?
● Grid search (overly expensive)
● Bayesian Optimization (effective but quite complex)
● Random search (cheap, effective and easy to implement)

Check out mlp_opt.py to run random hyperparameter search for the MLP (a minimal sketch of the idea is shown below).

EXERCISE: try the same with CNNs and RNNs.
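A tiny sketch of the idea behind random search (hypothetical names and ranges; mlp_opt.py may differ): sample a handful of random configurations and keep the best one.

import numpy as np

rng = np.random.RandomState(42)

def train_and_evaluate(cfg):
    # stub: in mlp_opt.py this would train the MLP with the given
    # hyperparameters and return its validation accuracy
    return rng.uniform(0.9, 0.99)

best_acc, best_cfg = -np.inf, None
for _ in range(20):
    cfg = {
        'lr': 10 ** rng.uniform(-4, -1),            # log-uniform learning rate
        'n_hidden': rng.choice([256, 512, 1024]),
        'p_drop': rng.uniform(0.2, 0.6),
    }
    acc = train_and_evaluate(cfg)
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg

print(best_cfg, best_acc)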

Page 49: Deep Learning in theano

Additional Remarks (2)

Packages worth checking
● Built on top of Theano: Lasagne, Keras
● Standalone packages: Caffe (Berkeley), TensorFlow (Google), CNTK (Microsoft)

Repositories

● gitxiv.com

Page 50: Deep Learning in theano

Credits

The slides and code used in this lab were inspired by the great work of several Deep Learning researchers.

Alec Radford’s slides, “Introduction to Deep Learning with Python”: http://www.slideshare.net/indicods/general-sequence-learning-with-recurrent-neural-networks-for-next-ml

Andrej Karpathy’s blog post, “The Unreasonable Effectiveness of Recurrent Neural Networks”: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Andrej Karpathy’s char-rnn repo: https://github.com/karpathy/char-rnn