Deep Learning in Theano


Massimo Quadrana
PhD Student @ Politecnico di Milano
Research Intern @ Telefonica I+D
massimo.quadrana@polimi.it
@mxqdr

Original slides are available here: https://goo.gl/VLYsnR

Before starting

OS: Linux / Mac OS (sorry Windows guys :) )

Required software:

python 2.7x, git, openblas

Optional software (for faster math and better packages/virtualenv support):

Anaconda (https://www.continuum.io/downloads)

Anaconda Intel MKL (free student licence) (https://www.continuum.io/anaconda-academic-subscriptions-available)

Before starting

Open your terminal and create a new virtualenv

> virtualenv -p /usr/bin/python2.7 theano-env

Activate the virtualenv

> source theano-env/bin/activate

Install the Theano package with its dependencies

> pip install Theano

(To exit the virtualenv)

> deactivate

Before starting

To check that your Theano env is correctly configured, run the following

> python -c 'import theano'

It should complete without errors

Before starting

Get the lab code here

> git clone https://github.com/mquad/DNN_Lab_UPF

Structure of the repo:

● exercise/: directory with the code for the lab (it won’t run)
● complete/: directory with the code completed with the missing parts (it should run :-) )
● notebooks/: some Jupyter notebooks to show you some cool stuff

If you spot any error or have any feature request, open a new issue. I’ll do my best to keep the repo up to date :-)

Outline

Image classification

● Logistic Regression
● “Modern” Multi-layer NN
● Convolutional Neural Networks

Sequence Modeling

● Character Based RNN

Open your editor and write the following. Save it as example_mul.py, then run python example_mul.py
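The example code shown on the slide is not part of this transcript; the following is a minimal sketch of what example_mul.py might look like (the variable names are my own, not taken from the slides):

# example_mul.py -- multiply two numbers with Theano (hypothetical reconstruction)
import theano
import theano.tensor as T

# declare two symbolic scalars
a = T.scalar('a')
b = T.scalar('b')

# build a symbolic expression and compile it into a callable function
c = a * b
mul = theano.function(inputs=[a, b], outputs=c)

print(mul(3., 4.))  # prints 12.0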

Theano intro

The official documentation:

http://deeplearning.net/software/theano/index.html


MNIST Dataset

60000 grayscale images (28 x 28 pixels each)

10 classes

[Diagram: Inputs → Model (Computation) → Outputs, illustrated on a sample digit]

Logistic Regression on MNIST

[Diagram: input image X → T.dot(X, W) + b → softmax → predicted probability for each of the ten digit classes Zero…Nine]

Logistic Regression on MNIST

Open exercise/logreg_raw.py

Many parts have already been coded for you (library import, data import and split, evaluation)

Write the code for the Logistic Regression classifier

LogReg: input vars and model parameters

Shared variables in Theano maintain their state across functions.

Use them to store your model’s parameters.

If executed on a GPU, shared variables are stored in GPU memory for faster access.
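A minimal sketch of how the input variables and shared parameters might be declared (the names and shapes are assumptions, not necessarily those used in logreg_raw.py):

import numpy as np
import theano
import theano.tensor as T

# symbolic inputs: a minibatch of flattened 28x28 images and their integer labels
X = T.matrix('X')    # shape (batch_size, 784)
y = T.ivector('y')   # shape (batch_size,)

# model parameters as shared variables: they keep their value across function calls
# (and live in GPU memory when running on a GPU)
W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')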

LogReg: model and cost function

Softmax: generalization of the sigmoid to multiple classes

Predicted class: the class with the maximum predicted probability
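Continuing the sketch above, the model and the predicted class could look like this:

# class probabilities: softmax of the affine transform of the input
y_hat = T.nnet.softmax(T.dot(X, W) + b)

# predicted class: index of the maximum probability in each row
y_pred = T.argmax(y_hat, axis=1)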

LogReg: model and cost function

Cross-entropy loss

y is the one-hot encoding of the correct class of the input features x (y_i = 1 iff the class of x is i)

Here we keep y as an integer class index and index into y_hat to save computation.

Note: average the loss over the minibatch (the cost must be a scalar)
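A sketch of this cost, indexing y_hat with the integer labels as described (it continues the variables defined in the previous sketches):

# for each example i, take the predicted probability of its true class y[i],
# apply -log, and average over the minibatch so that the cost is a scalar
cost = -T.mean(T.log(y_hat)[T.arange(y.shape[0]), y])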

LogReg: SGD

T.grad() performs automatic differentiation of the loss function.

updates tells Theano how to update the model’s (shared) parameters (it can be a list of tuples, a dict or an OrderedDict).
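A sketch of plain SGD updates built with T.grad(), assuming a fixed learning rate (the value is arbitrary):

params = [W, b]
learning_rate = 0.1

# symbolic gradients of the scalar cost w.r.t. each parameter
grads = T.grad(cost, wrt=params)

# one (parameter, new_value) pair per parameter: a plain gradient-descent step
updates = [(p, p - learning_rate * g) for p, g in zip(params, grads)]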

LogReg: Training, Loss and Predictions
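The slide’s code is not in this transcript; here is a minimal sketch of how the compiled functions and the training loop might look, assuming train_X, train_y, test_X, test_y were loaded by the provided data code:

# train() performs one SGD step and returns the minibatch cost;
# predict() returns the predicted classes
train = theano.function(inputs=[X, y], outputs=cost, updates=updates,
                        allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred,
                          allow_input_downcast=True)

batch_size = 128
for epoch in range(10):
    for start in range(0, len(train_X), batch_size):
        train(train_X[start:start + batch_size], train_y[start:start + batch_size])
    accuracy = np.mean(predict(test_X) == test_y)
    print('epoch %d, test accuracy %.4f' % (epoch, accuracy))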

LogReg: Softmax

The exp function can easily overflow: subtract the maximum value of x before exponentiating to get more stable results (this has no effect on correctness).
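A sketch of the numerically stable softmax described above:

import theano.tensor as T

def softmax(x):
    # subtract the row-wise maximum before exponentiating: exp() no longer
    # overflows, and the result is unchanged because the constant cancels
    # out in the normalization
    e_x = T.exp(x - x.max(axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)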

LogReg: File logreg.py contains a cleaner version of the Logistic Regression classifier.

init(): defines model parameters

model(): defines our model

fit() and predict(): fit the model on the training data and predict the class of new data

Logistic Regression on MNIST

[Diagram as above: input image X → T.dot(X, W) + b → softmax → probabilities over the ten digit classes]

Test accuracy: ~92%

“Modern” multi-layer network

h0 = relu(T.dot(X, Wh0) + b0)
h1 = relu(T.dot(h0, Wh1) + b1)
y = softmax(T.dot(h1, Wy) + by)

[Diagram: the same ten-class output as before, with noise injected at the input (or via data augmentation) and at the hidden layers]

“Modern” multi-layer network

Open and complete mlp.py. The missing parts are:

● init(): initialize the MLP parameters
● model(): define the model using dropout
● dropout(): apply dropout to the input
● apply_momentum(): apply momentum over the given updates

MLP: init()

MLP: model()

MLP: dropout()
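A sketch of a dropout helper built on a shared random stream (this follows common Theano practice and is not necessarily the exact code in mlp.py):

import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams(seed=42)

def dropout(X, p_drop=0.):
    # randomly zero each unit with probability p_drop, then rescale the
    # survivors so the expected activation stays the same ("inverted" dropout)
    if p_drop > 0:
        retain_prob = 1 - p_drop
        mask = srng.binomial(n=1, p=retain_prob, size=X.shape,
                             dtype=theano.config.floatX)
        X = X * mask / retain_prob
    return X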

MLP: apply_momentum()
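A sketch of classical momentum wrapped around existing SGD updates (the exact signature in mlp.py may differ; the velocity variables are my own addition):

import numpy as np
import theano

def apply_momentum(updates, momentum=0.9):
    # keep one velocity per parameter and move along an exponentially
    # decaying average of the plain SGD steps
    new_updates = []
    for (param, sgd_value) in updates:
        velocity = theano.shared(np.zeros_like(param.get_value()))
        step = sgd_value - param            # the plain SGD step
        new_velocity = momentum * velocity + step
        new_updates.append((velocity, new_velocity))
        new_updates.append((param, param + new_velocity))
    return new_updates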

“Modern” multi-layer network

[Same diagram and equations as above]

Test accuracy: ~98%

Convolutional Neural Networks

[Figure: convolutional neural network architecture, from deeplearning.net]

CNNs in Theano

Open convnet.py and complete the following parts:

● get_conv_output_shape(): compute the output shape of the convolutional layer
● init(): complete the initialization of the convolutional filters
● model(): define the entire CNN model
● adagrad(): define the update rules for Adagrad
● rmsprop(): define the update rules for RMSprop (easy if you do Adagrad first)

Dealing with Convolutions

Inputs have 3 dimensions: width and height (the spatial dimensions, W) and depth.

Convolutions are:

● local in width and height (receptive field F)
● full in depth

Dealing with Convolutions

Convolution hyper-parameters:

● depth: number of neurons connected to the same input region
● stride: space between depth columns in the spatial dimensions
● padding: how to treat borders (not covered in the examples)

The spatial size of the output volume is given by the formula

(W - F + 2P) / S + 1

Our CNN (variation of LeNet5)

INPUT, CONV(5,5)*, MAX POOL, CONV(5,5)*, MAX POOL, FC*

*The actual number of filters and fully connected layers is configurable

CNNs: get_conv_output_volume()

We don’t consider padding for simplicity
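A sketch of this computation (the exercise list calls it get_conv_output_shape(); padding is taken as zero, as stated above, and the argument names are assumptions):

def get_conv_output_volume(input_size, filter_size, stride=1, padding=0):
    # spatial output size of a convolution: (W - F + 2P) / S + 1
    # e.g. a 28x28 input with a 5x5 filter, stride 1, no padding -> 24x24
    return (input_size - filter_size + 2 * padding) // stride + 1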

CNNs: init()

First CONV(5,5), MAX POOL

CNNs: init()

Analogously for the second CONV(5,5), MAX POOL

CNNs: model()

CNNs: adagrad()
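A sketch of Adagrad update rules (the learning rate and epsilon values are arbitrary):

import numpy as np
import theano
import theano.tensor as T

def adagrad(cost, params, learning_rate=0.01, epsilon=1e-6):
    # scale each parameter's step by the inverse square root of the sum
    # of its squared historical gradients
    grads = T.grad(cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(np.zeros_like(p.get_value()))
        acc_new = acc + g ** 2
        updates.append((acc, acc_new))
        updates.append((p, p - learning_rate * g / T.sqrt(acc_new + epsilon)))
    return updates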

CNNs: rmsprop()
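And a sketch of RMSprop, which differs from Adagrad only in keeping a decaying average of the squared gradients instead of a running sum (imports as in the Adagrad sketch above):

def rmsprop(cost, params, learning_rate=0.001, rho=0.9, epsilon=1e-6):
    # like adagrad(), but old squared gradients are gradually forgotten
    grads = T.grad(cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(np.zeros_like(p.get_value()))
        acc_new = rho * acc + (1 - rho) * g ** 2
        updates.append((acc, acc_new))
        updates.append((p, p - learning_rate * g / T.sqrt(acc_new + epsilon)))
    return updates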

Convolutional Neural Networks

[Figure: convolutional neural network architecture, from deeplearning.net]

Test accuracy: 99.5%

SGD/Adagrad/RMSprop in training convnets

Recurrent Neural Networks

Recurrent Neural Networks

Open char_rnn/char_rnn_vanilla.py and complete the following:

● init(): define and initialize the parameters of the Vanilla RNN
● model(): compute the updates of the hidden states of the RNN
● model_sample(): compute the update of the hidden state of the RNN after only one step

RNN: init()

RNN: model()

theano.scan() defines symbolic loops in Theano. It has 4 main arguments (plus several additional ones):

● fn: function to be applied at every iteration
● sequences: variables scan has to iterate over (iteration is done over the first dimension of each variable)
● outputs_info: initial state of the outputs computed recurrently
● non_sequences: list of additional arguments passed to fn

At each iteration, fn receives the parameters in the following order: sequences (if any), outputs_info (if needed), non_sequences (if any).

RNN: model()
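The slide’s code is not included here; the following is a minimal sketch of a vanilla RNN unrolled with theano.scan(), with hypothetical sizes and parameter names:

import numpy as np
import theano
import theano.tensor as T

n_in, n_hidden = 65, 128   # vocabulary size and hidden size (arbitrary)

def shared_zeros(shape):
    return theano.shared(np.zeros(shape, dtype=theano.config.floatX))

def shared_randn(shape, scale=0.01):
    return theano.shared((scale * np.random.randn(*shape)).astype(theano.config.floatX))

Wx, Wh, Wy = shared_randn((n_in, n_hidden)), shared_randn((n_hidden, n_hidden)), shared_randn((n_hidden, n_in))
bh, by = shared_zeros(n_hidden), shared_zeros(n_in)

x_seq = T.matrix('x_seq')    # (seq_len, n_in): one-hot characters of one sequence
h0 = shared_zeros(n_hidden)  # initial hidden state

def step(x_t, h_prev, Wx, Wh, bh):
    # one recurrence step: new hidden state from the current input and the previous state
    return T.tanh(T.dot(x_t, Wx) + T.dot(h_prev, Wh) + bh)

h_seq, scan_updates = theano.scan(
    fn=step,
    sequences=x_seq,              # iterated over its first (time) dimension
    outputs_info=h0,              # initial value of the recurrently computed output
    non_sequences=[Wx, Wh, bh])   # extra arguments passed unchanged to fn

# character distribution at every time step
y_seq = T.nnet.softmax(T.dot(h_seq, Wy) + by)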

RNN: model_sample()

RNN - LSTM

Under the complete/ folder you have the code for the LSTM version of char-rnn:

● char_rnn_lstm.py: standard LSTM
● char_rnn_lstm_fast.py: fast LSTM, makes better use of vectorized operations (>2x faster)
● sampler.py: to sample from your RNN

EXERCISE: They differ from the Vanilla RNN in their init(), model(), model_sample() and sampler() methods. Try to figure out how to go from one model to the other.

Additional remarks

How to choose the optimal hyperparameters of my DNN?

● Grid search (overly expensive)
● Bayesian Optimization (effective but quite complex)
● Random search (cheap, effective and easy to implement)

Check out mlp_opt.py to run random hyperparameter search for the MLP.
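A minimal sketch of the random-search idea (the hyperparameter ranges and the train_and_score() helper are hypothetical, not taken from mlp_opt.py):

import numpy as np

def random_search(train_and_score, n_trials=20):
    # sample each hyperparameter independently from a sensible range and
    # keep the configuration with the best validation score
    best_score, best_params = -np.inf, None
    for _ in range(n_trials):
        params = {
            'learning_rate': 10 ** np.random.uniform(-4, -1),   # log-uniform
            'n_hidden': int(np.random.choice([256, 512, 1024])),
            'dropout': float(np.random.uniform(0.2, 0.6)),
        }
        score = train_and_score(**params)   # e.g. validation accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score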

EXERCISE: Try with CNNs, RNNs

Additional Remarks (2)

Packages worth checking:

● Built on top of Theano: Lasagne, Keras
● Standalone packages: Caffe (Berkeley), TensorFlow (Google), CNTK (Microsoft)

Repositories

● gitxiv.com

Credits

The slides and code used in this lab were inspired by the great work of some great Deep Learning researchers:

Alec Radford’s slides: “Introduction to Deep Learning with Python”, http://www.slideshare.net/indicods/general-sequence-learning-with-recurrent-neural-networks-for-next-ml

Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent Neural Networks”, http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Andrej Karpathy’s char-rnn repo, https://github.com/karpathy/char-rnn
