
Deep Learning
Machine Learning Journal Club

Carl Andersson, Niklas Wahlström, Tomas Wilkinsson

Department of Information Technology
Uppsala University


Deep Learning: Motivation

Machine learning influences many aspects of modern society

These applications make use of a class of techniques called deep learning



Two tasks where Deep Learning shines

Task 1 - Image classification
Input: pixels of an image
Output: object identity
Model structure: Convolutional neural networks

Task 2 - Speech recognition
Input: spoken language
Output: text
Model structure: Recurrent neural networks



Outline

1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network



Constructing NN for regression

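A neural network (NN) is a nonlinear function $Y = f_\theta(X)$ from an input $X$ to an output $Y$, parameterized by parameters $\theta$.

Linear regression models the relationship between a continuous output $Y$ and a continuous input $X$,

$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon = \beta^T X + \varepsilon,$$

where $\beta$ is the parameter vector composed of the "weights" $\beta_j$ and the offset ("bias"/"intercept") term $\beta_0$,

$$\beta = \begin{pmatrix} \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_p \end{pmatrix}^T, \qquad X = \begin{pmatrix} 1 & X_1 & X_2 & \cdots & X_p \end{pmatrix}^T.$$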



Generalized linear regression

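We can generalize this by introducing a nonlinear transformation of the predictor $\beta^T X$,

$$Y = \sigma(\beta^T X) + \varepsilon.$$

[Figure: the inputs 1, X_1, ..., X_p are weighted by β_0, ..., β_p, summed and passed through σ to give Y]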

We call σ(x) the activation function. Two common choices are:

Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$

ReLU: $\sigma(x) = \max(0, x)$

[Figure: plots of σ(x) against x for the sigmoid and the ReLU]
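The two activation functions, and the generalized linear regression built on them, can be sketched in a few lines of plain Python. This is only an illustration: the function names and the parameter values below are hypothetical and not taken from the slides.

```python
import math

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

def glm_predict(beta, x, activation):
    """Generalized linear regression: activation(beta_0 + sum_j beta_j * x_j)."""
    z = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return activation(z)

beta = [0.5, 1.0, -2.0]   # hypothetical parameters (beta_0, beta_1, beta_2)
x = [1.5, 1.0]            # hypothetical input (X_1, X_2)

print(glm_predict(beta, x, sigmoid))
print(glm_predict(beta, x, relu))
```

With these values the linear predictor is exactly zero, so the sigmoid sits at its midpoint and the ReLU at its kink.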

Let us consider an example of a feed-forward NN, so called because the information flows from the input layer to the output layer.



Neural network - construction

A NN is a sequential construction of several linear regression models.

[Figure: feed-forward network with inputs 1, X_1, ..., X_p, hidden units Z_1, ..., Z_M (each with activation σ) and output Y. Column labels: Inputs, Hidden units, Outputs]

$$Z_1 = \sigma\Big(\beta^{(1)}_{01} + \sum_{j=1}^{p} \beta^{(1)}_{j1} X_j\Big)$$

$$Z_2 = \sigma\Big(\beta^{(1)}_{02} + \sum_{j=1}^{p} \beta^{(1)}_{j2} X_j\Big)$$

$$\vdots$$

$$Z_M = \sigma\Big(\beta^{(1)}_{0M} + \sum_{j=1}^{p} \beta^{(1)}_{jM} X_j\Big)$$

$$Y = \beta^{(2)}_{0} + \sum_{m=1}^{M} \beta^{(2)}_{m} Z_m$$





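In matrix notation the same network reads

$$Z = \sigma(W_1^T X + b_1^T)$$

$$Y = W_2^T Z + b_2^T$$

with

$$b_1 = \begin{bmatrix} \beta^{(1)}_{01} & \dots & \beta^{(1)}_{0M} \end{bmatrix}, \qquad W_1 = \begin{bmatrix} \beta^{(1)}_{11} & \dots & \beta^{(1)}_{1M} \\ \vdots & \ddots & \vdots \\ \beta^{(1)}_{p1} & \dots & \beta^{(1)}_{pM} \end{bmatrix}$$

$$b_2 = \begin{bmatrix} \beta^{(2)}_{0} \end{bmatrix}, \qquad W_2 = \begin{bmatrix} \beta^{(2)}_{1} \\ \vdots \\ \beta^{(2)}_{M} \end{bmatrix}$$

Here $X = (X_1 \; \cdots \; X_p)^T$ no longer contains the constant 1, since the offsets are collected in $b_1$ and $b_2$.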





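[Figure: network with two hidden layers: inputs 1, X_1, ..., X_p, hidden units Z^(1)_1, ..., Z^(1)_{M_1}, hidden units Z^(2)_1, ..., Z^(2)_{M_2} and output Y. Column labels: Inputs, Hidden units, Hidden units, Outputs]

$$Z^{(1)} = \sigma(W_1^T X + b_1^T)$$

$$Z^{(2)} = \sigma(W_2^T Z^{(1)} + b_2^T)$$

$$Y = W_3^T Z^{(2)} + b_3^T$$

In practice, the model often learns better using a deep network (several layers) instead of a wide and shallow network.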



A 2-layer neural network in matrix notation

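Consider N training data points $T = \{x_i, y_i\}_{i=1}^{N}$. We stack each data point i in a row:

$$\begin{bmatrix} z_1^T \\ z_2^T \\ \vdots \\ z_N^T \end{bmatrix} = \begin{bmatrix} \sigma(x_1^T W_1 + b_1) \\ \sigma(x_2^T W_1 + b_1) \\ \vdots \\ \sigma(x_N^T W_1 + b_1) \end{bmatrix}, \qquad \begin{bmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_N^T \end{bmatrix} = \begin{bmatrix} z_1^T W_2 + b_2 \\ z_2^T W_2 + b_2 \\ \vdots \\ z_N^T W_2 + b_2 \end{bmatrix}$$

This is how it is written in matrix form, with $+b_1$, $+b_2$ and $\sigma$ applied on every row:

$$Z = \sigma(X W_1 + b_1)$$

$$Y = Z W_2 + b_2$$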

... and in TensorFlow (popular software package for DL)

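import tensorflow as tf  # X, W1, b1, W2, b2 are tensors of compatible shapes
Z = tf.nn.sigmoid(tf.matmul(X, W1) + b1)  # hidden layer
Yhat = tf.matmul(Z, W2) + b2  # linear output layer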
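For readers without TensorFlow at hand, the same two matrix equations can be traced in plain Python. This is a minimal sketch: the helper names (`matmul`, `add_row`, `forward`) and all sizes and numbers are hypothetical choices for illustration, with matrices stored as lists of rows.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_row(A, b):
    """Add the row vector b to every row of A (the '+ b' applied on every row)."""
    return [[a + bj for a, bj in zip(row, b)] for row in A]

def forward(X, W1, b1, W2, b2):
    """Z = sigmoid(X W1 + b1), Yhat = Z W2 + b2, one data point per row of X."""
    Z = [[sigmoid(v) for v in row] for row in add_row(matmul(X, W1), b1)]
    return add_row(matmul(Z, W2), b2)

# Hypothetical sizes: N = 2 data points, p = 3 inputs, M = 2 hidden units, 1 output.
X = [[1.0, 2.0, 3.0],
     [0.0, 0.0, 0.0]]
W1 = [[0.1, -0.2],
      [0.0, 0.3],
      [-0.1, 0.1]]
b1 = [0.0, 0.0]
W2 = [[1.0],
      [-1.0]]
b2 = [0.5]

Yhat = forward(X, W1, b1, W2, b2)
print(Yhat)
```

The second data point is all zeros, so both hidden units output 0.5 and the prediction reduces to the output offset.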



Training a neural network

• Formulate a cost function, for example $J(\theta) = \sum_{i=1}^{N} \|y_i - f_\theta(x_i)\|^2$ or $J(\theta) = -\sum_{i=1}^{N} y_i^T \log(f_\theta(x_i))$
• Minimize with stochastic gradient descent
• Gradients can be computed efficiently using back-propagation
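The recipe above can be sketched on the smallest possible model. The example below assumes a one-input linear model with the squared-error cost, so the gradient is written out by hand instead of being obtained by back-propagation. The data, learning rate and function name are hypothetical.

```python
import random

def sgd_linear(data, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b by SGD on the squared error, one data point per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                 # "stochastic": visit points in random order
        for x, y in data:
            err = y - (w * x + b)         # residual y_i - f_theta(x_i)
            # The gradient of err**2 is (-2*err*x, -2*err); step against it.
            w += lr * 2.0 * err * x
            b += lr * 2.0 * err
    return w, b

# Hypothetical noise-free data from y = 2x + 1 on a grid in [-1, 1].
points = [(-1.0 + 0.1 * i, 2.0 * (-1.0 + 0.1 * i) + 1.0) for i in range(21)]
w, b = sgd_linear(points)
print(w, b)
```

For a neural network the loop is the same; only the model and the gradient computation change.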

Example: Training a five layer network on the MNIST data set



Why now?

Neural networks have been around for more than fifty years. Why have they become so popular now (again)?

To solve really interesting problems you need:
1. Efficient learning algorithms
2. Efficient computational hardware
3. A lot of labeled data!

These three factors have not been fulfilled to a satisfactory level until the last 5-10 years.



Outline

1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network



Convolutional Neural Networks

One of the big recent success stories for neural networks is in computer vision. Since 2012, neural networks have been used to some extent in all winning contributions in the largest computer vision competitions (ImageNet, MSCOCO, ...)

Recently, medical imaging has seen increased interest from the Machine Learning community (and vice versa) [1]. NNs have seen success there for a few years now [2, 3]

1. Deep Learning for Medical Image Analysis, Zhou et al, 2017

2. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, Ciresan et al, 2012

3. U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al, 2015



Convolutional Neural Networks

Neural networks are typically called convolutional (CNNs or ConvNets) when they contain one or more convolutional layers.

They work on volumes of data, e.g., images (H, W, 3), where spatial correlations exist in the input, and their intermediate representations are also volumes of data.



Convolutional Layer I

32x32x3 image

5x5x3 filter

75 (+1 for bias) dimensional dot products $w^T x + b$ are computed at each valid location in the input to produce the output







Convolutional Layer II

One 5x5x3 filter: 32x32x3 image -> 28x28x1

4x5x5x3 filters: 32x32x3 image -> 28x28x4 "image"

Stacked layers: 32x32x3 image -> Conv 4x5x5x3 + ReLU -> 28x28x4 -> Conv 10x5x5x4 + ReLU -> 24x24x10 -> ...
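The sliding dot product behind these shapes can be written as a naive loop. The sketch below assumes a square filter, stride 1 and no padding, and, like most deep learning libraries, does not flip the filter. The function name and the all-ones data are hypothetical.

```python
def conv_valid(image, filt, bias=0.0):
    """Slide one filter over every valid position of an H x W x C image.

    image: nested lists indexed [row][col][channel]
    filt:  nested lists indexed [row][col][channel], same channel count
    Returns an (H - F + 1) x (W - F + 1) feature map.
    """
    H, W = len(image), len(image[0])
    F = len(filt)
    C = len(filt[0][0])
    out = []
    for i in range(H - F + 1):
        row = []
        for j in range(W - F + 1):
            # Dot product w^T x + b over the F x F x C patch at (i, j).
            s = bias
            for di in range(F):
                for dj in range(F):
                    for c in range(C):
                        s += filt[di][dj][c] * image[i + di][j + dj][c]
            row.append(s)
        out.append(row)
    return out

# Hypothetical data: an all-ones 32x32x3 image and an all-ones 5x5x3 filter.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
filt = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
fmap = conv_valid(image, filt, bias=1.0)
print(len(fmap), len(fmap[0]), fmap[0][0])
```

Each output value sums 75 products plus the bias, and the feature map is 28x28. Applying several filters and stacking their feature maps gives the output depth.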



Convolutional Layer III

How does this relate to regular (fully connected) networks?

1. Local connectivity: Each dot product is computed using only a local neighborhood of the input (e.g. 5x5 filter)

2. Parameter Sharing: At each valid filter position in the input, the same parameters (or weights) are used.
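A quick parameter count illustrates what these two properties buy. The sketch below compares a fully connected layer with a convolutional layer that both map a 32x32x3 input to a 28x28x4 output (sizes taken from the earlier slides). The helper names are hypothetical.

```python
def fully_connected_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

def conv_params(filter_size, in_depth, n_filters):
    """Each filter has filter_size * filter_size * in_depth weights plus one bias."""
    return (filter_size * filter_size * in_depth + 1) * n_filters

n_in = 32 * 32 * 3    # 3072 input values
n_out = 28 * 28 * 4   # 3136 output values

print(fully_connected_params(n_in, n_out))  # 9636928
print(conv_params(5, 3, 4))                 # 304
```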






Fully Connected -> Convolutional

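[Figure: four panels going from a fully connected layer on a vector of N² inputs to a convolution on an N x N grid. Panel titles: Vector -> Matrix, Local Connectivity, Parameter Sharing, Convolution]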



Convolutional Layer IV

The hyperparameters when creating convolutional layers are
• filter size, F
• stride, s
• number of filters/feature maps, d (the depth of the output volumes)
• zero padding, p (to control the width and height of the volumes/feature maps; with stride 1, set p = (F − 1)/2 to keep the size)
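How these hyperparameters determine the spatial output size can be checked with the standard formula (W − F + 2p)/s + 1. A small sketch follows. The function name is hypothetical, and rejecting strides that do not divide evenly is a design choice of this sketch, not something the slides prescribe.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size (w - f + 2p) / s + 1 for an input of width w."""
    span = w - f + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // s + 1

print(conv_output_size(32, 5))        # 28, no padding
print(conv_output_size(28, 5))        # 24
print(conv_output_size(32, 5, p=2))   # 32, p = (F - 1) / 2 keeps the size
```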



Max Pooling layer

A parameterless layer that subsamples the feature maps in the two spatial dimensions using the max operation. For a single feature map:

[Figure: a 4x4 feature map reduced to a 2x2 feature map by max-pooling with a 2x2 filter and stride 2]
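Max-pooling on a single feature map can be sketched as follows. The 4x4 input is a hypothetical example and does not reproduce the values in the slide's figure.

```python
def max_pool(fmap, size=2, stride=2):
    """Max-pool a 2D feature map (list of rows) with a square window."""
    H, W = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, W - size + 1, stride)
        ]
        for i in range(0, H - size + 1, stride)
    ]

# Hypothetical 4x4 feature map.
fmap = [
    [1, 6, 3, 4],
    [2, 1, 3, 4],
    [5, 4, 1, 8],
    [7, 3, 3, 5],
]
print(max_pool(fmap))  # [[6, 4], [7, 8]]
```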



What is the network learning?

For a given filter (neuron, unit), what does the corresponding feature map (output) look like when the top 9 images that excited the filter the most are fed through the network?



Evolution of Architectures

Since their inception in the late 80s, the design principles for CNNs have changed a lot. The following architectures are referenced a lot in papers.

1. LeNet (90s)
2. AlexNet (2012 Imagenet Winner)
3. ZFNet (2013 Imagenet Winner)
4. VGGNet (2014 Imagenet Runner-up)
5. GoogleNet Inception (2014 Imagenet Winner)
6. ResNet (2015 Imagenet Winner)




LeNet5 (Gradient-based learning applied to document recognition, LeCun et al, 1998)

Source: Gradient-based learning applied to document recognition, LeCun et al, 1998


AlexNet (Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012)

Source: Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012


ZFNet (Visualizing and Understanding Convolutional Neural Networks, Zeiler & Fergus, 2013)

Source: Visualizing and Understanding Convolutional Neural Networks, Zeiler & Fergus, 2013


VGGNet (Very Deep Convolutional Networks for Large Scale Image Recognition, Simonyan & Zisserman, 2014)

Source: https://www.saagie.com/fr/blog/object-detection-part1


GoogleNet, Inception (Going Deeper with Convolutions, Szegedy et al, 2014)

Source: https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html


ResNet (Deep Residual Learning for Image Recognition, He et al, 2015)

Source: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html



Revolution of Depth



Skin cancer – background

One recent result on the use of deep learning in medicine: detecting skin cancer (February 2017).
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M. and Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542, 115–118, February 2017.

Some background figures (from the US) on skin cancer:
• Melanomas represent less than 5% of all skin cancers, but account for 75% of all skin-cancer-related deaths.
• Early detection is absolutely critical. Estimated 5-year survival rate for melanoma: over 99% if detected in its earlier stages and 14% if detected in its later stages.



Skin cancer – task

Excerpt from page 116 of the Letter in Nature, vol. 542, 2 February 2017:

lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.

To conclusively validate the algorithm, we tested, using only biopsy-proven images on medically important use cases, whether the algorithm and dermatologists could distinguish malignant versus benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin. For melanocytic lesions, we show

two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might carry out to obtain a clinical impression. The same CNN is used for all three tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity:

$$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}$$

$$\text{specificity} = \frac{\text{true negative}}{\text{negative}}$$

where ‘true positive’ is the number of correctly predicted malignant lesions, ‘positive’ is the number of malignant lesions shown, ‘true negative’ is the number of correctly predicted benign lesions, and ‘negative’ is the number of benign lesions shown. When a test set is fed through the CNN, it outputs a probability, P, of malignancy, per image. We can compute the sensitivity and specificity of these probabilities

[Figure 1 labels: Skin lesion image -> Deep convolutional neural network (Inception v3) -> Training classes (757), e.g. acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma, ..., blue nevus, halo nevus, Mongolian spot, ... -> Inference classes (varies by task), e.g. 92% malignant melanocytic lesion, 8% benign melanocytic lesion. Layer legend: Convolution, AvgPool, MaxPool, Concat, Dropout, Fully connected, Softmax]

Figure 1 | Deep CNN layout. Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using Google Inception v3 CNN architecture pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases. The 757 training classes are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes

(for example, acrolentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesions—the class of melanomas). The probability of an inference class is calculated by summing the probabilities of the training classes according to taxonomy structure (see Methods). Inception v3 CNN architecture reprinted from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html

[Figure 2 labels. Panel a: taxonomy tree rooted at "Skin disease" with node labels including Benign, Malignant, Non-neoplastic, Melanocytic, Epidermal, Dermal, Inflammatory, Genodermatosis, Cutaneous lymphoma, and leaves such as café au lait spot, solar lentigo, atypical nevus, seborrhoeic keratosis, milia, cyst, fibroma, lipoma, acne, rosacea, abrasion, Stevens-Johnson syndrome, psoriasis, tuberous sclerosis, congenital dyskeratosis, bullous pemphigoid, basal cell carcinoma, squamous cell carcinoma, Merkel cell carcinoma, angiosarcoma, melanoma, T-cell, B-cell. Panel b: benign and malignant example images for Epidermal lesions, Melanocytic lesions and Melanocytic lesions (dermoscopy)]

Figure 2 | A schematic illustration of the taxonomy and example test set images. a, A subset of the top of the tree-structured taxonomy of skin disease. The full taxonomy contains 2,032 diseases and is organized based on visual and clinical similarity of diseases. Red indicates malignant, green indicates benign, and orange indicates conditions that can be either. Black indicates melanoma. The first two levels of the taxonomy are used in validation. Testing is restricted to the tasks of b. b, Malignant and benign

example images from two disease classes. These test images highlight the difficulty of malignant versus benign discernment for the three medically critical classification tasks we consider: epidermal lesions, melanocytic lesions and melanocytic lesions visualized with a dermoscope. Example images reprinted with permission from the Edinburgh Dermofit Library (https://licensing.eri.ed.ac.uk/i/software/dermofit-image-library.html).

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Image copyright Nature (doi:10.1038/nature21056)



Skin cancer – taxonomy used

Image copyright Nature (doi:10.1038/nature21056)



Skin cancer – solution (ultrabrief)

Start from a neural network trained on 1.28 million images (transfer learning).

Make minor modifications to this model, specializing to the present situation.

Learn new model parameters using 129 450 clinical images (∼ 100 times more images than any previous study).



[Figure: unseen data -> model -> prediction?]



Skin cancer – indication of the results

$$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}, \qquad \text{specificity} = \frac{\text{true negative}}{\text{negative}}$$
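The two metrics can be computed from predicted malignancy probabilities once a decision threshold is chosen. The sketch below uses made-up labels and probabilities purely for illustration; they are not data from the paper, and the function name is hypothetical.

```python
def sensitivity_specificity(labels, probs, threshold=0.5):
    """labels: 1 = malignant, 0 = benign; probs: predicted P(malignant)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    true_pos = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    true_neg = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 0)
    positive = sum(labels)              # number of malignant lesions shown
    negative = len(labels) - positive   # number of benign lesions shown
    return true_pos / positive, true_neg / negative

# Hypothetical labels and probabilities, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

print(sensitivity_specificity(labels, probs))
print(sensitivity_specificity(labels, probs, threshold=0.25))
```

Lowering the threshold raises sensitivity at the cost of specificity. Sweeping it over all values traces out a sensitivity-specificity curve for the classifier.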


Extended Data Figure 4 | Extension of Figure 3 with a different dermatological question. a, Identical plots and results as shown in Fig. 3a, except that dermatologists were asked if a lesion appeared to be malignant or benign. This is a somewhat unnatural question to ask, in the clinic, the

only actionable decision is whether or not to biopsy or treat a lesion. The blue curves for the CNN are identical to Fig. 3. b, Figure 3b reprinted for visual comparison to a.


Image copyright Nature (doi:10.1038/nature21056)



Outline

1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network



Problems with sequential data

Varying size of data examples

No direct coupling between one part of the input and one part of the output

Impose a causal relationship between the data points in a sequence

E.g.
• Speech recognition: spoken words → syllables
• Machine translation: English → Korean
• Image captioning: describe an image with a sentence



Recurrent neural networks

A recurrent neural network (RNN) is essentially a nonlinear state-space model

$$s_t = f(s_{t-1}, x_t)$$

$$h_t = g(s_t)$$

where $f(\cdot)$ and $g(\cdot)$ are neural networks
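The state-space recursion can be unrolled in a few lines. In this sketch f is a single tanh unit and g a linear read-out on a scalar state. These choices, and all weights, are hypothetical stand-ins for the neural networks on the slide.

```python
import math

def f(s_prev, x, w_s=0.5, w_x=1.0):
    """State transition s_t = f(s_{t-1}, x_t); here a single tanh unit."""
    return math.tanh(w_s * s_prev + w_x * x)

def g(s, w_h=2.0):
    """Output map h_t = g(s_t); here a linear read-out."""
    return w_h * s

def run_rnn(xs, s0=0.0):
    """Apply the same f and g at every time step of a sequence of any length."""
    s, hs = s0, []
    for x in xs:
        s = f(s, x)
        hs.append(g(s))
    return hs

print(run_rnn([1.0, 0.0, -1.0]))
print(len(run_rnn([0.1] * 7)))
```

Because the same f and g are reused at every step, the model handles sequences of varying length with a fixed number of parameters.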

©Christopher Olah



When to use?

• Single input to multiple outputs, e.g. image captioning [1]
• Multiple inputs to single output, e.g. sentiment analysis
• Multiple inputs to multiple outputs, e.g. machine translation, one-step prediction

©Andrej Karpathy

1. Deep Visual-Semantic Alignments for Generating Image Descriptions


History

• Early variants: f(·) and g(·) are single-layer networks, e.g. Elman/Jordan networks around 1990.
• Trained with ordinary back-propagation
• Vanishing/exploding gradients ⇒ hard to learn long-term dependencies

©Christopher Olah



Long short term memory (LSTM)

Long short term memory (LSTM) [2] is designed to compensate for the vanishing gradient problem

$$s_t = g_i s_{t-1} + (1 - g_i) s_c$$

Essentially a weighted update
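The weighted update can be sketched for a scalar state. This follows the simplified formula on the slide rather than a full LSTM cell, where the gate and the candidate state are themselves computed from the input and the previous state. The function name and values are hypothetical.

```python
def gated_update(s_prev, s_cand, gate):
    """s_t = g_i * s_{t-1} + (1 - g_i) * s_c, with gate g_i in [0, 1]."""
    return gate * s_prev + (1.0 - gate) * s_cand

# Hypothetical values: a gate near 1 keeps the old state almost unchanged,
# so information decays slowly over many time steps.
s = 1.0
for _ in range(100):
    s = gated_update(s, s_cand=0.0, gate=0.99)
print(s)
```

With the gate at exactly 1 the state is copied unchanged from step to step, which is what lets information persist over long sequences.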

©Christopher Olah

2. Long short term memory, 1997


The unreasonable effectiveness of RNN

• Blog entry by Andrej Karpathy
• One-step prediction
• Trained on a large corpus of text

©Andrej Karpathy



Ex. Shakespeare

Trained on all works of Shakespeare, ∼ 4 MB of data. Below is a generated sample

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.



More examples

Linux core source code

/*
 * If this error is set, we will need anything right after that BSD.
 */
static void action_new_function(struct s_stat_info *wb)
{
  unsigned long flags;
  int lel_idx_bit = e->edd, *sys & ~((unsigned long) *FIRST_COMPAT);
  buf[0] = 0xFFFFFFFF & (bit << 4);
  min(inc, slist->bytes);
  printk(KERN_WARNING "Memory allocated %02x/%02x, "
    "original MLL instead\n"),
    min(min(multi_run - s->len, max) * num_data_in),
    frame_pos , sz + first_seg);
  div_u64_w(val, inb_p);
  spin_unlock(&disk->queue_lock);
  mutex_unlock(&s->sock->mutex);
  mutex_unlock(&func->mutex);
  return disassemble(info->pending_bh);
}



More examples

Latex code from math



Under the hood

As with other deep learning models, it is hard to understand the function of all hidden states in the model

Visualize the activity of a particular state variable during a run



Visualizing the network

[Figure: activity of three state variables, with panel titles: Row length, Inside quotation, Raw text in program]



Image captioning

Create an initial state with a convolutional neural network

Use the same technique to generate a sentence describing the image

©Andrej Karpathy and Li Fei-Fei



Thank you!


