
Deep Learning
Machine Learning Journal Club

Carl Andersson, Niklas Wahlström, Tomas Wilkinsson

Department of Information Technology, Uppsala University

[email protected],[email protected],[email protected] Deep Learning

Page 2: Machine Learning Journal Club - Uppsala University · DeepLearning Machine Learning Journal Club CarlAndersson NiklasWahlström TomasWilkinsson DepartmentofInformationTechnology UppsalaUniversity

Deep Learning: Motivation

Machine learning influences many aspects of modern society

These applications make use of a class of techniques called deep learning.


Two tasks where Deep Learning shines

Task 1 - Image classification
Input: pixels of an image. Output: object identity.
Model structure: Convolutional neural networks

Task 2 - Speech recognition
Input: spoken language. Output: text.
Model structure: Recurrent neural networks


Outline

1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network


Constructing NN for regression

A neural network (NN) is a nonlinear function $Y = f_\theta(X)$ from an input $X$ to an output $Y$, parameterized by $\theta$.

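Linear regression models the relationship between a continuous output $Y$ and a continuous input $X$,

$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon = \beta^T X + \varepsilon,$$

where $\beta$ is the parameter vector composed of the “weights” $\beta_j$ and the offset (“bias”/“intercept”) term $\beta_0$,

$$\beta = \begin{pmatrix} \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_p \end{pmatrix}^T, \qquad X = \begin{pmatrix} 1 & X_1 & X_2 & \cdots & X_p \end{pmatrix}^T.$$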
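A minimal NumPy sketch of why the leading 1 in $X$ lets $\beta_0$ act as the offset. The slides do not say how $\beta$ is estimated here; this sketch assumes ordinary least squares, and the data and the "true" $\beta$ are hypothetical values used only to simulate from the model.

```python
import numpy as np

# Simulate Y = beta^T X + eps, with every input vector augmented by a leading 1
rng = np.random.default_rng(0)
N, p = 200, 3
X_raw = rng.normal(size=(N, p))                  # X_1, ..., X_p for N data points
beta_true = np.array([0.5, 2.0, -1.0, 3.0])      # hypothetical (beta_0, beta_1, ..., beta_p)

X = np.hstack([np.ones((N, 1)), X_raw])          # row i is X_i^T = (1, X_i1, ..., X_ip)
Y = X @ beta_true + 0.01 * rng.normal(size=N)    # small noise term eps

# Least-squares estimate: beta_0 is recovered like any other weight
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_hat, 2))
```

Because the constant column is part of $X$, no separate intercept handling is needed; the same trick reappears when the offsets of a neural network are folded into its weights.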


Generalized linear regression

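We can generalize this by introducing nonlinear transformations of the predictor $\beta^T X$,

$$Y = \sigma(\beta^T X) + \varepsilon.$$

[Figure: inputs 1, X1, …, Xp, weighted by β0, …, βp, feed a single unit σ that outputs Y]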

We call σ(x) the activation function. Two common choices are:

[Plot: sigmoid σ(x) for x from −5 to 5]

Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

[Plot: ReLU σ(x) for x from −1 to 1]

ReLU: $\sigma(x) = \max(0, x)$
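Both activation functions, and a single generalized-linear unit computing the noise-free prediction $\sigma(\beta^T X)$, fit in a few lines. A NumPy sketch; the parameter values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation 1 / (1 + exp(-x)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """ReLU activation max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def glm_predict(beta, x, activation=sigmoid):
    """Noise-free prediction sigma(beta^T X) with X = (1, X_1, ..., X_p)."""
    X = np.concatenate(([1.0], x))   # prepend the constant 1 that multiplies beta_0
    return activation(beta @ X)

beta = np.array([0.0, 1.0, -1.0])    # hypothetical (beta_0, beta_1, beta_2)
x = np.array([2.0, 2.0])
print(glm_predict(beta, x))          # beta^T X = 0, and sigmoid(0) = 0.5
print(glm_predict(beta, x, relu))    # relu(0) = 0
```

Swapping the `activation` argument is all it takes to move between the two choices; with the identity function the unit reduces to plain linear regression.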

Let us consider an example of a feed-forward NN, in which information flows from the input layer to the output layer.


Neural network - construction

A NN is a sequential construction of several linear regression models.

[Figure: inputs 1, X1, …, Xp → hidden units Z1, …, ZM (each applying σ) → output Y; columns labelled Inputs, Hidden units, Outputs]

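$$Z_1 = \sigma\Big(\beta^{(1)}_{01} + \sum_{j=1}^{p} \beta^{(1)}_{j1} X_j\Big), \quad Z_2 = \sigma\Big(\beta^{(1)}_{02} + \sum_{j=1}^{p} \beta^{(1)}_{j2} X_j\Big), \quad \dots, \quad Z_M = \sigma\Big(\beta^{(1)}_{0M} + \sum_{j=1}^{p} \beta^{(1)}_{jM} X_j\Big)$$

$$Y = \beta^{(2)}_0 + \sum_{m=1}^{M} \beta^{(2)}_m Z_m$$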
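The equations above can be transcribed almost literally, one hidden unit at a time. A NumPy sketch, assuming a sigmoid activation and storing the parameters as a hypothetical $(p+1) \times M$ array `beta1` (column $m$ holds $\beta^{(1)}_{0m}, \dots, \beta^{(1)}_{pm}$) and a length-$(M+1)$ array `beta2`:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_unitwise(x, beta1, beta2):
    """One-hidden-layer NN written unit by unit.

    x     : inputs (X_1, ..., X_p), shape (p,)
    beta1 : shape (p + 1, M); column m is (beta^(1)_{0m}, ..., beta^(1)_{pm})
    beta2 : shape (M + 1,);   (beta^(2)_0, beta^(2)_1, ..., beta^(2)_M)
    """
    p, M = len(x), beta1.shape[1]
    Z = np.empty(M)
    for m in range(M):
        # Z_m = sigma(beta^(1)_{0m} + sum_j beta^(1)_{jm} X_j)
        a = beta1[0, m] + sum(beta1[j, m] * x[j - 1] for j in range(1, p + 1))
        Z[m] = sigmoid(a)
    # Y = beta^(2)_0 + sum_m beta^(2)_m Z_m  (no activation on the output)
    return beta2[0] + sum(beta2[m] * Z[m - 1] for m in range(1, M + 1))

rng = np.random.default_rng(1)
p, M = 3, 4
beta1 = rng.normal(size=(p + 1, M))   # hypothetical random parameters
beta2 = rng.normal(size=M + 1)
x = rng.normal(size=p)
print(forward_unitwise(x, beta1, beta2))
```

Each hidden unit is exactly the generalized linear model from before; the output is a linear regression on the hidden units.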


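In matrix form, with $X = (X_1\ \cdots\ X_p)^T$ and $Z = (Z_1\ \cdots\ Z_M)^T$ (the offsets are collected in $b_1$ and $b_2$):

$$Z = \sigma(W_1^T X + b_1^T), \qquad b_1 = \begin{bmatrix} \beta^{(1)}_{01} & \cdots & \beta^{(1)}_{0M} \end{bmatrix}, \qquad W_1 = \begin{bmatrix} \beta^{(1)}_{11} & \cdots & \beta^{(1)}_{1M} \\ \vdots & \ddots & \vdots \\ \beta^{(1)}_{p1} & \cdots & \beta^{(1)}_{pM} \end{bmatrix}$$

$$Y = W_2^T Z + b_2^T, \qquad b_2 = \begin{bmatrix} \beta^{(2)}_0 \end{bmatrix}, \qquad W_2 = \begin{bmatrix} \beta^{(2)}_1 \\ \vdots \\ \beta^{(2)}_M \end{bmatrix}$$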


In compact form:

$$Z = \sigma(W_1^T X + b_1^T), \qquad Y = W_2^T Z + b_2^T$$
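A quick numerical check that the matrix form computes the same thing as the unit-by-unit equations. A NumPy sketch with hypothetical random parameters, a sigmoid activation, and $X$ as a $p \times 1$ column vector so that the transposes match the formulas:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_matrix(X, W1, b1, W2, b2):
    """Z = sigma(W1^T X + b1^T), Y = W2^T Z + b2^T for one column input X."""
    Z = sigmoid(W1.T @ X + b1.T)   # shape (M, 1)
    return W2.T @ Z + b2.T         # shape (1, 1)

rng = np.random.default_rng(2)
p, M = 3, 4
W1 = rng.normal(size=(p, M))   # beta^(1)_{jm}, j = 1..p (hypothetical values)
b1 = rng.normal(size=(1, M))   # row vector of offsets beta^(1)_{0m}
W2 = rng.normal(size=(M, 1))   # beta^(2)_m, m = 1..M
b2 = rng.normal(size=(1, 1))   # beta^(2)_0
X = rng.normal(size=(p, 1))    # column vector (X_1, ..., X_p)^T

Y = forward_matrix(X, W1, b1, W2, b2)

# The same prediction, unit by unit as in the scalar equations
Z_loop = [sigmoid(b1[0, m] + sum(W1[j, m] * X[j, 0] for j in range(p))) for m in range(M)]
Y_loop = b2[0, 0] + sum(W2[m, 0] * Z_loop[m] for m in range(M))
print(Y.item(), Y_loop)
```

The matrix version is what is used in practice: it replaces the explicit loops with a single matrix-vector product per layer.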


[Figure: inputs 1, X1, …, Xp → first hidden layer Z(1)1, …, Z(1)M1 → second hidden layer Z(2)1, …, Z(2)M2 (each unit applying σ) → output Y; columns labelled Inputs, Hidden units, Hidden units, Outputs]

$$Z^{(1)} = \sigma(W_1^T X + b_1^T), \qquad Z^{(2)} = \sigma(W_2^T Z^{(1)} + b_2^T), \qquad Y = W_3^T Z^{(2)} + b_3^T$$

In practice, a deep network (several layers) often learns better than a wide and shallow network.
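Stacking more layers only repeats the same step. A NumPy sketch of a general feed-forward pass, assuming ReLU as $\sigma$ in the hidden layers, a linear output layer, and hypothetical layer sizes $p = 3$, $M_1 = 5$, $M_2 = 4$:

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def deep_forward(X, layers, activation=relu):
    """Z^(l) = sigma(W_l^T Z^(l-1) + b_l^T) for hidden layers, linear output layer.

    X      : column vector, shape (p, 1)
    layers : list of (W_l, b_l), W_l of shape (M_{l-1}, M_l), b_l of shape (1, M_l)
    """
    Z = X
    for W, b in layers[:-1]:
        Z = activation(W.T @ Z + b.T)   # hidden layers
    W, b = layers[-1]
    return W.T @ Z + b.T                # output layer, no activation

rng = np.random.default_rng(3)
sizes = [3, 5, 4, 1]   # p, M1, M2, number of outputs (hypothetical)
layers = [(rng.normal(size=(m_in, m_out)), rng.normal(size=(1, m_out)))
          for m_in, m_out in zip(sizes[:-1], sizes[1:])]
Y = deep_forward(rng.normal(size=(3, 1)), layers)
print(Y.shape)   # (1, 1)
```

Depth is then just the length of the `layers` list; the three-layer equations above correspond to `sizes = [p, M1, M2, 1]`.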


A 2-layer neural network in matrix notation

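Consider $N$ training data points $\mathcal{T} = \{x_i, y_i\}_{i=1}^{N}$. We stack each data point $i$ in a row:

$$\begin{bmatrix} z_1^T \\ z_2^T \\ \vdots \\ z_N^T \end{bmatrix} = \begin{bmatrix} \sigma(x_1^T W_1 + b_1) \\ \sigma(x_2^T W_1 + b_1) \\ \vdots \\ \sigma(x_N^T W_1 + b_1) \end{bmatrix}, \qquad \begin{bmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_N^T \end{bmatrix} = \begin{bmatrix} z_1^T W_2 + b_2 \\ z_2^T W_2 + b_2 \\ \vdots \\ z_N^T W_2 + b_2 \end{bmatrix}$$

Written in matrix form, with $+b_1$, $+b_2$ and $\sigma$ applied to every row:

$$Z = \sigma(XW_1 + b_1), \qquad Y = ZW_2 + b_2$$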

... and in TensorFlow (a popular software package for DL):

import tensorflow as tf
Z = tf.nn.sigmoid(tf.matmul(X, W1) + b1)  # hidden layer, sigmoid applied to every row
Yhat = tf.matmul(Z, W2) + b2              # linear output layer
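The claim that "$+b_1$, $+b_2$ and $\sigma$ are applied on every row" is what array broadcasting does. A NumPy sketch (rather than TensorFlow, to keep it dependency-light) with hypothetical sizes, checking the stacked matrix form against processing one data point at a time:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(4)
N, p, M, q = 5, 3, 4, 2            # data points, inputs, hidden units, outputs (hypothetical)
X = rng.normal(size=(N, p))        # row i is x_i^T
W1, b1 = rng.normal(size=(p, M)), rng.normal(size=(M,))
W2, b2 = rng.normal(size=(M, q)), rng.normal(size=(q,))

# Matrix form: broadcasting adds b1 and b2 to every row; sigma acts elementwise
Z = sigmoid(X @ W1 + b1)
Y = Z @ W2 + b2

# Row-by-row form, one data point x_i^T at a time
Y_rows = np.stack([sigmoid(x @ W1 + b1) @ W2 + b2 for x in X])
print(Y.shape, np.allclose(Y, Y_rows))
```

Processing all $N$ rows in one matrix product is what makes training on large batches efficient on modern hardware.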


Training a neural network

• Formulate a cost function, for example $J(\theta) = \sum_{i=1}^{N} \|y_i - f_\theta(x_i)\|^2$ or $J(\theta) = -\sum_{i=1}^{N} y_i^T \log(f_\theta(x_i))$
• Minimize it with stochastic gradient descent
• Gradients can be computed efficiently using back-propagation

Example: training a five-layer network on the MNIST data set
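The three steps can be sketched end to end on a much smaller problem than MNIST. This NumPy sketch fits a one-hidden-layer sigmoid network to hypothetical 1-D data $y = \sin(x) + $ noise, using the squared-error cost (averaged rather than summed over data points, which only rescales the step size), mini-batch stochastic gradient descent, and gradients derived by hand with back-propagation. Data, layer width, learning rate and number of epochs are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(params, X):
    """f_theta(X) for a one-hidden-layer network; rows of X are data points."""
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2

def cost(params, X, Y):
    """Squared-error cost ||y_i - f_theta(x_i)||^2, averaged over the data points."""
    return np.mean(np.sum((Y - predict(params, X)) ** 2, axis=1))

rng = np.random.default_rng(0)
N, M = 256, 16                                    # data points, hidden units
X = rng.uniform(-3.0, 3.0, size=(N, 1))
Y = np.sin(X) + 0.05 * rng.normal(size=(N, 1))    # hypothetical toy data

params = [rng.normal(size=(1, M)), np.zeros(M),              # W1, b1
          rng.normal(scale=0.1, size=(M, 1)), np.zeros(1)]   # W2, b2
lr, batch_size = 0.05, 32
initial = cost(params, X, Y)

for epoch in range(300):
    perm = rng.permutation(N)                     # new random mini-batches each epoch
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]
        x, y = X[idx], Y[idx]
        W1, b1, W2, b2 = params
        # Forward pass
        Z = sigmoid(x @ W1 + b1)
        Yhat = Z @ W2 + b2
        # Back-propagation: chain rule from the cost back to every parameter
        dYhat = 2.0 * (Yhat - y) / len(idx)
        dW2, db2 = Z.T @ dYhat, dYhat.sum(axis=0)
        dA = (dYhat @ W2.T) * Z * (1.0 - Z)       # sigmoid'(a) = sigmoid(a) * (1 - sigmoid(a))
        dW1, db1 = x.T @ dA, dA.sum(axis=0)
        # Stochastic gradient descent step
        params = [W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2]

final = cost(params, X, Y)
print(f"cost before: {initial:.3f}, after: {final:.3f}")
```

Libraries such as TensorFlow compute the backward pass automatically; writing it out once shows that back-propagation is just the chain rule applied layer by layer, reusing the quantities from the forward pass.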


Why now?

Neural networks have been around for more than fifty years. Why have they become so popular now (again)?

To solve really interesting problems you need:
1. Efficient learning algorithms
2. Efficient computational hardware
3. A lot of labeled data!

These three requirements were not met to a satisfactory level until the last 5-10 years.


Convolutional Neural Networks

One of the big recent success stories for neural networks is in computer vision. Since 2012, neural networks have been used to some extent in all winning contributions to the largest computer vision competitions (ImageNet, MSCOCO, ...).

Recently, medical imaging has seen increased interest from the machine learning community (and vice versa) [1]. NNs have been applied successfully there for a few years now [2, 3].

1. Deep Learning for Medical Image Analysis, Zhou et al, 2017

2. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, Ciresan et al, 2012

3. U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al, 2015


Neural networks are typically called convolutional (CNNs or ConvNets)when they contain one or more convolutional layers.

They work on volumes of data, e.g., images of size (H, W, 3), where spatial correlations exist in the input, and their intermediate representations are also volumes of data.


Convolutional Layer I

[Figure: a 5x5x3 filter sliding over a 32x32x3 image]

A 5x5x3 filter: 75-dimensional (+1 for the bias) dot products $w^T x + b$ are computed at each valid location in the input to produce the output.
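A direct NumPy sketch of this operation for a single filter, with a random image and filter as stand-ins: one 75-dimensional dot product plus bias at every valid position. Like most deep learning libraries, it computes a cross-correlation (the filter is not flipped), which is what "convolution" usually means in this context.

```python
import numpy as np

def conv_single_filter(image, w, b):
    """Slide one F x F x C filter over every valid location of an H x W x C image.

    At each location the output is the dot product w^T x + b between the
    flattened filter (F*F*C weights) and the flattened image patch x.
    """
    H, W, C = image.shape
    F = w.shape[0]
    out = np.empty((H - F + 1, W - F + 1))
    for r in range(H - F + 1):
        for c in range(W - F + 1):
            patch = image[r:r + F, c:c + F, :]
            out[r, c] = w.ravel() @ patch.ravel() + b
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))   # 32x32x3 image (random stand-in)
w = rng.normal(size=(5, 5, 3))         # 5x5x3 filter: 75 weights
b = 0.1                                # +1 bias
out = conv_single_filter(image, w, b)
print(w.size, out.shape)               # 75 (28, 28)
```

A 5x5 window fits in 32 - 5 + 1 = 28 positions along each axis, which is where the 28x28 output comes from.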


Convolutional Layer II

[Figure: 4 filters of size 5x5x3 (4x5x5x3) map the 32x32x3 image to a 28x28x4 "image"]


32x32x3 image → Conv 4x5x5x3 → ReLU → 28x28x4 → Conv 10x5x5x4 → ReLU → 24x24x10 → ...
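The same sliding dot product with several filters gives one output channel per filter, and the next layer's filters must then be as deep as the number of channels. A NumPy sketch of the two-layer stack above, with random stand-in weights and zero biases:

```python
import numpy as np

def conv_layer(x, filters, biases):
    """Valid convolution (cross-correlation) with K filters of shape F x F x C.

    x: (H, W, C), filters: (K, F, F, C), biases: (K,) -> output (H-F+1, W-F+1, K)
    """
    K, F, _, C = filters.shape
    H, W, _ = x.shape
    out = np.empty((H - F + 1, W - F + 1, K))
    for r in range(H - F + 1):
        for c in range(W - F + 1):
            patch = x[r:r + F, c:c + F, :]      # F x F x C neighbourhood
            # one dot product per filter -> K output channels at position (r, c)
            out[r, c, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + biases
    return out

def relu(a):
    return np.maximum(0.0, a)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 3))                                           # 32x32x3 image
h1 = relu(conv_layer(x, rng.normal(size=(4, 5, 5, 3)), np.zeros(4)))      # Conv 4x5x5x3 + ReLU
h2 = relu(conv_layer(h1, rng.normal(size=(10, 5, 5, 4)), np.zeros(10)))   # Conv 10x5x5x4 + ReLU
print(h1.shape, h2.shape)   # (28, 28, 4) (24, 24, 10)
```

Note that the second layer's filters have depth 4 precisely because the first layer produced 4 feature maps.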


Convolutional Layer III

How does this relate to regular (fully connected) networks?

1. Local connectivity: each dot product is computed using only a local neighborhood of the input (e.g. a 5x5 filter).

2. Parameter sharing: at each valid filter position in the input, the same parameters (or weights) are used.
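The effect of the two properties is easiest to see by counting parameters for the 32x32x3 → 28x28x4 layer from the previous slides. A small worked calculation; the "local connectivity only" line is an intermediate case added for comparison:

```python
in_size = 32 * 32 * 3     # 3072 input values
out_size = 28 * 28 * 4    # 3136 output values

# Fully connected: every output has its own weight to every input, plus a bias
fc_params = in_size * out_size + out_size

# Local connectivity only: each output sees a 5x5x3 patch, but weights are not shared
local_params = out_size * (5 * 5 * 3 + 1)

# Local connectivity + parameter sharing: 4 filters reused at every position
conv_params = 4 * (5 * 5 * 3 + 1)

print(fc_params, local_params, conv_params)   # 9636928 238336 304
```

Local connectivity cuts the count by a factor of about 40, and parameter sharing by a further factor of 784 (the number of positions per filter).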


Fully Connected -> Convolutional

[Figure sequence on an N × N weight matrix, in four steps: Vector -> Matrix; Local Connectivity; Parameter Sharing; Convolution]
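In one dimension the sequence can be reproduced numerically: start from a fully connected $N \times N$ weight matrix, zero everything outside a band (local connectivity), make every row reuse the same $F$ weights (parameter sharing), and the matrix product becomes a sliding dot product. A NumPy sketch with hypothetical sizes, assuming an odd filter size and zero padding so the output keeps length $N$:

```python
import numpy as np

N, F = 8, 3                       # input length and (odd) filter size, hypothetical
rng = np.random.default_rng(0)
x = rng.normal(size=N)            # input vector
w = rng.normal(size=F)            # the F shared filter weights

# Start from a fully connected N x N weight matrix, but
#  - local connectivity: row i only touches the F inputs centred on position i
#  - parameter sharing: every row reuses the same F weights w
W = np.zeros((N, N))
for i in range(N):
    for n in range(F):
        j = i + n - (F - 1) // 2  # input position seen by filter tap n
        if 0 <= j < N:            # positions outside the input act as zero padding
            W[i, j] = w[n]

y_fc = W @ x                                  # "fully connected" layer with the structured W
y_conv = np.correlate(x, w, mode="same")      # sliding dot product with zero padding
print(np.allclose(y_fc, y_conv))              # True
```

The structured matrix has at most $NF$ nonzero entries but only $F$ free parameters, which is the whole point of the construction.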


Convolutional Layer IV

The hyperparameters when creating convolutional layers are:
• filter size, F
• stride, s
• number of filters/feature maps, d (the depth of the output volume)
• zero padding, p (to control the width and height of the volumes/feature maps; with stride 1, set p = (F − 1)/2 to keep the size)
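These hyperparameters determine the spatial size of the output through the standard relation $(W - F + 2p)/s + 1$ for an input of width $W$. A small sketch; rejecting configurations where the division is not exact is a design choice made here, since libraries typically round down instead:

```python
def conv_output_size(W, F, s=1, p=0):
    """Spatial output size (W - F + 2p) / s + 1 of a conv (or pooling) layer."""
    size, rem = divmod(W - F + 2 * p, s)
    if rem != 0:
        raise ValueError("filter does not tile the input evenly with this stride")
    return size + 1

print(conv_output_size(32, 5))                  # 28, as in 32x32x3 -> 28x28x4
print(conv_output_size(28, 5))                  # 24, as in 28x28x4 -> 24x24x10
print(conv_output_size(32, 5, p=(5 - 1) // 2))  # 32: padding (F - 1)/2 keeps the size
print(conv_output_size(32, 2, s=2))             # 16: a 2x2 window with stride 2 halves it
```

The depth d does not enter the formula; it is simply the number of filters.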


Max Pooling layer

A parameterless layer that subsamples the feature maps in the two spatial dimensions using the max operation. For a single feature map:

[Figure: a 4x4 feature map reduced to a 2x2 output]

max-pooling with a 2x2 filter and stride 2
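Max pooling with a 2x2 filter and stride 2 splits the map into non-overlapping 2x2 blocks and keeps the largest value of each. A NumPy sketch using a reshape trick; the feature-map values are hypothetical, since the slide's own numbers are not recoverable from this transcript:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Max pooling with a 2x2 filter and stride 2 on one feature map (H, W even)."""
    H, W = fmap.shape
    # View the map as (H/2) x 2 x (W/2) x 2 blocks and take the max inside each block
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 1, 0, 2],
                 [4, 3, 1, 1]])   # hypothetical 4x4 feature map
print(max_pool_2x2(fmap))
```

Since there are no weights, the layer adds nothing to train; it only reduces the width and height (here from 4x4 to 2x2) and applies to each feature map independently.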


What is the network learning?

For a given filter (neuron, unit), what does the corresponding feature map (output) look like when the 9 images that excite the filter the most are fed through the network?


Evolution of Architectures

Since their inception in the late 80s, the design principles for CNNs have changed a lot. The following architectures are frequently referenced in papers:

1. LeNet (90s)
2. AlexNet (2012 ImageNet winner)
3. ZFNet (2013 ImageNet winner)
4. VGGNet (2014 ImageNet runner-up)
5. GoogleNet Inception (2014 ImageNet winner)
6. ResNet (2015 ImageNet winner)


LeNet5 (Gradient-based learning applied to document recognition, LeCun et al, 1998)

Source: Gradient-based learning applied to document recognition, LeCun et al, 1998


AlexNet (ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012)

Source: Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012


ZFNet (Visualizing and Understanding Convolutional Neural Networks, Zeiler & Fergus, 2013)

Source: Visualizing and Understanding Convolutional Neural Networks, Zeiler & Fergus, 2013


VGGNet (Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, 2014)

Source: https://www.saagie.com/fr/blog/object-detection-part1


GoogleNet, Inception (Going Deeper with Convolutions, Szegedy et al, 2014)

Source: https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html


ResNet (Deep Residual Learning for Image Recognition, He et al, 2015)

Source: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html


Skin cancer – background

One recent result on the use of deep learning in medicine: detecting skin cancer (February 2017).

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M. and Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542, 115–118, February 2017.

Some background figures (from the US) on skin cancer:
• Melanoma represents less than 5% of all skin cancers, but accounts for 75% of all skin-cancer-related deaths.
• Early detection is absolutely critical. The estimated 5-year survival rate for melanoma is over 99% if detected in its earlier stages and 14% if detected in its later stages.


Skin cancer – task

116 | Nature | Vol 542 | 2 February 2017

Letter RESEARCH

lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.

To conclusively validate the algorithm, we tested, using only biopsy-proven images on medically important use cases, whether the algorithm and dermatologists could distinguish malignant versus benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin. For melanocytic lesions, we show two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might carry out to obtain a clinical impression. The same CNN is used for all three tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity:

$$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}, \qquad \text{specificity} = \frac{\text{true negative}}{\text{negative}}$$

where ‘true positive’ is the number of correctly predicted malignant lesions, ‘positive’ is the number of malignant lesions shown, ‘true negative’ is the number of correctly predicted benign lesions, and ‘negative’ is the number of benign lesions shown. When a test set is fed through the CNN, it outputs a probability, P, of malignancy, per image. We can compute the sensitivity and specificity of these probabilities

[Figure 1: skin lesion image → deep convolutional neural network (Inception v3; convolution, AvgPool, MaxPool, concat, dropout, fully connected, softmax) → 757 training classes (e.g. acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma, blue nevus, halo nevus, Mongolian spot, …) → inference classes (varies by task), e.g. 92% malignant melanocytic lesion, 8% benign melanocytic lesion]

Figure 1 | Deep CNN layout. Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using Google Inception v3 CNN architecture pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases. The 757 training classes are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes

(for example, acrolentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesions—the class of melanomas). The probability of an inference class is calculated by summing the probabilities of the training classes according to taxonomy structure (see Methods). Inception v3 CNN architecture reprinted from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html

[Figure 2: a, top levels of the skin-disease taxonomy (benign, malignant and non-neoplastic branches, with nodes such as melanocytic, epidermal, dermal, inflammatory, genodermatosis and cutaneous lymphoma); b, benign and malignant example images for epidermal lesions, melanocytic lesions and melanocytic lesions (dermoscopy)]

Figure 2 | A schematic illustration of the taxonomy and example test set images. a, A subset of the top of the tree-structured taxonomy of skin disease. The full taxonomy contains 2,032 diseases and is organized based on visual and clinical similarity of diseases. Red indicates malignant, green indicates benign, and orange indicates conditions that can be either. Black indicates melanoma. The first two levels of the taxonomy are used in validation. Testing is restricted to the tasks of b. b, Malignant and benign

example images from two disease classes. These test images highlight the difficulty of malignant versus benign discernment for the three medically critical classification tasks we consider: epidermal lesions, melanocytic lesions and melanocytic lesions visualized with a dermoscope. Example images reprinted with permission from the Edinburgh Dermofit Library (https://licensing.eri.ed.ac.uk/i/software/dermofit-image-library.html).

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Image copyright Nature (doi:10.1038/nature21056)
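The rule in the Figure 1 caption, that an inference class receives the summed probability of its training classes, can be illustrated with a small sketch. The class names, logits and mapping below are hypothetical (the paper's taxonomy has 757 training classes); only the mechanism is shown: a softmax over fine-grained classes, followed by a sum per coarse class.

```python
import math

def softmax(logits):
    """Convert a dict of logits into a dict of probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical mapping from fine-grained training classes to the coarser
# inference classes of one task (not the paper's real 757-class mapping).
INFERENCE_CLASS = {
    "acral_lentiginous_melanoma": "malignant melanocytic lesion",
    "amelanotic_melanoma": "malignant melanocytic lesion",
    "lentigo_melanoma": "malignant melanocytic lesion",
    "blue_nevus": "benign melanocytic lesion",
    "halo_nevus": "benign melanocytic lesion",
}

def inference_probabilities(training_probs, mapping=INFERENCE_CLASS):
    """Sum the probabilities of all training classes belonging to each inference class."""
    out = {}
    for cls, p in training_probs.items():
        parent = mapping[cls]
        out[parent] = out.get(parent, 0.0) + p
    return out

# Made-up network outputs (logits) for one image.
logits = {
    "acral_lentiginous_melanoma": 2.0,
    "amelanotic_melanoma": 0.5,
    "lentigo_melanoma": 1.0,
    "blue_nevus": -1.0,
    "halo_nevus": 0.0,
}
print(inference_probabilities(softmax(logits)))
```

Because the training-class probabilities sum to 1 and each training class maps to exactly one inference class, the inference-class probabilities also sum to 1.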


Skin cancer – taxonomy used

Image copyright Nature (doi:10.1038/nature21056)


Skin cancer – solution (ultrabrief)

Start from a neural network trained on 1.28 million images (transfer learning).

Make minor modifications to this model, specializing it to the present problem.

Learn new model parameters using 129 450 clinical images (∼100 times more images than any previous study).
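The three steps above can be sketched in miniature. This is a toy under stated assumptions, not the paper's pipeline: `pretrained_features` is a hypothetical stand-in for the frozen layers of a network such as Inception v3, the "minor modification" is a new output layer (logistic regression), and only that layer is trained on the new data. The paper instead fine-tunes the whole network; freezing the features just keeps the example short.

```python
import math

def pretrained_features(x):
    """Hypothetical frozen feature extractor standing in for layers pretrained
    on a large dataset. Its parameters are never updated below."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias feature

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_new_head(data, epochs=200, lr=0.5):
    """Learn only the new output layer: logistic regression on frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            phi = pretrained_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))
            # gradient of the cross-entropy loss w.r.t. w is (p - y) * phi
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, phi)]
    return w

def predict(w, x):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, pretrained_features(x))))

# Made-up "new task" data: label 1 when the two inputs are large.
data = [([0.9, 0.8], 1), ([0.7, 0.9], 1), ([1.0, 0.6], 1),
        ([0.1, 0.2], 0), ([0.3, 0.1], 0), ([0.2, 0.4], 0)]
w = train_new_head(data)
print(predict(w, [0.9, 0.9]), predict(w, [0.1, 0.1]))
```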

116 | Nature | Vol 542 | 2 February 2017

Letter RESEARCH

lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.

To conclusively validate the algorithm, we tested, using only biopsy-proven images on medically important use cases, whether the algorithm and dermatologists could distinguish malignant versus benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin. For melanocytic lesions, we show two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might carry out to obtain a clinical impression. The same CNN is used for all three tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity:

$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}$

$\text{specificity} = \frac{\text{true negative}}{\text{negative}}$

where ‘true positive’ is the number of correctly predicted malignant lesions, ‘positive’ is the number of malignant lesions shown, ‘true negative’ is the number of correctly predicted benign lesions, and ‘negative’ is the number of benign lesions shown. When a test set is fed through the CNN, it outputs a probability, P, of malignancy, per image. We can compute the sensitivity and specificity of these probabilities


[Diagram: unseen data (?) is fed to the trained model, which outputs a model prediction]
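The "overall accuracy" in the paper excerpt above is the average of the per-class accuracies, not the fraction of all images classified correctly, so each inference class counts equally however many images it has. A minimal sketch of the difference, on made-up labels:

```python
from collections import defaultdict

def mean_class_accuracy(y_true, y_pred):
    """Average over classes of (correct predictions in the class / examples in the class)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

def plain_accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly, for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up, imbalanced example: a classifier that always answers "benign".
y_true = ["benign"] * 8 + ["malignant"] * 2
y_pred = ["benign"] * 10
print(plain_accuracy(y_true, y_pred), mean_class_accuracy(y_true, y_pred))
```

The constant "benign" classifier scores 0.8 on plain accuracy but only 0.5 on the class-averaged measure.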


Skin cancer – indication of the results

$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}$

$\text{specificity} = \frac{\text{true negative}}{\text{negative}}$

Extended Data Figure 4 | Extension of Figure 3 with a different dermatological question. a, Identical plots and results as shown in Fig. 3a, except that dermatologists were asked if a lesion appeared to be malignant or benign. This is a somewhat unnatural question to ask, in the clinic, the only actionable decision is whether or not to biopsy or treat a lesion. The blue curves for the CNN are identical to Fig. 3. b, Figure 3b reprinted for visual comparison to a.


Image copyright Nature (doi:10.1038/nature21056)
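The two formulas yield one (sensitivity, specificity) pair only once the CNN's probability P of malignancy is turned into a decision. A minimal sketch, assuming the rule "predict malignant when P ≥ t" for a threshold t; sweeping t over the observed probabilities traces a curve like those compared against the dermatologists. The probabilities and labels are made up.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity when predicting 'malignant' for P >= threshold.

    labels: 1 = malignant (positive), 0 = benign (negative)."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives

def sweep(probs, labels):
    """(threshold, sensitivity, specificity) for each distinct predicted probability."""
    return [(t, *sensitivity_specificity(probs, labels, t)) for t in sorted(set(probs))]

# Made-up CNN outputs P (probability of malignancy) and biopsy-proven labels.
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
labels = [1, 1, 1, 0, 0, 0]
for t, sens, spec in sweep(probs, labels):
    print(f"t={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Lowering t raises sensitivity at the cost of specificity: at t = 0.2 every malignant lesion is caught but no benign lesion is cleared.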


Outline

1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network


Problems with sequential data

Data examples vary in size (e.g. sequences of different lengths)

There is no direct coupling between one part of the input and one part of the output

The data points in a sequence have a causal relationship: a point may depend on earlier points, but not on later ones

E.g.
• Speech recognition: spoken words → syllables
• Machine translation: English → Korean
• Image captioning: describe an image with a sentence


Recurrent neural networks

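A recurrent neural network (RNN) is essentially a nonlinear state-space model:

$s_t = f(s_{t-1}, x_t)$

$h_t = g(s_t)$

where $x_t$ is the input, $s_t$ the state and $h_t$ the output at time $t$, and $f(\cdot)$ and $g(\cdot)$ are neural networks.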

©Christopher Olah
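A minimal sketch of the state-space recursion, assuming the simplest possible choices: f is a single tanh unit and g a single linear unit, with hypothetical fixed weights instead of trained ones. The same loop accepts sequences of any length, and h_t depends only on x_1, ..., x_t.

```python
import math

def f(s_prev, x, w_s=0.5, w_x=1.0, b=0.0):
    """State update s_t = f(s_{t-1}, x_t): one tanh unit with hypothetical weights."""
    return math.tanh(w_s * s_prev + w_x * x + b)

def g(s, w_h=2.0, b_h=0.0):
    """Output h_t = g(s_t): one linear unit with hypothetical weights."""
    return w_h * s + b_h

def run_rnn(xs, s0=0.0):
    """Apply the recursion to a sequence of any length; returns [h_1, ..., h_T]."""
    s, hs = s0, []
    for x in xs:
        s = f(s, x)       # the state summarizes x_1..x_t
        hs.append(g(s))   # h_t uses only inputs up to time t (causal)
    return hs

outputs = run_rnn([1.0, 0.0, -1.0, 0.5])  # many-to-many: one output per input
last = outputs[-1]                         # many-to-one: keep only the final output
```

Keeping every output gives a many-to-many model (e.g. one-step prediction); keeping only the last gives a many-to-one model (e.g. sentiment analysis).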


When to use?

• Single input to multiple outputs, e.g. image captioning¹
• Multiple inputs to single output, e.g. sentiment analysis
• Multiple inputs to multiple outputs, e.g. machine translation, one-step prediction

©Andrej Karpathy

¹ Deep Visual-Semantic Alignments for Generating Image Descriptions

History

• Early variants used single-layer networks for $f(\cdot)$ and $g(\cdot)$, e.g. Elman/Jordan networks around 1990
• Trained with ordinary backpropagation (through time)
• Vanishing/exploding gradients ⇒ long-term dependencies are hard to learn

©Christopher Olah
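Why long-term dependencies are hard can be seen in a deliberately simplified, hypothetical model: a scalar linear RNN s_t = w s_{t-1} + x_t. By the chain rule, the gradient of s_T with respect to s_0 is the product of T factors w, i.e. w^T, which vanishes for |w| < 1 and explodes for |w| > 1.

```python
def gradient_through_time(w, T):
    """d s_T / d s_0 for the linear recursion s_t = w * s_{t-1} + x_t.

    By the chain rule this is the product of T factors d s_t / d s_{t-1} = w."""
    grad = 1.0
    for _ in range(T):
        grad *= w
    return grad

for w in (0.9, 1.0, 1.1):
    print(f"w={w}: d s_100 / d s_0 = {gradient_through_time(w, 100):.3g}")
```

For T = 100 the three values are about 2.7e-5, 1 and 1.4e4. With a tanh nonlinearity each factor is additionally multiplied by a derivative of at most 1, which pushes it further toward vanishing.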


Long short term memory (LSTM)

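Long short-term memory (LSTM)² is designed to counteract the vanishing gradient problem

$s_t = g_i s_{t-1} + (1 - g_i) s_c$

Essentially a weighted update: the gate $g_i \in (0, 1)$ balances the previous state $s_{t-1}$ against a candidate state $s_c$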

©Christopher Olah

² Long short term memory, 1997
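The weighted update can be sketched with a scalar state. Assumptions: the gate g_i is a sigmoid of a hypothetical pre-activation and the candidate s_c is simply given; a full LSTM computes both from the current input and previous output and uses separate forget, input and output gates, so the coupled form g_i, 1 - g_i on the slide is a simplification. Holding the gate fixed, each step multiplies the carried-over state (and its gradient) by g_i, so a gate near 1 preserves information over many steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_update(s_prev, s_c, gate_preactivation):
    """Slide's update s_t = g_i * s_{t-1} + (1 - g_i) * s_c, with g_i = sigmoid(.) in (0, 1)."""
    g_i = sigmoid(gate_preactivation)
    return g_i * s_prev + (1.0 - g_i) * s_c

def retained_fraction(gate_preactivation, T):
    """Product of the factors d s_t / d s_{t-1} = g_i over T steps,
    holding the gate and the candidate fixed (a simplification)."""
    return sigmoid(gate_preactivation) ** T

# A value stored in the state survives 100 steps when the gate stays near 1.
s = 1.0
for _ in range(100):
    s = gated_update(s, s_c=0.0, gate_preactivation=5.0)
print(f"state after 100 steps: {s:.2f}")
```

With a pre-activation of 5 (g_i ≈ 0.993) about half of the stored value remains after 100 steps, whereas a constant gate of 0.5 leaves a factor below 1e-29.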


The unreasonable effectiveness of RNNs

• Blog entry by Andrej Karpathy
• One-step prediction
• Trained on a large corpus of text

©Andrej Karpathy
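"One-step prediction" means the network is trained to predict character t+1 from characters 1..t. A minimal sketch of how a text corpus becomes inputs and targets for a character-level model; the helper names are hypothetical and the model itself is omitted.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted for reproducibility)."""
    chars = sorted(set(text))
    return {c: i for i, c in enumerate(chars)}

def one_step_pairs(text, vocab):
    """Inputs are characters 0..T-2, targets are characters 1..T-1 (shifted by one)."""
    ids = [vocab[c] for c in text]
    return ids[:-1], ids[1:]

vocab = build_vocab("hello")
inputs, targets = one_step_pairs("hello", vocab)
print(vocab, inputs, targets)
```

For "hello" the inputs h, e, l, l are paired with the targets e, l, l, o. At generation time, the model's prediction is fed back in as the next input.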


Ex. Shakespeare

Trained on all the works of Shakespeare (∼4 MB of data). Below is a generated sample:

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.


More examples

Linux kernel source code (generated sample):

/*
 * If this error is set, we will need anything right after that BSD.
 */
static void action_new_function(struct s_stat_info *wb)
{
  unsigned long flags;
  int lel_idx_bit = e->edd, *sys & ~((unsigned long) *FIRST_COMPAT);
  buf[0] = 0xFFFFFFFF & (bit << 4);
  min(inc, slist->bytes);
  printk(KERN_WARNING "Memory allocated %02x/%02x, "
    "original MLL instead\n"),
    min(min(multi_run - s->len, max) * num_data_in),
    frame_pos, sz + first_seg);
  div_u64_w(val, inb_p);
  spin_unlock(&disk->queue_lock);
  mutex_unlock(&s->sock->mutex);
  mutex_unlock(&func->mutex);
  return disassemble(info->pending_bh);
}


Under the hood

As with other deep learning models, it is hard to understand the function of all the hidden states in the model

Visualize the activity of a particular state variable during a run
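A minimal sketch of that procedure: run a character-level RNN over a string, record one chosen component of the state s_t at each step, and print it next to the character that produced it. The network here is a tiny tanh RNN with random, untrained weights and one-hot inputs (all hypothetical), so this particular trace carries no meaning; with a trained model, the same recording is what can reveal interpretable state variables.

```python
import math
import random

def make_rnn(vocab_size, state_size, seed=0):
    """Hypothetical random weights: W_s is state x state, W_x is state x vocab."""
    rng = random.Random(seed)
    W_s = [[rng.gauss(0, 0.5) for _ in range(state_size)] for _ in range(state_size)]
    W_x = [[rng.gauss(0, 0.5) for _ in range(vocab_size)] for _ in range(state_size)]
    return W_s, W_x

def step(s, x_id, W_s, W_x):
    """s_t = tanh(W_s s_{t-1} + W_x onehot(x_t)); the one-hot product is a column lookup."""
    return [math.tanh(sum(w * sj for w, sj in zip(W_s[i], s)) + W_x[i][x_id])
            for i in range(len(s))]

def trace_unit(text, unit, state_size=8, seed=0):
    """Return (character, value of state variable `unit`) for every step of the run."""
    vocab = {c: i for i, c in enumerate(sorted(set(text)))}
    W_s, W_x = make_rnn(len(vocab), state_size, seed)
    s = [0.0] * state_size
    trace = []
    for ch in text:
        s = step(s, vocab[ch], W_s, W_x)
        trace.append((ch, s[unit]))
    return trace

# Text-mode "visualization": one bar per character, length proportional to |activity|.
for ch, a in trace_unit('print("hi")', unit=3):
    print(repr(ch), f"{a:+.2f}", "#" * int(10 * abs(a)))
```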


Visualizing the network

[Figure: activity of individual state variables while reading text; panels: Row length, Inside quotation, Raw text in program]


Image captioning

Create an initial state with a convolutional neural network

Use the same one-step prediction technique to generate a sentence describing the image

©Andrej Karpathy and Li Fei-Fei
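A minimal sketch of the captioning loop, under heavy assumptions: `init_state` stands in for the CNN that maps image features to the RNN's initial state, `step` stands in for the RNN (returning a new state and next-word probabilities), and the two toy implementations below are hypothetical lookup rules, not trained networks. Decoding here is greedy (always take the most probable word); sampling or beam search are common alternatives.

```python
from typing import Callable, Dict, List, Tuple

def greedy_caption(
    image_features: List[float],
    init_state: Callable[[List[float]], object],
    step: Callable[[object, str], Tuple[object, Dict[str, float]]],
    max_len: int = 20,
) -> List[str]:
    """Initial state from the image, then feed back the previous word each step."""
    state = init_state(image_features)
    word = "<start>"
    caption: List[str] = []
    for _ in range(max_len):
        state, probs = step(state, word)
        word = max(probs, key=probs.get)  # most probable next word
        if word == "<end>":
            break
        caption.append(word)
    return caption

def toy_init_state(features: List[float]) -> object:
    # Hypothetical "CNN": pretend the first feature detects a dog.
    return ("dog" if features[0] > 0.5 else "cat", 0)

def toy_step(state: object, prev_word: str) -> Tuple[object, Dict[str, float]]:
    # Hypothetical "RNN": next-word probabilities depend on the image and position
    # only (prev_word is ignored in this toy).
    animal, t = state
    table = [{"a": 0.9, "the": 0.1}, {animal: 0.8, "ball": 0.2}, {"runs": 0.7, "<end>": 0.3}]
    probs = table[t] if t < len(table) else {"<end>": 1.0}
    return (animal, t + 1), probs

print(" ".join(greedy_caption([0.9], toy_init_state, toy_step)))
```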
