Deep Learning
Machine Learning Journal Club
Carl Andersson, Niklas Wahlström, Tomas Wilkinsson
Department of Information Technology
Uppsala University
[email protected],[email protected],[email protected] Deep Learning
Deep Learning: Motivation
Machine learning influences many aspects of modern society
These applications make use of a class of techniques called deep learning.
Two tasks where Deep Learning shines
Task 1 - Image classification
Input: pixels of an image
Output: object identity
Model structure: Convolutional neural networks

Task 2 - Speech recognition
Input: spoken language
Output: text
Model structure: Recurrent neural networks
Outline
1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network
Constructing NN for regression
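A neural network (NN) is a nonlinear function $Y = f_\theta(X)$ from an input $X$ to an output $Y$, parameterized by parameters $\theta$.
Linear regression models the relationship between a continuous output $Y$ and a continuous input $X$,

$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon = \beta^T X + \varepsilon,$$

where $\beta$ is the parameter vector composed of the "weights" $\beta_j$ and the offset ("bias"/"intercept") term $\beta_0$,

$$\beta = \begin{pmatrix} \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_p \end{pmatrix}^T, \qquad X = \begin{pmatrix} 1 & X_1 & X_2 & \cdots & X_p \end{pmatrix}^T.$$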
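To make the notation concrete, here is a minimal NumPy sketch of the noise-free prediction $\beta^T X$, where the input is extended with a leading 1 so that $\beta_0$ acts as the offset. The function name `linear_predict` and the numbers are illustrative assumptions, not part of the slides.

```python
import numpy as np

def linear_predict(beta, x):
    """Return beta^T X for one input x, with X = (1, x_1, ..., x_p)."""
    X = np.concatenate(([1.0], x))  # prepend the constant 1 for the offset beta_0
    return float(beta @ X)

# Hypothetical parameters (beta_0, beta_1, beta_2) and a hypothetical input.
beta = np.array([0.5, 2.0, -1.0])
x = np.array([3.0, 4.0])
y_hat = linear_predict(beta, x)  # 0.5 + 2*3 - 1*4
```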
Generalized linear regression
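We can generalize this by introducing a nonlinear transformation of the predictor $\beta^T X$,

$$Y = \sigma(\beta^T X) + \varepsilon.$$

[Figure: the inputs $1, X_1, \ldots, X_p$ are weighted by $\beta_0, \ldots, \beta_p$, summed, and passed through $\sigma$ to give $Y$.]

We call $\sigma(x)$ the activation function. Two common choices are:

Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
ReLU: $\sigma(x) = \max(0, x)$

[Figure: plots of $\sigma(x)$ against $x$ for the sigmoid and for the ReLU.]

Let us consider an example of a feed-forward NN. "Feed-forward" indicates that the information flows from the input layer to the output layer.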
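The two activation functions and the generalized linear predictor $\sigma(\beta^T X)$ can be sketched in a few lines of NumPy. The helper `glm_predict` and the example values are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """ReLU activation: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def glm_predict(beta, x, activation=sigmoid):
    """Noise-free prediction sigma(beta^T X) with X = (1, x_1, ..., x_p)."""
    X = np.concatenate(([1.0], x))
    return float(activation(beta @ X))

beta = np.array([0.0, 1.0, -1.0])                        # hypothetical parameters
p_sig = glm_predict(beta, np.array([2.0, 2.0]))          # sigmoid(0)
p_relu = glm_predict(beta, np.array([1.0, 3.0]), relu)   # max(0, -2)
```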
Neural network - construction
A NN is a sequential construction of several linear regression models.

[Figure: network diagram with inputs $1, X_1, \ldots, X_p$, hidden units $Z_1, \ldots, Z_M$ (each applying $\sigma$), and output $Y$.]
Inputs | Hidden units | Outputs

$$Z_1 = \sigma\Big(\beta^{(1)}_{01} + \sum_{j=1}^{p} \beta^{(1)}_{j1} X_j\Big)$$
$$Z_2 = \sigma\Big(\beta^{(1)}_{02} + \sum_{j=1}^{p} \beta^{(1)}_{j2} X_j\Big)$$
$$\vdots$$
$$Z_M = \sigma\Big(\beta^{(1)}_{0M} + \sum_{j=1}^{p} \beta^{(1)}_{jM} X_j\Big)$$
$$Y = \beta^{(2)}_0 + \sum_{m=1}^{M} \beta^{(2)}_m Z_m$$
Neural network - construction
The same model in vector notation, with the hidden units collected in the vector $Z$:

$$Z = \sigma(W_1^T X + b_1^T), \qquad Y = W_2^T Z + b_2^T$$

where

$$b_1 = \begin{bmatrix} \beta^{(1)}_{01} & \cdots & \beta^{(1)}_{0M} \end{bmatrix}, \qquad W_1 = \begin{bmatrix} \beta^{(1)}_{11} & \cdots & \beta^{(1)}_{1M} \\ \vdots & \ddots & \vdots \\ \beta^{(1)}_{p1} & \cdots & \beta^{(1)}_{pM} \end{bmatrix},$$

$$b_2 = \begin{bmatrix} \beta^{(2)}_{0} \end{bmatrix}, \qquad W_2 = \begin{bmatrix} \beta^{(2)}_{1} \\ \vdots \\ \beta^{(2)}_{M} \end{bmatrix}.$$
Neural network - construction
[Figure: network diagram with inputs $1, X_1, \ldots, X_p$, a first layer of hidden units $Z^{(1)}_1, \ldots, Z^{(1)}_{M_1}$, a second layer of hidden units $Z^{(2)}_1, \ldots, Z^{(2)}_{M_2}$, and output $Y$.]
Inputs | Hidden units | Hidden units | Outputs

$$Z^{(1)} = \sigma(W_1^T X + b_1^T)$$
$$Z^{(2)} = \sigma(W_2^T Z^{(1)} + b_2^T)$$
$$Y = W_3^T Z^{(2)} + b_3^T$$

The model typically learns better with a deep network (several layers) than with a wide and shallow network.
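As a rough illustration of the layered construction, the following NumPy sketch evaluates a network with two hidden layers for a single input vector, using the column-vector convention $Z = \sigma(W^T X + b^T)$. The layer sizes and the random parameters are hypothetical; only the structure follows the slide.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, layers):
    """Feed-forward pass: sigma(W^T Z + b) for each hidden layer, linear output layer."""
    Z = X
    for W, b in layers[:-1]:
        Z = sigmoid(W.T @ Z + b)   # hidden layer
    W, b = layers[-1]
    return W.T @ Z + b             # output layer has no activation

rng = np.random.default_rng(0)
p, M1, M2 = 3, 5, 4                # hypothetical input and hidden-layer sizes
layers = [
    (rng.normal(size=(p, M1)), np.zeros(M1)),   # W1, b1
    (rng.normal(size=(M1, M2)), np.zeros(M2)),  # W2, b2
    (rng.normal(size=(M2, 1)), np.zeros(1)),    # W3, b3
]
Y = forward(rng.normal(size=p), layers)
```

Adding another hidden layer only means appending one more `(W, b)` pair with matching dimensions.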
A 2-layer neural network in matrix notation
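Consider $N$ training data points $T = \{x_i, y_i\}_{i=1}^{N}$. We stack each data point $i$ in a row:

$$\begin{bmatrix} z_1^T \\ z_2^T \\ \vdots \\ z_N^T \end{bmatrix} = \begin{bmatrix} \sigma(x_1^T W_1 + b_1) \\ \sigma(x_2^T W_1 + b_1) \\ \vdots \\ \sigma(x_N^T W_1 + b_1) \end{bmatrix}, \qquad \begin{bmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_N^T \end{bmatrix} = \begin{bmatrix} z_1^T W_2 + b_2 \\ z_2^T W_2 + b_2 \\ \vdots \\ z_N^T W_2 + b_2 \end{bmatrix}$$

This is how it is written in matrix form, with $+b_1$, $+b_2$ and $\sigma$ applied to every row:

$$Z = \sigma(X W_1 + b_1)$$
$$Y = Z W_2 + b_2$$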
... and in TensorFlow (a popular software package for deep learning):

    import tensorflow as tf
    Z = tf.nn.sigmoid(tf.matmul(X, W1) + b1)
    Yhat = tf.matmul(Z, W2) + b2
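The same two lines can be checked without TensorFlow. The NumPy sketch below uses hypothetical sizes and random parameters, and relies on broadcasting to add $b_1$ and $b_2$ to every row. It compares the matrix form against the row-by-row computation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_batch(X, W1, b1, W2, b2):
    """Matrix form: Z = sigma(X W1 + b1), Y = Z W2 + b2, one data point per row."""
    Z = sigmoid(X @ W1 + b1)   # b1 is broadcast to every row
    return Z @ W2 + b2         # b2 is broadcast to every row

rng = np.random.default_rng(0)
N, p, M, q = 6, 3, 4, 2        # hypothetical sizes: data points, inputs, hidden units, outputs
X = rng.normal(size=(N, p))
W1, b1 = rng.normal(size=(p, M)), rng.normal(size=M)
W2, b2 = rng.normal(size=(M, q)), rng.normal(size=q)

Y = forward_batch(X, W1, b1, W2, b2)
# Row-by-row version of the stacked equations, for comparison.
Y_rows = np.stack([sigmoid(x @ W1 + b1) @ W2 + b2 for x in X])
```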
Training a neural network
• Formulate a cost function, for example the squared error $J(\theta) = \sum_{i=1}^{N} \| y_i - f_\theta(x_i) \|^2$ or the cross-entropy $J(\theta) = -\sum_{i=1}^{N} y_i^T \log(f_\theta(x_i))$
• Minimize with stochastic gradient descent
• Gradients can be computed efficiently using back-propagation

Example: Training a five layer network on the MNIST data set
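The three steps above can be sketched on a toy problem. This is not the MNIST example from the talk: the data set, network size, learning rate and batch size below are all assumptions chosen to keep the sketch small. The network has one hidden layer, the cost is the squared error, and the gradients are derived by hand with the chain rule (back-propagation), averaged over each mini-batch.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cost(params, X, Y):
    """Squared-error cost J(theta) = sum_i ||y_i - f_theta(x_i)||^2."""
    W1, b1, W2, b2 = params
    Yhat = sigmoid(X @ W1 + b1) @ W2 + b2
    return float(np.sum((Y - Yhat) ** 2))

def sgd_step(params, x, y, lr):
    """One stochastic gradient descent update on the mini-batch (x, y)."""
    W1, b1, W2, b2 = params
    Z = sigmoid(x @ W1 + b1)                  # forward pass, hidden layer
    dY = 2.0 * (Z @ W2 + b2 - y) / len(x)     # gradient of the mean squared error w.r.t. the output
    dA = (dY @ W2.T) * Z * (1.0 - Z)          # back-propagate through the sigmoid
    return (W1 - lr * (x.T @ dA), b1 - lr * dA.sum(axis=0),
            W2 - lr * (Z.T @ dY), b2 - lr * dY.sum(axis=0))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(64, 1))      # toy inputs (assumption)
Y = np.sin(3.0 * X)                           # toy targets (assumption)
params = (rng.normal(scale=0.5, size=(1, 8)), np.zeros(8),
          rng.normal(scale=0.5, size=(8, 1)), np.zeros(1))

cost_before = cost(params, X, Y)
for epoch in range(300):
    order = rng.permutation(len(X))           # new random mini-batches every epoch
    for start in range(0, len(X), 16):
        idx = order[start:start + 16]
        params = sgd_step(params, X[idx], Y[idx], lr=0.05)
cost_after = cost(params, X, Y)
```

In practice the gradients are usually obtained by automatic differentiation in a package such as TensorFlow rather than written out by hand.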
Why now?
Neural networks have been around for more than fifty years. Why have they become so popular now (again)?
To solve really interesting problems you need:
1. Efficient learning algorithms
2. Efficient computational hardware
3. A lot of labeled data!
These three factors have not been fulfilled to a satisfactory level until the last 5-10 years.
Outline
1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network
Convolutional Neural Networks
One of the big recent success stories for neural networks is in computer vision. Since 2012, neural networks have been used to some extent in all winning contributions in the largest computer vision competitions (ImageNet, MSCOCO, ...).
Recently, medical imaging has seen increased interest from the machine learning community (and vice versa) [1]. NNs have been applied successfully in this field for a few years now [2, 3].
1. Deep Learning for Medical Image Analysis, Zhou et al, 2017
2. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, Ciresan et al, 2012
3. U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al, 2015
Convolutional Neural Networks
Neural networks are typically called convolutional (CNNs or ConvNets) when they contain one or more convolutional layers.
They work on volumes of data, e.g., images (H, W, 3), where spatial correlations exist in the input, and their intermediate representations are also volumes of data.
Convolutional Layer I
32x32x3 image
5x5x3 filter
75-dimensional (+1 for bias) dot products $w^T x + b$ are computed at each valid location in the input to produce the output.
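A deliberately simple (and slow) NumPy sketch of this operation is given below. It slides each filter over every valid location and computes one $F \cdot F \cdot C$-dimensional dot product plus bias per location. The function name `conv2d_valid`, the array layout (filters stored as `(d, F, F, C)`) and the random data are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(image, filters, bias):
    """image: (H, W, C); filters: (d, F, F, C); bias: (d,). Returns (H-F+1, W-F+1, d)."""
    H, W, C = image.shape
    d, F, _, _ = filters.shape
    out = np.empty((H - F + 1, W - F + 1, d))
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            patch = image[i:i + F, j:j + F, :]   # local F x F x C neighborhood
            # one dot product w^T x + b per filter at this location
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))     # stands in for a 32x32x3 image
filters = rng.normal(size=(1, 5, 5, 3))  # a single 5x5x3 filter (75 weights)
bias = np.array([0.1])                   # +1 bias parameter
feature_map = conv2d_valid(image, filters, bias)
```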
Convolutional Layer II
One 5x5x3 filter: 32x32x3 image -> 28x28x1
Four 5x5x3 filters (4x5x5x3): 32x32x3 image -> 28x28x4 "image"
Stacked layers: 32x32x3 image -> Conv 4x5x5x3 -> ReLU -> 28x28x4 -> Conv 10x5x5x4 -> ReLU -> 24x24x10 -> ...
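The shapes in this stack, and the number of parameters per layer, follow from simple arithmetic. The small helper below is a sketch (the name `conv_layer` is an assumption) for valid convolutions with stride 1 and no padding, which is what the sizes on the slide imply.

```python
def conv_layer(in_shape, num_filters, F):
    """Output shape and parameter count of a valid, stride-1 convolutional layer."""
    H, W, C = in_shape
    out_shape = (H - F + 1, W - F + 1, num_filters)
    n_params = num_filters * (F * F * C + 1)   # F*F*C weights + 1 bias per filter
    return out_shape, n_params

shape0 = (32, 32, 3)
shape1, params1 = conv_layer(shape0, num_filters=4, F=5)    # Conv 4x5x5x3
shape2, params2 = conv_layer(shape1, num_filters=10, F=5)   # Conv 10x5x5x4
```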
Convolutional Layer III
How does this relate to regular (fully connected) networks?
1. Local connectivity: Each dot product is computed using only a local neighborhood of the input (e.g. 5x5 filter)
2. Parameter sharing: At each valid filter position in the input, the same parameters (or weights) are used.
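Both properties can be seen in a one-dimensional sketch: a valid convolution equals a fully connected layer whose weight matrix is zero outside a local band (local connectivity) and repeats the same filter weights in every row (parameter sharing). The helper name and the example signal are assumptions. As is common in deep learning, "convolution" here means cross-correlation, i.e. the filter is not flipped.

```python
import numpy as np

def conv_as_dense_matrix(w, n):
    """Weight matrix of the fully connected layer equivalent to a valid 1D convolution."""
    F = len(w)
    A = np.zeros((n - F + 1, n))
    for i in range(n - F + 1):
        A[i, i:i + F] = w   # only F local inputs are connected; same weights in every row
    return A

x = np.arange(8.0)                  # hypothetical input signal
w = np.array([1.0, 0.0, -1.0])      # hypothetical filter of size F = 3
A = conv_as_dense_matrix(w, len(x))

dense_out = A @ x                               # fully connected layer with structured weights
conv_out = np.correlate(x, w, mode="valid")     # sliding dot product
```

The dense matrix has 6 x 8 = 48 entries, but only the 3 filter weights are free parameters.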
Fully Connected -> Convolutional
[Figure sequence, panel titles: N² inputs; Vector -> Matrix (N x N); Local Connectivity; Parameter Sharing; Convolution]
Convolutional Layer IV
The hyperparameters when creating convolutional layers are
• filter size, F
• stride, s
• number of filters/feature maps, d (the depth of the output volumes)
• zero padding, p (to control the width and height of the volumes/feature maps; set p = (F − 1)/2 to keep the size, assuming stride 1 and an odd filter size)
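How these hyperparameters determine the spatial output size is not spelled out on the slide. The sketch below uses the standard relation (n + 2p − F)/s + 1 for an input of width (or height) n. Raising an error when the stride does not divide evenly is a design choice of this sketch; some libraries round down instead.

```python
def conv_output_size(n, F, s=1, p=0):
    """Spatial output size of a convolution: (n + 2p - F) / s + 1."""
    span = n + 2 * p - F
    if span < 0 or span % s != 0:
        raise ValueError("filter does not fit the padded input evenly")
    return span // s + 1

no_padding = conv_output_size(32, F=5)              # 32 -> 28, as in the earlier slides
same_size = conv_output_size(32, F=5, p=(5 - 1) // 2)  # p = (F - 1)/2 keeps the size
strided = conv_output_size(28, F=2, s=2)            # stride 2 halves the size
```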
Max Pooling layer
A parameterless layer that subsamples the feature maps in the two spatial dimensions using the max operation. For a single feature map:
[Figure: a 4x4 feature map is reduced to a 2x2 feature map by max-pooling with a 2x2 filter and stride 2.]
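A minimal NumPy sketch of 2x2 max-pooling with stride 2 on a single feature map follows. The 4x4 values are made up for illustration and are not the numbers from the slide's figure.

```python
import numpy as np

def max_pool_2x2(fmap):
    """Max-pooling with a 2x2 filter and stride 2 on an (H, W) feature map."""
    H, W = fmap.shape
    if H % 2 or W % 2:
        raise ValueError("height and width must be even")
    # Split into non-overlapping 2x2 blocks and take the max within each block.
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 6, 3, 4],
                 [2, 1, 3, 4],
                 [5, 4, 7, 8],
                 [0, 3, 2, 2]])   # illustrative values
pooled = max_pool_2x2(fmap)
```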
What is the network learning?
For a given filter (neuron, unit), what does the corresponding feature map (output) look like when the 9 images that excite the filter the most are fed through the network?
Evolution of Architectures
Since their inception in the late 80s, the design principles for CNNs have changed a lot. The following architectures are frequently referenced in papers.
1. LeNet (90s)
2. AlexNet (2012 ImageNet Winner)
3. ZFNet (2013 ImageNet Winner)
4. VGGNet (2014 ImageNet Runner-up)
5. GoogLeNet, Inception (2014 ImageNet Winner)
6. ResNet (2015 ImageNet Winner)
Evolution of Architectures
LeNet5 (Gradient-based learning applied to document recognition, LeCun et al, 1998)
Source: Gradient-based learning applied to document recognition, LeCun et al, 1998

AlexNet (ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012)
Source: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012

ZFNet (Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, 2013)
Source: Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, 2013

VGGNet (Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, 2014)
Source: https://www.saagie.com/fr/blog/object-detection-part1

GoogLeNet, Inception (Going Deeper with Convolutions, Szegedy et al, 2014)
Source: https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html

ResNet (Deep Residual Learning for Image Recognition, He et al, 2015)
Source: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html
Revolution of Depth
Skin cancer – background
One recent result on the use of deep learning in medicine: detecting skin cancer (February 2017)
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M. and Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542, 115–118, February 2017.
Some background figures (from the US) on skin cancer:
• Melanomas represent less than 5% of all skin cancers, but account for 75% of all skin-cancer-related deaths.
• Early detection is absolutely critical. Estimated 5-year survival rate for melanoma: over 99% if detected in its earlier stages and 14% if detected in its later stages.
Skin cancer – task
116 | Nature | Vol 542 | 2 February 2017
Research Letter
lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.
To conclusively validate the algorithm, we tested, using only biopsy-proven images on medically important use cases, whether the algorithm and dermatologists could distinguish malignant versus benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin. For melanocytic lesions, we show
two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might carry out to obtain a clinical impression. The same CNN is used for all three tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity:
$$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}, \qquad \text{specificity} = \frac{\text{true negative}}{\text{negative}}$$

where 'true positive' is the number of correctly predicted malignant lesions, 'positive' is the number of malignant lesions shown, 'true negative' is the number of correctly predicted benign lesions, and 'negative' is the number of benign lesions shown. When a test set is fed through the CNN, it outputs a probability, P, of malignancy, per image. We can compute the sensitivity and specificity of these probabilities
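As an illustration of the two metrics, the sketch below turns per-image malignancy probabilities into predictions using a cut-off and then counts the four quantities defined above. The threshold of 0.5, the function name and the six example values are assumptions; the excerpt does not state how the cut-off is chosen.

```python
import numpy as np

def sensitivity_specificity(p_malignant, is_malignant, threshold=0.5):
    """Sensitivity = true positive / positive, specificity = true negative / negative."""
    p = np.asarray(p_malignant, dtype=float)
    y = np.asarray(is_malignant, dtype=bool)
    pred = p >= threshold                   # predicted malignant (assumed decision rule)
    true_pos = np.sum(pred & y)             # correctly predicted malignant lesions
    true_neg = np.sum(~pred & ~y)           # correctly predicted benign lesions
    return true_pos / np.sum(y), true_neg / np.sum(~y)

probs = [0.92, 0.40, 0.75, 0.10, 0.60, 0.05]   # made-up CNN outputs
labels = [1, 1, 1, 0, 0, 0]                    # made-up ground truth (1 = malignant)
sens, spec = sensitivity_specificity(probs, labels)
```

Lowering the threshold trades specificity for sensitivity, and raising it does the opposite.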
[Figure 1 diagram: a skin lesion image is passed through a deep convolutional neural network (Inception v3; layer types: Convolution, AvgPool, MaxPool, Concat, Dropout, Fully connected, Softmax). Training classes (757): acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma, ...; blue nevus, halo nevus, Mongolian spot, ... Inference classes (vary by task): 92% malignant melanocytic lesion, 8% benign melanocytic lesion.]
Figure 1 | Deep CNN layout. Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using Google Inception v3 CNN architecture pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases. The 757 training classes are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes (for example, acrolentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesions, the class of melanomas). The probability of an inference class is calculated by summing the probabilities of the training classes according to taxonomy structure (see Methods). Inception v3 CNN architecture reprinted from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
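The summation described in the caption can be sketched as follows. A flat dictionary from training class to inference class stands in for the taxonomy structure, which is an assumption of this sketch. The class probabilities are made up and chosen only so that the totals match the 92% / 8% shown in the figure.

```python
def inference_class_probs(training_probs, class_map):
    """Sum training-class probabilities into their (more general) inference classes."""
    totals = {}
    for training_class, p in training_probs.items():
        inference_class = class_map[training_class]
        totals[inference_class] = totals.get(inference_class, 0.0) + p
    return totals

# Hypothetical mapping and hypothetical CNN outputs over four training classes.
class_map = {
    "amelanotic melanoma": "malignant melanocytic lesion",
    "lentigo melanoma": "malignant melanocytic lesion",
    "blue nevus": "benign melanocytic lesion",
    "halo nevus": "benign melanocytic lesion",
}
training_probs = {
    "amelanotic melanoma": 0.50,
    "lentigo melanoma": 0.42,
    "blue nevus": 0.05,
    "halo nevus": 0.03,
}
inference_probs = inference_class_probs(training_probs, class_map)
```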
[Figure 2 diagram. a: taxonomy tree rooted at "Skin disease" with the branches Benign, Malignant and Non-neoplastic; node labels include Melanocytic, Epidermal, Dermal, Inflammatory, Genodermatosis and Cutaneous lymphoma, with examples such as café au lait spot, solar lentigo, atypical nevus, seborrhoeic keratosis, milia, cyst, fibroma, lipoma, basal cell carcinoma, squamous cell carcinoma, Merkel cell carcinoma, angiosarcoma, melanoma, T-cell, B-cell, acne, rosacea, abrasion, Stevens-Johnson syndrome, psoriasis, bullous pemphigoid, tuberous sclerosis and congenital dyskeratosis. b: benign and malignant example images for epidermal lesions, melanocytic lesions and melanocytic lesions (dermoscopy).]
Figure 2 | A schematic illustration of the taxonomy and example test set images. a, A subset of the top of the tree-structured taxonomy of skin disease. The full taxonomy contains 2,032 diseases and is organized based on visual and clinical similarity of diseases. Red indicates malignant, green indicates benign, and orange indicates conditions that can be either. Black indicates melanoma. The first two levels of the taxonomy are used in validation. Testing is restricted to the tasks of b. b, Malignant and benign example images from two disease classes. These test images highlight the difficulty of malignant versus benign discernment for the three medically critical classification tasks we consider: epidermal lesions, melanocytic lesions and melanocytic lesions visualized with a dermoscope. Example images reprinted with permission from the Edinburgh Dermofit Library (https://licensing.eri.ed.ac.uk/i/software/dermofit-image-library.html).
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Image copyright Nature (doi:10.1038/nature21056)
Skin cancer – taxonomy used
Image copyright Nature (doi:10.1038/nature21056)
Skin cancer – solution (ultrabrief)
Start from a neural network trained on 1.28 million images (transfer learning).
Make minor modifications to this model, specializing it to the present situation.
Learn new model parameters using 129 450 clinical images (∼100 times more images than any previous study).
[Figure: unseen data is passed to the model, which outputs a prediction (shown as "?").]
26 / 41 [email protected],[email protected],[email protected] Deep Learning
Skin cancer – solution (ultrabrief)
Start from a neural network trained on 1.28 million images (transferlearning).
Make minor modifications to this model, specializing to presentsituation.
Learn new model parameters using129 450 clinical images (∼ 100times more images than anyprevious study).
1 1 6 | N a T u r e | V O L 5 4 2 | 2 F e b r u a r y 2 0 1 7
LetterreSeArCH
lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.
To conclusively validate the algorithm, we tested, using only biopsy-proven images on medically important use cases, whether the algorithm and dermatologists could distinguish malignant versus benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin. For melanocytic lesions, we show
two trials, one using standard images and the other using dermoscopy images, which reflect the two steps that a dermatologist might carry out to obtain a clinical impression. The same CNN is used for all three tasks. Figure 2b shows a few example images, demonstrating the difficulty in distinguishing between malignant and benign lesions, which share many visual features. Our comparison metrics are sensitivity and specificity:
=sensitivitytrue positive
positive
=specificitytrue negative
negative
where ‘true positive’ is the number of correctly predicted malignant lesions, ‘positive’ is the number of malignant lesions shown, ‘true neg-ative’ is the number of correctly predicted benign lesions, and ‘neg-ative’ is the number of benign lesions shown. When a test set is fed through the CNN, it outputs a probability, P, of malignancy, per image. We can compute the sensitivity and specificity of these probabilities
Acral-lentiginous melanomaAmelanotic melanomaLentigo melanoma…
Blue nevusHalo nevusMongolian spot…
Training classes (757)Deep convolutional neural network (Inception v3) Inference classes (varies by task)
92% malignant melanocytic lesion
8% benign melanocytic lesion
Skin lesion image
ConvolutionAvgPoolMaxPoolConcatDropoutFully connectedSoftmax
Figure 1 | Deep CNN layout. Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using Google Inception v3 CNN architecture pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases. The 757 training classes are defined using a novel taxonomy of skin disease and a partitioning algorithm that maps diseases into training classes
(for example, acrolentiginous melanoma, amelanotic melanoma, lentigo melanoma). Inference classes are more general and are composed of one or more training classes (for example, malignant melanocytic lesions—the class of melanomas). The probability of an inference class is calculated by summing the probabilities of the training classes according to taxonomy structure (see Methods). Inception v3 CNN architecture reprinted from https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
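The summation described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout and the probability values are assumptions, chosen only so that the toy output matches the 92% / 8% split shown in the figure.

```python
def inference_probability(training_probs, inference_to_training):
    """Sum training-class probabilities into each inference class."""
    return {
        inference: sum(training_probs[name] for name in members)
        for inference, members in inference_to_training.items()
    }


# Hypothetical softmax output over a handful of training classes.
training_probs = {
    "acral-lentiginous melanoma": 0.50,
    "amelanotic melanoma": 0.30,
    "lentigo melanoma": 0.12,
    "blue nevus": 0.05,
    "halo nevus": 0.03,
}

# Hypothetical excerpt of the taxonomy: inference class -> training classes.
taxonomy = {
    "malignant melanocytic lesion": [
        "acral-lentiginous melanoma",
        "amelanotic melanoma",
        "lentigo melanoma",
    ],
    "benign melanocytic lesion": ["blue nevus", "halo nevus"],
}

result = inference_probability(training_probs, taxonomy)
```

Because the training classes partition the probability mass, the inference-class probabilities still sum to one.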
[Figure 2 image. Panel a, taxonomy node labels: Skin disease; Benign; Malignant; Non-neoplastic; Melanocytic; Epidermal; Dermal; Melanoma; Cutaneous lymphoma; T-cell; B-cell; Genodermatosis; Inflammatory; Café au lait spot; Solar lentigo; Atypical nevus; Seborrhoeic keratosis; Milia; Cyst; Fibroma; Lipoma; Basal cell carcinoma; Squamous cell carcinoma; Merkel cell carcinoma; Angiosarcoma; Acne; Rosacea; Psoriasis; Abrasion; Stevens-Johnson syndrome; Tuberous sclerosis; Congenital dyskeratosis; Bullous pemphigoid. Panel b, columns: Epidermal lesions; Melanocytic lesions; Melanocytic lesions (dermoscopy); rows: Benign; Malignant.]
Figure 2 | A schematic illustration of the taxonomy and example test set images. a, A subset of the top of the tree-structured taxonomy of skin disease. The full taxonomy contains 2,032 diseases and is organized based on visual and clinical similarity of diseases. Red indicates malignant, green indicates benign, and orange indicates conditions that can be either. Black indicates melanoma. The first two levels of the taxonomy are used in validation. Testing is restricted to the tasks of b. b, Malignant and benign
example images from two disease classes. These test images highlight the difficulty of malignant versus benign discernment for the three medically critical classification tasks we consider: epidermal lesions, melanocytic lesions and melanocytic lesions visualized with a dermoscope. Example images reprinted with permission from the Edinburgh Dermofit Library (https://licensing.eri.ed.ac.uk/i/software/dermofit-image-library.html).
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
[Diagram: unseen data → model → prediction (?)]
Skin cancer – solution (ultrabrief)
Start from a neural network trained on 1.28 million images (transfer learning).
Make minor modifications to this model, specializing it to the present situation.
Learn new model parameters using 129 450 clinical images (∼100 times more images than any previous study).
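A toy sketch of the transfer-learning recipe, under loud assumptions: `pretrained_features` is a hypothetical stand-in for the pretrained network, only a new logistic output layer is trained while the base stays frozen, and the data and hyperparameters are invented. The study itself fine-tunes Inception v3, so this only illustrates the idea of reusing learned features.

```python
import math


def pretrained_features(x):
    """Hypothetical frozen feature extractor standing in for a pretrained CNN."""
    # These fixed "weights" are never updated below.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, labels, steps=500, lr=0.5):
    """Fit a new logistic output layer on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    feats = [pretrained_features(x) for x in data]
    n = len(feats)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        # Plain gradient descent on the mean cross-entropy loss.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    f = pretrained_features(x)
    return sigmoid(w[0] * f[0] + w[1] * f[1] + b)


# Invented two-class data for the new task.
data = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
labels = [1, 1, 0, 0]
w, b = train_head(data, labels)
```

Only the small output layer is learned here, which is why a modest amount of task-specific data can suffice once good features are available.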
116 | Nature | Vol 542 | 2 February 2017
RESEARCH Letter
lesions. In this task, the CNN achieves 72.1 ± 0.9% (mean ± s.d.) overall accuracy (the average of individual inference class accuracies) and two dermatologists attain 65.56% and 66.0% accuracy on a subset of the validation set. Second, we validate the algorithm using a nine-class disease partition—the second-level nodes—so that the diseases of each class have similar medical treatment plans. The CNN achieves 55.4 ± 1.7% overall accuracy whereas the same two dermatologists attain 53.3% and 55.0% accuracy. A CNN trained on a finer disease partition performs better than one trained directly on three or nine classes (see Extended Data Table 2), demonstrating the effectiveness of our partitioning algorithm. Because images of the validation set are labelled by dermatologists, but not necessarily confirmed by biopsy, this metric is inconclusive, and instead shows that the CNN is learning relevant information.
Skin cancer – indication of the results
$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}$
$\text{specificity} = \frac{\text{true negative}}{\text{negative}}$
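As a small worked example of the two metrics, the sketch below thresholds per-image malignancy probabilities and counts the outcomes. The threshold rule (predict malignant when P ≥ t), the function name and the toy numbers are assumptions for illustration, not values from the study.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Labels: 1 = malignant, 0 = benign. Predict malignant when P >= threshold."""
    true_positive = sum(
        1 for p, y in zip(probs, labels) if y == 1 and p >= threshold
    )
    true_negative = sum(
        1 for p, y in zip(probs, labels) if y == 0 and p < threshold
    )
    positive = sum(labels)              # malignant lesions shown
    negative = len(labels) - positive   # benign lesions shown
    return true_positive / positive, true_negative / negative


# Invented probabilities of malignancy and ground-truth labels.
probs = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(probs, labels, threshold=0.5)
```

Sweeping the threshold trades one metric against the other, which is how a curve of sensitivity versus specificity for the CNN can be traced out.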
Extended Data Figure 4 | Extension of Figure 3 with a different dermatological question. a, Identical plots and results as shown in Fig. 3a, except that dermatologists were asked if a lesion appeared to be malignant or benign. This is a somewhat unnatural question to ask; in the clinic, the only actionable decision is whether or not to biopsy or treat a lesion. The blue curves for the CNN are identical to Fig. 3. b, Figure 3b reprinted for visual comparison to a.
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Image copyright Nature (doi:10.1038/nature21056)
Outline
1. Motivation
2. What is a neural network?
3. Convolutional neural network
4. Recurrent neural network
Problems with sequential data
Varying size of data examples
No direct coupling between one part of the input and one part of the output
Impose a causal relationship between the data points in a sequence
E.g.
• Speech recognition: spoken words → syllables
• Machine translation: English → Korean
• Image captioning: describe an image with a sentence
Recurrent neural networks
A recurrent neural network (RNN) is essentially a nonlinear state-space model
$s_t = f(s_{t-1}, x_t)$
$h_t = g(s_t)$
where f(·) and g(·) are neural networks.
©Christopher Olah
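The state-space recursion can be sketched in a few lines. This is a minimal illustration under assumptions: f is taken to be a single tanh layer and g a single linear layer, and all weight names and values (`W_s`, `W_x`, `b`, `V`, `c`) are hypothetical.

```python
import math


def rnn_step(s_prev, x, W_s, W_x, b):
    """State update s_t = f(s_{t-1}, x_t), here a single tanh layer (assumption)."""
    return [
        math.tanh(
            sum(W_s[i][j] * s_prev[j] for j in range(len(s_prev)))
            + sum(W_x[i][k] * x[k] for k in range(len(x)))
            + b[i]
        )
        for i in range(len(b))
    ]


def readout(s, V, c):
    """Output h_t = g(s_t), here a single linear layer (assumption)."""
    return [
        sum(V[i][j] * s[j] for j in range(len(s))) + c[i]
        for i in range(len(c))
    ]


def run_rnn(xs, s0, W_s, W_x, b, V, c):
    """Apply the same f and g at every time step of a sequence of any length."""
    s, outputs = s0, []
    for x in xs:
        s = rnn_step(s, x, W_s, W_x, b)
        outputs.append(readout(s, V, c))
    return outputs, s


# Hypothetical toy weights: 2 state units, 1 input, 1 output.
W_s = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.0]
V = [[1.0, 1.0]]
c = [0.0]

outputs, final_state = run_rnn(
    [[1.0], [0.0], [-1.0]], [0.0, 0.0], W_s, W_x, b, V, c
)
```

Since the same parameters are reused at each step, the loop handles sequences of varying length without any change to the model.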
When to use?
• Single input to multiple outputs, e.g. image captioning [1]
• Multiple inputs to single output, e.g. sentiment analysis
• Multiple inputs to multiple outputs, e.g. machine translation, one-step prediction
©Andrej Karpathy
[1] Deep Visual-Semantic Alignments for Generating Image Descriptions
History
• Early variants: f(·) and g(·) are single-layer networks, e.g. Elman/Jordan networks around 1990.
• Trained with ordinary backpropagation
• Vanishing/exploding gradients ⟹ hard to learn long-term dependencies
©Christopher Olah
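The gradient problem can be seen in a scalar toy recurrence. The sketch below assumes s_t = tanh(w · s_{t-1}), or s_t = w · s_{t-1} in the linear case, and multiplies the per-step derivatives along the chain rule; the function name and the chosen values of w and T are illustrative only.

```python
import math


def state_gradient(w, T, s0=0.5, linear=False):
    """Return d s_T / d s_0 for s_t = tanh(w * s_{t-1}) (or w * s_{t-1} if linear)."""
    s, grad = s0, 1.0
    for _ in range(T):
        pre = w * s
        s = pre if linear else math.tanh(pre)
        # Local derivative of the activation: 1 for linear, 1 - tanh^2 otherwise.
        local = 1.0 if linear else 1.0 - s * s
        grad *= w * local
    return grad


vanishing = state_gradient(w=0.5, T=50)               # shrinks towards zero
exploding = state_gradient(w=1.5, T=50, linear=True)  # grows like 1.5**50
```

The gradient is a product of T factors, so it shrinks or grows geometrically with the sequence length, which is what makes long-term dependencies hard to learn.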
Long short term memory (LSTM)
Long short-term memory (LSTM) [2] is designed to compensate for the vanishing gradient problem
$s_t = g_i s_{t-1} + (1 - g_i) s_c$
Essentially a weighted update
©Christopher Olah
[2] Long short term memory, 1997
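A scalar sketch of the weighted update, with assumptions stated plainly: g_i is read here as a gate in (0, 1) and s_c as a candidate state, and the way both are computed from the input and previous state (sigmoid and tanh layers with hypothetical parameters) is an illustrative choice that the slide does not specify.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_update(s_prev, x, w_g, u_g, b_g, w_c, u_c):
    """One scalar step of s_t = g_i * s_{t-1} + (1 - g_i) * s_c."""
    g_i = sigmoid(w_g * x + u_g * s_prev + b_g)  # gate in (0, 1), assumed form
    s_c = math.tanh(w_c * x + u_c * s_prev)      # candidate state, assumed form
    return g_i * s_prev + (1.0 - g_i) * s_c


# Gate pushed towards 1: the old state is carried over almost unchanged.
kept = gated_update(0.8, 0.0, w_g=0.0, u_g=0.0, b_g=20.0, w_c=1.0, u_c=1.0)

# Gate pushed towards 0: the state is overwritten by the candidate.
overwritten = gated_update(0.8, 0.0, w_g=0.0, u_g=0.0, b_g=-20.0, w_c=1.0, u_c=1.0)
```

When the gate stays close to 1, the state passes through nearly unchanged, so information and gradients can survive over many time steps.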
The unreasonable effectiveness of RNN
• Blog entry by Andrej Karpathy
• One-step prediction
• Trained on a large corpus of text
©Andrej Karpathy
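The generation loop behind one-step prediction can be sketched as follows. To keep it short, a bigram count table stands in for the trained RNN; that substitution, the function names and the tiny corpus are all assumptions. The loop itself (predict a distribution over the next character, sample from it, feed the sample back in) is the part being illustrated.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(text):
    """Count next-character frequencies; a stand-in for training an RNN."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, seed_char, length, rng):
    """Sample one character at a time and feed it back as the next input."""
    out = [seed_char]
    for _ in range(length - 1):
        options = counts.get(out[-1])
        if not options:
            break  # no known continuation for this character
        chars, weights = zip(*options.items())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


corpus = "to be or not to be that is the question "
model = fit_bigram(corpus)
sample = generate(model, "t", 40, random.Random(0))
```

An RNN replaces the count table with a learned state that can summarize far more than the single previous character.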
Ex. Shakespeare
Trained on all works of Shakespeare (∼4 MB of data). Below is a generated sample.

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
More examples
Linux kernel source code (generated sample)

    /*
     * If this error is set, we will need anything right after that BSD.
     */
    static void action_new_function(struct s_stat_info *wb)
    {
      unsigned long flags;
      int lel_idx_bit = e->edd, *sys & ~((unsigned long) *FIRST_COMPAT);
      buf[0] = 0xFFFFFFFF & (bit << 4);
      min(inc, slist->bytes);
      printk(KERN_WARNING "Memory allocated %02x/%02x, "
        "original MLL instead\n"),
        min(min(multi_run - s->len, max) * num_data_in),
        frame_pos, sz + first_seg);
      div_u64_w(val, inb_p);
      spin_unlock(&disk->queue_lock);
      mutex_unlock(&s->sock->mutex);
      mutex_unlock(&func->mutex);
      return disassemble(info->pending_bh);
    }
More examples
LaTeX code from mathematics
Under the hood
As with other deep learning models, it is hard to understand the function of all hidden states in the model
Visualize the activity of a particular state variable during a run
Visualizing the network
Row length
Inside quotation
Raw text in program
Image captioning
Create an initial state with a convolutional neural network
Use the same technique to generate a sentence describing the image
©Andrej Karpathy and Li Fei-Fei
Thank you!