Deep Learning - Machine Learning in der Medizin · 2018-12-17
Deep Learning
Machine Learning in der Medizin
Asan Agibetov, [email protected]
Medical University of Vienna
Center for Medical Statistics, Informatics and Intelligent Systems
Institute for Artificial Intelligence and Decision Support
Spitalgasse 23, 1090 Vienna, BT88.04.808
December 13, 2018
Introduction
▶ References (available online for free):
  ▶ "Neural Networks and Deep Learning", Michael A. Nielsen, Determination Press, 2015
    ▶ intuition first, math after
  ▶ "Deep Learning", Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016
    ▶ formal with a fair amount of intuition, more general than Nielsen
▶ These slides are based on DL courses:
  ▶ Course notes "CNN for Visual Recognition" (Stanford, Spring 2017)
  ▶ Course notes "An introduction to Deep Learning", Marc'Aurelio Ranzato (Facebook AI Research), DeepLearn Summer School, Bilbao, 17-21 July 2017
Why Deep Learning
Peter Norvig's ¹ recollection of Geoff Hinton's ² talk on Boltzmann Machine ³ work (back in 1980)
1. Cognitive plausibility in terms of a model of the brain
2. A model that learns from experience rather than being programmed by hand
3. Continuous representations rather than Boolean ones, as in traditional symbolic expert systems
¹ Research Director at Google, co-author of classical texts on AI
² Professor at University of Toronto, one of the pioneers of Deep Learning
³ Boltzmann Machines (and Probabilistic Graphical Models) are one of the theoretical foundations for generative DL models
Neural networks and Deep Learning
▶ Neural networks - biologically-inspired programming paradigm
  ▶ enables a computer to learn from observational data
  ▶ universal function approximation machine ⁴
▶ Deep learning - powerful set of techniques for learning in neural networks
  ▶ harness GPU resources to parallelize and speed up matrix-vector computations
  ▶ give rise to a modularized approach to learning
⁴ Hornik, "Approximation capabilities of Multilayer Feedforward Networks", Neural Networks, 1991
Deep Learning - what's in the name?
▶ DL, roughly speaking, is a NN with many layers and many neurons in each layer
  ▶ not true in all cases though (e.g., embeddings are often shallow)
Figure 1: Simple and Deep NNs (image credit ⁵)
⁵ https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e
Hierarchical feature learning
▶ DL learns features automatically and hierarchically
Figure 2: (Convolutional) Neural Network to detect a face ⁶
⁶ credit: Michael A. Nielsen
Hierarchical feature learning (cont.)
▶ Learnt features can be combined
Figure 3: Further decomposition of learnt features ⁷
⁷ credit: Michael A. Nielsen
Neural networks
Figure 4: 2 hidden layer network/4 layer network (+ input, output) ⁸
▶ Universal function approximation that maps input to output
  ▶ $f: \mathbb{R}^n \to \mathbb{R}^m$
▶ Class of functions considered to map input to output
  ▶ composition of simpler (including non-linear ⁹) functions
  ▶ $h_1$ is the non-linear $\max(0, W_1 \cdot \vec{x} + \vec{b}_1)$, aka ReLU
  ▶ $f(\vec{x}) = (o \circ h_2 \circ h_1)(\vec{x})$
⁸ image credit: M-A. Ranzato (Facebook AI Research)
⁹ a composition of only linear functions would be equivalent to one linear function
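Footnote 9 can be checked numerically. The sketch below is only an illustration: the layer sizes, the random weights, and the function names are assumptions, not taken from the slides. It stacks two linear layers and compares them with the single linear layer obtained by multiplying the weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers without a non-linearity (sizes are illustrative).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# The same mapping written as ONE linear layer.
W, b = W2 @ W1, W2 @ b1 + b2

def one_linear_layer(x):
    return W @ x + b

def two_layers_with_relu(x):
    # With ReLU in between, the layers no longer collapse in general.
    return W2 @ np.maximum(0.0, W1 @ x + b1) + b2

x = rng.normal(size=3)
same = np.allclose(two_linear_layers(x), one_linear_layer(x))
print("stacked linear layers equal one linear layer:", same)
```

This is why the hidden layers need a non-linearity such as ReLU: without it, extra layers add no expressive power.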
Forward propagation
Figure 5: Forward pass on the network
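▶ $\vec{x} \in \mathbb{R}^{D}$, $W_1 \in \mathbb{R}^{N_1 \times D}$
▶ $\vec{b}_1 \in \mathbb{R}^{N_1}$, $\vec{h}_1 \in \mathbb{R}^{N_1}$
$\vec{h}_1 = \max(0, W_1 \cdot \vec{x} + \vec{b}_1)$
▶ $W_1$: 1st layer weight matrix (weights)
▶ $\vec{b}_1$: 1st layer biases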
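A minimal numpy sketch of the forward pass for a network with two hidden layers. The sizes D, N1, N2, M, the random initialisation, and the choice of a linear output layer are assumptions made for illustration.

```python
import numpy as np

def relu(v):
    """Element-wise max(0, v)."""
    return np.maximum(0.0, v)

def forward(x, params):
    """Forward pass f = o(h2(h1(x))) for a network with two hidden layers."""
    (W1, b1), (W2, b2), (Wo, bo) = params
    h1 = relu(W1 @ x + b1)    # shape (N1,)
    h2 = relu(W2 @ h1 + b2)   # shape (N2,)
    return Wo @ h2 + bo       # shape (M,), linear output layer (assumption)

D, N1, N2, M = 5, 8, 6, 3     # illustrative sizes
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(N1, D)), np.zeros(N1)),
    (rng.normal(size=(N2, N1)), np.zeros(N2)),
    (rng.normal(size=(M, N2)), np.zeros(M)),
]
out = forward(rng.normal(size=D), params)
print(out.shape)
```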
Why non-linear layers
▶ ReLU layers provide a piece-wise linear tiling
▶ # planes grows exponentially with # hidden units
▶ Multiple layers yield exponential savings in # parameters (parameter sharing)
Figure 6: with ReLU the mapping is locally linear ¹⁰
¹⁰ Montufar et al. "On the number of linear regions of DNNs", arXiv, 2014
How good is the network: task-dependent loss function $V_i$
▶ regression: MSE (mean squared error)
  ▶ $V_1(y, f) = (y - f(x))^2$
▶ classification: variants of Cross-Entropy loss
  ▶ class (category) index $k \in 1 \dots C$
  ▶ predicted classes
    ▶ $f(x) = [\overset{1}{0}\ 0\ \dots\ \overset{k}{1}\ \dots\ \overset{C}{0}]$, $f(x)_k = 1$
  ▶ true classes
    ▶ $y = [\overset{1}{1}\ 0\ \dots\ \overset{k}{0}\ \dots\ \overset{C}{0}]$, $y_k = 0$
  ▶ probability that x belongs to class $c_k$
    ▶ $p(c_k = 1 \mid \vec{x}) = \dfrac{e^{f(x)_k}}{\sum_{j=1}^{C} e^{f(x)_j}}$
  ▶ loss function with log-likelihoods (easier to optimize)
    ▶ $V_2(y, f) = -\sum_k y_k \log p(c_k \mid x)$
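The two losses can be sketched in numpy as follows. The example scores and the one-hot target are hypothetical, and subtracting the maximum score inside the softmax is a standard numerical-stability step that is not on the slide (it leaves the probabilities unchanged).

```python
import numpy as np

def mse(y, f_x):
    """V1: squared error, averaged over all entries."""
    return np.mean((y - f_x) ** 2)

def softmax(scores):
    """p(c_k | x) computed from the raw network outputs f(x)."""
    shifted = scores - np.max(scores)   # stability only, result is unchanged
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def cross_entropy(y_onehot, scores):
    """V2 = -sum_k y_k log p(c_k | x)."""
    return -np.sum(y_onehot * np.log(softmax(scores)))

scores = np.array([2.0, 1.0, 0.1])   # hypothetical f(x) for C = 3 classes
y = np.array([1.0, 0.0, 0.0])        # true class is the first one
loss = cross_entropy(y, scores)
print(softmax(scores), loss)
```

Because y is one-hot, only the log-probability of the true class contributes to the sum.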
Optimization: finding the best f
Typical setup for optimization
▶ f can be parameterized with Θ ($f = \Theta \cdot x$ in the linear case)
▶ minimizing (learning) the loss function V over all training examples $1 \dots n$
▶ plus regularizations on:
  ▶ $\lambda_2(f)$ - controls complexity of the function (usually norm $\|f\|$)
  ▶ $\lambda_1(f, \Theta)$ - sparsity of the solution, where Θ are the parameters of f
$$f^* = \arg\min_f \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda_2(f) + \lambda_1(f, \Theta)$$
▶ to find $f^*$ you need to minimize a complicated function
▶ backpropagation gives the gradients of that complicated function
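For the linear case $f = \Theta \cdot x$ the objective is simple enough to minimize with plain gradient descent. The sketch below is a simplification under stated assumptions: squared-error loss, only the norm penalty (written as `lam * ||theta||^2`, no sparsity term), synthetic data, and a hand-picked step size.

```python
import numpy as np

def objective(theta, X, y, lam):
    """Sum of squared errors plus an L2 penalty lam * ||theta||^2."""
    residual = X @ theta - y
    return np.sum(residual ** 2) + lam * np.sum(theta ** 2)

def gradient(theta, X, y, lam):
    """Analytic gradient of the objective with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) + 2.0 * lam * theta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # 50 synthetic training examples
y = X @ np.array([1.0, -2.0, 0.5])      # targets from a known linear map

theta = np.zeros(3)
lam, step = 0.1, 1e-3                   # hand-picked values (assumption)
history = [objective(theta, X, y, lam)]
for _ in range(200):
    theta = theta - step * gradient(theta, X, y, lam)
    history.append(objective(theta, X, y, lam))

print(history[0], history[-1], theta)
```

For a deep network the gradient can no longer be written down in one line like this, which is where backpropagation comes in.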
Recap
▶ Neural nets - chain (composition) of non-linear operations, implementing highly non-linear functions
▶ Forward pass computes the error between the currently learnt mapping function and the actual output
▶ Backward pass computes gradients w.r.t. the inputs at each layer and the parameters
▶ Optimization (minimization of the loss) is done by stochastic gradient descent (or variants of it)
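The recap can be put together in one small training loop. This is a sketch under assumptions: a toy regression task (fitting sin on [-3, 3]), one hidden ReLU layer with 16 units, mean squared error, and hand-picked learning rate and batch size. None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical task): y = sin(x) on [-3, 3].
X = rng.uniform(-3.0, 3.0, size=(256, 1))
Y = np.sin(X)

# One hidden ReLU layer with 16 units, linear output.
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros(1)

def forward(x):
    pre = x @ W1 + b1
    h = np.maximum(0.0, pre)
    return pre, h, h @ W2 + b2

def mean_loss(x, y):
    return float(np.mean((forward(x)[2] - y) ** 2))

loss_before = mean_loss(X, Y)
lr, batch = 0.01, 32
for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        x, y = X[idx], Y[idx]
        # Forward pass: prediction on the mini-batch.
        pre, h, out = forward(x)
        # Backward pass: chain rule, from the loss back to each parameter.
        d_out = 2.0 * (out - y) / len(x)
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = d_out @ W2.T
        d_pre = d_h * (pre > 0)
        dW1 = x.T @ d_pre
        db1 = d_pre.sum(axis=0)
        # Stochastic gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
loss_after = mean_loss(X, Y)
print(loss_before, loss_after)
```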
Computation: speed up and parallelize with GPU
In a nutshell, DL is all about matrix multiplication
Figure 7: Matrix-matrix multiplication ¹¹
▶ Entries of the A × C matrix can be computed in parallel on the GPU
▶ A × B rows and B × C cols are loaded into shared memory
¹¹ image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017)
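Why the entries can be computed in parallel: each output entry depends on exactly one row of the first matrix and one column of the second. The CPU sketch below (function name and sizes are illustrative) computes the entries one by one to make that independence explicit. It runs sequentially and is not GPU code.

```python
import numpy as np

def matmul_entrywise(P, Q):
    """Compute P @ Q one output entry at a time.

    Entry (i, j) only needs row i of P and column j of Q, so no entry
    depends on any other; on a GPU each could go to its own thread.
    """
    A, B = P.shape
    B2, C = Q.shape
    assert B == B2, "inner dimensions must match"
    out = np.empty((A, C))
    for i in range(A):
        for j in range(C):
            out[i, j] = np.dot(P[i, :], Q[:, j])
    return out

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 4))   # A x B
Q = rng.normal(size=(4, 2))   # B x C
print(np.allclose(matmul_entrywise(P, Q), P @ Q))
```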
Function composition and computational graph
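$$
\begin{aligned}
f(x, y, z) &= \sum_{i=1}^{n} \begin{pmatrix} x_1 y_1 + z_1 \\ \vdots \\ x_n y_n + z_n \end{pmatrix}_i && \text{in vector notation} \\
&= \sum_{i=1}^{n} (x \otimes y + z)_i && \otimes\ \text{Hadamard product, elementwise multiplication} \\
&= \sum_{i=1}^{n} (a + z)_i && a = x \otimes y \\
&= \sum_{i=1}^{n} b_i && b = a + z \\
&= c && c = \sum_{i=1}^{n} b_i
\end{aligned}
$$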
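The decomposition into a, b and c maps directly onto three numpy operations. A minimal sketch with made-up input vectors:

```python
import numpy as np

def forward(x, y, z):
    """Evaluate f(x, y, z) node by node, as in the computational graph."""
    a = x * y          # Hadamard (element-wise) product
    b = a + z          # element-wise sum
    c = np.sum(b)      # scalar output
    return a, b, c

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0, 9.0])
a, b, c = forward(x, y, z)
print(a, b, c)   # a = [4, 10, 18], b = [11, 18, 27], c = 56
```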
Function composition and computational graph (contd.)
$$f(x, y, z) = \sum_{i=1}^{n} \begin{pmatrix} x_1 y_1 + z_1 \\ \vdots \\ x_n y_n + z_n \end{pmatrix}_i$$
Figure 8: computational graph with numpy ¹²
¹² image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017)
Gradients of function composition
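$$\nabla_x f = \nabla_x \sum_{i=1}^{n} \begin{pmatrix} x_1 y_1 + z_1 \\ \vdots \\ x_n y_n + z_n \end{pmatrix}_i = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}, \qquad \frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial b_i} \frac{\partial b_i}{\partial a_i} \frac{\partial a_i}{\partial x_i}$$
$$\nabla_x f = y, \qquad \nabla_y f = x, \qquad \nabla_z f = \mathbf{1}$$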
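The chain rule above can be applied by hand in numpy, walking the graph backwards from c to the inputs. A sketch with illustrative variable names and inputs:

```python
import numpy as np

def forward_backward(x, y, z):
    # Forward pass through the graph.
    a = x * y
    b = a + z
    c = np.sum(b)
    # Backward pass: chain rule from the output back to the inputs.
    grad_c = 1.0
    grad_b = grad_c * np.ones_like(b)   # dc/db_i = 1
    grad_a = grad_b.copy()              # db_i/da_i = 1
    grad_z = grad_b.copy()              # db_i/dz_i = 1
    grad_x = grad_a * y                 # da_i/dx_i = y_i
    grad_y = grad_a * x                 # da_i/dy_i = x_i
    return c, grad_x, grad_y, grad_z

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0, 9.0])
c, grad_x, grad_y, grad_z = forward_backward(x, y, z)
print(grad_x, grad_y, grad_z)   # equal to y, x and a vector of ones
```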
Gradients of function composition (contd.)
▶ Cons of using numpy only:
  ▶ Manual computation of gradients for all f
  ▶ No GPU support
Figure 9: computational graph and gradients with numpy ¹³
¹³ image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017)
Deep Learning frameworks
▶ Goals:
  1. Easily build big computational graphs
  2. Easily compute gradients in computational graphs (automatic gradient computation)
  3. Run it all efficiently on GPU (wrap low-level NVIDIA and linear algebra libraries, e.g., cuDNN, cuBLAS)
▶ Academia/Industry open source frameworks
  ▶ Caffe (UC Berkeley) ↦ Caffe2 (Facebook)
  ▶ Torch (NYU/Facebook) ↦ PyTorch (Facebook)
  ▶ Theano (U Montreal) ↦ TensorFlow (Google)
▶ Industry (not necessarily open source) frameworks
  ▶ Paddle (Baidu), CNTK (Microsoft), MXNet (Amazon), and others...
▶ High-level frameworks
  ▶ Keras (Theano, TensorFlow or CNTK as backend)
    ▶ good for beginners
DL frameworks comparison
Figure 10: Computational graph definition in numpy, PyTorch and TensorFlow ¹⁵
¹⁵ image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017)
DL frameworks: Demo
Deep Learning for Vision
Figure 11: fully connected layer for visual recognition (image credit: Ranzato, FAIR)
▶ Idea:
  ▶ unwrap images (2d matrices) into 1d vectors
    ▶ $\mathbb{R}^{200 \times 200} \mapsto \mathbb{R}^{40000}$
  ▶ feed them into Neural Networks (fully connected layers)
▶ Problem:
  ▶ spatial correlation is local
  ▶ waste of resources
  ▶ not robust to transformations (scale, rotation, translation)
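A quick count shows where the "waste of resources" comes from. The sketch below unwraps a 200 × 200 image and counts the parameters of a single fully connected layer. The hidden layer size of 1000 units is a hypothetical choice, not a number from the slides.

```python
import numpy as np

image = np.zeros((200, 200))      # placeholder image
vector = image.reshape(-1)        # unwrap the 2d matrix into a 1d vector
print(vector.shape)               # (40000,)

def dense_parameters(n_inputs, n_hidden):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_hidden + n_hidden

# Hypothetical hidden layer with 1000 units.
n_params = dense_parameters(vector.size, 1000)
print(n_params)                   # 40001000
```

Every hidden unit gets its own weight for every pixel, even though most pixels far from each other are only weakly related.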
Convolutional Layer
▶ shared weights across the whole image
▶ convolution takes advantage of
  ▶ stationarity (similar statistics at different locations)
  ▶ local spatial correlation
Figure 12: convolutional layer for visual recognition (image credit: Ranzato, FAIR)
Figure 13: convolutions with learnt kernels
Convolutional layer activations
Receptive field (aka filter, kernel)
Swipes through the input and outputs activation maps
activation maps - results of convolutions (sum of element-wise multiplications)
Figure labels: positive activation, null activation
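A minimal numpy sketch of how one filter produces an activation map. Assumptions: no padding, stride 1, and no kernel flip (i.e. cross-correlation, which is what deep learning libraries usually call convolution). The edge-detecting kernel and the toy image are made up for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output is a sum of element-wise products."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

# Toy image with a vertical edge between columns 1 and 2.
image = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
kernel = np.array([[-1.0, 1.0]])          # responds to dark-to-bright steps
activation = np.maximum(0.0, conv2d_valid(image, kernel))
print(activation)   # each row is [0, 1, 0]: positive at the edge, null elsewhere
```

The same two weights are reused at every location, which is the weight sharing mentioned on the convolutional layer slide.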
Multiple convolutional filters
$$h_j^{n} = \max\left(0, \sum_{k=1}^{K} h_k^{n-1} * w_{kj}^{n}\right)$$
Figure 14: multiple convolutional filters for visual recognition (image credit: Ranzato, FAIR)
Figure 15: one convolution layer
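The layer formula can be sketched directly in numpy: every output map j sums the filtered versions of all K input maps and then applies ReLU. The array layout (K, J, kh, kw) for the kernels, the helper names, and the all-ones example are assumptions for illustration; padding, stride and bias are left out.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """Single-channel sliding-window sum of element-wise products (no padding)."""
    H, W = channel.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(channel[r:r + kh, c:c + kw] * kernel)
    return out

def conv_layer(h_prev, w):
    """h_prev: (K, H, W) input maps; w: (K, J, kh, kw) kernels.

    Returns J output maps with h_j = max(0, sum_k h_prev[k] * w[k, j]).
    """
    K, J = w.shape[:2]
    maps = []
    for j in range(J):
        total = sum(correlate2d_valid(h_prev[k], w[k, j]) for k in range(K))
        maps.append(np.maximum(0.0, total))
    return np.stack(maps)

h_prev = np.ones((2, 3, 3))       # K = 2 input maps of size 3 x 3
w = np.ones((2, 1, 2, 2))         # J = 1 output map, 2 x 2 kernels
out = conv_layer(h_prev, w)
print(out.shape, out)             # shape (1, 2, 2), every entry equals 8
```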
Pooling layer
Figure 16: pooling layer (image credit: Ranzato, FAIR)
▶ Pooling layer goal: spatial robustness for feature extraction
▶ Assume our filter is an eye detector
▶ Pooling layer makes the eye detector robust to the exact location of the eye
Pooling layer (contd.)
$$h_j^{n}(x, y) = \max_{\bar{x} \in N(x),\ \bar{y} \in N(y)} h_j^{n-1}(\bar{x}, \bar{y})$$
Figure 17: pooling layer (image credit: Ranzato, FAIR)
▶ by pooling (e.g., taking the max) filter responses at different locations
▶ we gain robustness to the exact spatial location of features
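A small numpy sketch of max pooling and of the robustness it buys. Assumption: the neighbourhoods are non-overlapping `size` × `size` windows (stride equal to the window size), which is one common choice and not necessarily the one in the figure.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Max over non-overlapping size x size neighbourhoods."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    trimmed = feature_map[:H2 * size, :W2 * size]   # drop incomplete borders
    blocks = trimmed.reshape(H2, size, W2, size)
    return blocks.max(axis=(1, 3))

# The same "eye detector" response at two slightly different locations.
response_a = np.zeros((4, 4))
response_a[0, 0] = 1.0
response_b = np.zeros((4, 4))
response_b[1, 1] = 1.0

pooled_a = max_pool2d(response_a)
pooled_b = max_pool2d(response_b)
print(np.array_equal(pooled_a, pooled_b))   # True: the small shift is absorbed
```

Shifts that cross a window boundary still change the pooled map, so the robustness is local.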
ConvNets architecture
Figure 18: LeCun et al. "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 1998
DL for vision Demo