An introduction to Deep Learning
TRANSCRIPT
Who am I?
• David Rostcheck
• I am a data science consultant
• Follow my articles on LinkedIn
DEEP LEARNING
in some tests, Deep Learning has already shown abilities at the same level as humans
These include:
• computers that understand natural language
• autonomous vehicles
• programs that can identify what is occurring in a video
It’s notable that
these solutions to diverse problems
in very different fields
use the same powerful technology
NEURAL NET
a neural net is a
simulation
of the brain,
a mathematical abstraction
in the real brain,
the neurons send signals with frequencies,
not discrete signals
tools exist that try to simulate the brain in a way that's
more faithful
to the real brain
Example: Numenta NuPIC, a type of Hierarchical Temporal Memory (HTM)
but the techniques of neural nets
are sufficient
to deliver results
similar to or better than human performance
in specific cognitive tests
therefore:
Deep Learning
what is it?
common point of view:
a neural net
with distinct levels
is correct, but…
there is another point of view,
maybe more useful,
that we are going to present here
it comes from Vincent Vanhoucke, Principal Research Scientist at Google.
the following comes from
his course on Deep
Learning, on Udacity
He thinks about Deep Learning as
a framework for calculating
linear and almost linear
equations in an efficient way
to develop this framework,
we are going to construct a
classifier
the simplest (and worst)
possible
but wait a minute…
why
a classifier?
Because classification (or more generally prediction) is a central technique in Machine Learning
with this, we can achieve ranking, regression, detection, reinforcement learning, and more…
we start with a linear equation, in vector form: WX + b
Think about constructing a simple classifier to predict, for each occurrence of X, which class it is:
to do this, we must learn the values of W and b
Does it work well?
No.
It’s the worst.
Why?
there are two problems…
No. 1:
it gives values,
and what we want
are probabilities
we can fix it with the “softmax” function:
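As a minimal sketch (plain NumPy; the function name is illustrative), softmax turns a vector of scores into probabilities:

```python
import numpy as np

def softmax(scores):
    # shift by the max for numerical stability; doesn't change the result
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs are positive and sum to 1; the largest score gets the largest probability
```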
we express the correct labels as a vector with a 1 for the correct class and 0 for the others.
we call this “one-hot encoding”
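A one-hot vector can be built in a couple of lines (NumPy assumed; names are illustrative):

```python
import numpy as np

def one_hot(label, num_classes):
    # a vector of zeros with a single 1 at the correct class
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

y = one_hot(2, 4)  # class 2 of 4 -> [0, 0, 1, 0]
```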
to evaluate errors, we compare the probabilities with the correct values
using what we call “cross-entropy”
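A minimal sketch of cross-entropy against a one-hot target (NumPy assumed; names are illustrative):

```python
import numpy as np

def cross_entropy(probs, target_one_hot):
    # -sum over classes of target * log(predicted probability)
    return -np.sum(target_one_hot * np.log(probs))

target = np.array([0.0, 1.0, 0.0])
good = cross_entropy(np.array([0.1, 0.8, 0.1]), target)
bad = cross_entropy(np.array([0.8, 0.1, 0.1]), target)
# the loss is smaller when the probability mass sits on the correct class
```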
better, but…
there remains the second problem:
our equation is linear
and doesn’t represent non-linear relationships well
this problem killed the perceptron (single-level neural net)
it doesn’t help to just add levels to the network
because any combination of linear operations can be represented as another linear operation – we can reduce the new network to another WX + b with the same problem
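This collapse can be checked numerically; the matrices below are random stand-ins, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# two stacked linear levels...
two_levels = W2 @ (W1 @ x + b1) + b2
# ...collapse into a single WX + b with W = W2 W1 and b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_level = W @ x + b
```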
What do we do?
without another option,
we have to introduce non-linear
functions,
for example the logistic (sigmoid) function
but it’s expensive to calculate – we can use a simplified approximation called a “Rectified Linear Unit”, or ReLU
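ReLU is one line (NumPy assumed; the name is illustrative):

```python
import numpy as np

def relu(z):
    # identity for positive inputs, zero otherwise
    return np.maximum(z, 0.0)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
# -> [0.0, 0.0, 0.0, 1.5]
```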
now we can construct our neural net, in a way that’s efficient to calculate
we can express this in a modular way, as a series of linear or almost-linear matrix operations ... that allows us to use the power of a GPU
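A minimal sketch of such a modular stack (linear, ReLU, linear, softmax), with random stand-in weights rather than trained ones:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # linear level -> ReLU -> linear level -> softmax: all matrix operations
    h = np.maximum(W1 @ x + b1, 0.0)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(1)
p = forward(rng.normal(size=3),
            rng.normal(size=(4, 3)), rng.normal(size=4),
            rng.normal(size=(2, 4)), rng.normal(size=2))
# p is a probability distribution over the 2 output classes
```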
this is good, but we are still lacking something…
to improve our estimation, we must minimize the error,
and this requires us to calculate the derivative of the function
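As a toy illustration of minimizing an error with its derivative (a 1-D gradient-descent example, not from the slides):

```python
# minimize f(w) = (w - 3)^2 by repeatedly stepping against its derivative
def df(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w, rate = 0.0, 0.1
for _ in range(100):
    w -= rate * df(w)
# w converges toward 3, the minimum of f
```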
think about the chain rule of calculus:
df(x)/dx = df(x)/du · du/dx
that can convert a derivative into a product (of other derivatives):
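A quick numeric check of the chain rule, using sin(x²) as an illustrative composition:

```python
import math

# f(u) = sin(u), u(x) = x^2, so d/dx sin(x^2) = cos(x^2) * 2x by the chain rule
x = 0.7
analytic = math.cos(x * x) * 2 * x

# compare against a numerical derivative of the composed function
h = 1e-6
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
# the two values agree
```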
that fits in our modular framework
now we have it! a general, modular framework that incorporates everything we need!
and we can construct deep neural nets, adding more levels as we need them
…but wait a minute:
why do we like deep networks?
the most interesting problems,
like language and vision,
have very complex rules
we need a lot of parameters to represent them
yes, but why don’t we use wider networks?
why is it better to have deep ones?
deep networks are more efficient and better capture the structure inherent in many problems
CONVNETS
the convolutional network, or convnet,
transforms the input
so that the translation
of the input does not matter
we use it for visual recognition
Let’s start with a photo:
We use a region (kernel) of the photo as the input to another small neural net, with K outputs
we slide the window across the photo
this transforms the photo into another new one, with K channels in place of the color channels, and different dimensions
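A minimal sketch of the sliding-window convolution described above (plain NumPy loops with illustrative names; real implementations are vectorized):

```python
import numpy as np

def convolve(image, kernels):
    # image: (H, W); kernels: (K, kh, kw) -> output: (K, H-kh+1, W-kw+1)
    K, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # each output value is the window weighted by the kernel
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[k])
    return out

img = np.arange(16.0).reshape(4, 4)
maps = convolve(img, np.ones((2, 3, 3)))  # K=2 kernels of ones
# output has K=2 channels and smaller spatial dimensions: shape (2, 2, 2)
```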
this operation is called
a convolution
if the region (the “kernel”) has
the same size as the original,
what do we obtain?
in this case,
we recover the original photo
Questions?
Contact: [email protected]
Twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck