TRANSCRIPT
![Page 1: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/1.jpg)
Machine Learning for Signal Processing
Neural Networks, Continued
Instructor: Bhiksha Raj
Slides by Najim Dehak
1 Dec 2016
1
![Page 2: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/2.jpg)
So what are neural networks??
• What are these boxes?
– Voice signal → N.Net → Transcription
– Image → N.Net → Text caption
– Game state → N.Net → Next move
18797/11755 2
![Page 3: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/3.jpg)
So what are neural networks??
• It began with this..
• Humans are very good at the tasks we just saw
• Can we model the human brain/ human intelligence?
– An old question, dating back to Plato and Aristotle
![Page 4: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/4.jpg)
MLP - Recap
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary boundaries
• Perceptrons are correlation filters
– They detect patterns in the input
• MLPs are Boolean formulae over patterns detected by perceptrons
– Higher-level perceptrons may also be viewed as feature detectors
• MLPs are universal approximators
– Can model any function to arbitrary precision
• Extra: MLP in classification
– The network will fire if the combination of the detected basic features matches an “acceptable” pattern for a desired class of signal
– E.g. appropriate combinations of (Nose, Eyes, Eyebrows, Cheek, Chin) → Face
4
![Page 5: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/5.jpg)
MLP - Recap
• MLPs are Boolean machines
– They represent arbitrary Boolean functions over arbitrary linear boundaries
• Perceptrons are pattern detectors
– MLPs are Boolean formulae over these patterns
• MLPs are universal approximators
– Can model any function to arbitrary precision
• MLPs are very hard to train
– Training data are generally many orders of magnitude too few
– Even with optimal architectures, we could get rubbish
– Depth helps greatly!
– Can learn functions that regular classifiers cannot
![Page 6: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/6.jpg)
What is a deep network?
![Page 7: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/7.jpg)
Deep Structures
• In any directed network of computational
elements with input source nodes and output
sink nodes, “depth” is the length of the
longest path from a source to a sink
• Left: Depth = 2. Right: Depth = 3
![Page 8: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/8.jpg)
Deep Structures
• Layered deep structure
• “Deep” ⇒ Depth > 2
![Page 9: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/9.jpg)
MLP as a continuous-valued regression
• MLPs can actually compose arbitrary functions to arbitrary precision
– Not just classification/Boolean functions
• 1D example
– Left: A net with a pair of units can create a pulse of any width at any location
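– Right: A network of N such pairs approximates the function with N scaled pulses
[Figure: left, two threshold units on input x with thresholds T1 and T2, combined with weights +1 and −1, form a pulse between T1 and T2; right, scaled pulses are summed (+) to approximate f(x)]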
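The pulse construction can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's code: the helper names (`step`, `pulse`, `approximate`), the hard threshold activation, the interval [0, 1) and the sine target are all assumptions chosen for the example. Each pulse is a pair of threshold units combined with weights +1 and −1, and the approximation is a sum of N such pulses scaled by the target value at each pulse centre.

```python
import numpy as np

def step(z):
    # Threshold activation: 1 where z >= 0, else 0.
    return (z >= 0).astype(float)

def pulse(x, t1, t2):
    # Pair of threshold units combined with weights +1 and -1:
    # equals 1 for t1 <= x < t2 and 0 elsewhere.
    return step(x - t1) - step(x - t2)

def approximate(f, x, n_pairs, lo=0.0, hi=1.0):
    # Sum of n_pairs adjacent pulses, each scaled by f at the pulse centre.
    edges = np.linspace(lo, hi, n_pairs + 1)
    out = np.zeros_like(x)
    for t1, t2 in zip(edges[:-1], edges[1:]):
        out += f(0.5 * (t1 + t2)) * pulse(x, t1, t2)
    return out

# Hypothetical target function, sampled on [0, 1).
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
target = lambda v: np.sin(2 * np.pi * v)

err_coarse = np.max(np.abs(approximate(target, x, 10) - target(x)))
err_fine = np.max(np.abs(approximate(target, x, 100) - target(x)))
print(err_coarse, err_fine)
```

With more pairs the pulses get narrower and the worst-case error shrinks, which is the sense in which the network reaches "arbitrary precision" in this 1D case.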
![Page 10: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/10.jpg)
MLP features
• The lowest layers of a network detect significant features in the signal
• The signal could be reconstructed using these features
– Will retain all the significant components of the signal
DIGIT OR NOT?
![Page 11: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/11.jpg)
Making it explicit: an autoencoder
• A neural network can be trained to predict the input itself
• This is an autoencoder
• An encoder learns to detect all the most significant patterns in the signals
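• A decoder recomposes the signal from the patterns
[Figure: input $\mathbf{X}$ → encoder weights $\mathbf{W}$ → hidden representation $\mathbf{Y}$ → decoder weights $\mathbf{W}^T$ → reconstruction of $\mathbf{X}$]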
![Page 12: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/12.jpg)
Deep Autoencoder
ENCODER
DECODER
![Page 13: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/13.jpg)
What does the AE learn
• In the absence of an intermediate non-linearity
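• This is just PCA
[Figure: $\mathbf{X}$ → $\mathbf{W}$ → $\mathbf{Y}$ → $\mathbf{W}^T$ → reconstruction $\hat{\mathbf{X}}$]
$\mathbf{Y} = \mathbf{W}\mathbf{X}$
$\hat{\mathbf{X}} = \mathbf{W}^T\mathbf{Y}$
$E = \lVert \mathbf{X} - \mathbf{W}^T\mathbf{W}\mathbf{X} \rVert^2$
Find $\mathbf{W}$ to minimize Avg[E]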
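A small numerical check of the PCA connection, under stated assumptions: the data are synthetic, W is restricted to orthonormal rows, and nothing is trained here. The sketch only evaluates the error E = ||X − WᵀWX||² for W built from the top principal directions and for a random orthonormal W. The names `avg_error`, `W_pca` and `W_rand` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data: 5 dimensions, 500 samples stored as columns of X.
mixing = rng.normal(size=(5, 5))
X = mixing @ rng.normal(size=(5, 500))
X -= X.mean(axis=1, keepdims=True)

def avg_error(W, X):
    # Avg[E] with E = ||x - W^T W x||^2 for each sample (column) x.
    residual = X - W.T @ (W @ X)
    return np.mean(np.sum(residual ** 2, axis=0))

k = 2  # size of the hidden layer Y

# PCA choice: rows of W are the top-k eigenvectors of the covariance.
cov = X @ X.T / X.shape[1]
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
W_pca = eigvecs[:, ::-1][:, :k].T

# A random orthonormal W of the same shape, for comparison.
Q, _ = np.linalg.qr(rng.normal(size=(5, k)))
W_rand = Q.T

err_pca = avg_error(W_pca, X)
err_rand = avg_error(W_rand, X)
discarded = np.sum(eigvals[:-k])         # variance outside the top-k subspace
print(err_pca, err_rand, discarded)
```

For the PCA choice the average error equals the variance in the discarded directions, and no other orthonormal W of the same size does better, so a linear autoencoder trained to minimize Avg[E] has nothing to gain beyond the principal subspace.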
![Page 14: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/14.jpg)
The AE
• With non-linearity
– “Non linear” PCA
– Deeper networks can capture more complicated manifolds
ENCODER
DECODER
![Page 15: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/15.jpg)
The Decoder:
• The decoder represents a source-specific generative
dictionary
• Exciting it will produce typical signals from the source!
15
DECODER
![Page 16: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/16.jpg)
The AE
ENCODER
DECODER
Cut the AE
16
![Page 17: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/17.jpg)
DECODER
The Decoder:
• The decoder represents a source-specific generative
dictionary
• Exciting it will produce typical signals from the source!
17
Sax dictionary
![Page 18: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/18.jpg)
The Decoder:
• The decoder represents a source-specific generative
dictionary
• Exciting it will produce typical signals from the source!
18
DECODER
Clarinet dictionary
![Page 19: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/19.jpg)
NN for speech enhancement
19
![Page 20: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/20.jpg)
Story so far
• MLPs are universal classifiers
– They can model any decision boundary
• Neural networks are universal approximators
– They can model any regression
• The decoder of an MLP autoencoder represents a non-linear constructive dictionary!
20
![Page 21: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/21.jpg)
The need for shift invariance
• In many problems the location of a pattern is not important
– Only the presence of the pattern
• Conventional MLPs are sensitive to the location of the pattern
– Moving it by one component results in an entirely different input that the MLP won't recognize
• Requirement: Network must be shift invariant
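A toy 1D illustration of the problem, with a hypothetical three-sample pattern and a single hand-set unit rather than a trained MLP. A unit whose weights cover the whole input responds strongly only when the pattern sits where its weights expect it. Applying one small detector at every location and keeping the strongest response (`scan_max`, an assumed helper name) gives the same answer wherever the pattern is.

```python
import numpy as np

pattern = np.array([1.0, 3.0, 1.0])

def embed(position, length=10):
    # Place the pattern at `position` in an otherwise zero signal.
    signal = np.zeros(length)
    signal[position:position + len(pattern)] = pattern
    return signal

# A single unit over the whole input, tuned to the pattern at position 2.
weights = embed(2)
fixed_a = weights @ embed(2)   # pattern where the unit expects it
fixed_b = weights @ embed(3)   # same pattern moved by one component

def scan_max(signal):
    # Apply the same small detector at every location, keep the strongest response.
    responses = np.correlate(signal, pattern, mode="valid")
    return responses.max()

scan_a = scan_max(embed(2))
scan_b = scan_max(embed(3))
print(fixed_a, fixed_b, scan_a, scan_b)
```

The fixed unit's response drops when the pattern moves by one component, while the scanned detector's maximum is unchanged.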
![Page 22: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/22.jpg)
History
Yann LeCun
Hubel and Wiesel: 1959 (biological model), Fukushima: 1980 (computational model), Atlas: 1988, LeCun: 1989 (backprop in convnets)
Kunihiko Fukushima
Convolutional Neural Networks
![Page 23: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/23.jpg)
Convolutional Neural Networks
• A special kind of multi-layer neural network.
• Implicitly extract relevant features.
• A feed-forward network that can extract topological properties from an image.
• CNNs are also trained with a version of the back-propagation algorithm.
![Page 24: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/24.jpg)
All different weights
A convolution layer has far fewer parameters, thanks to local connections and weight sharing
All different weights Shared weights
Connectivity & weight sharing
![Page 25: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/25.jpg)
25
Example: 200x200 image, 40K hidden units → ~2B parameters!
- Spatial correlation is local
- Waste of resources, and we do not have enough training samples anyway
Fully Connected Layer
Ranzato
![Page 26: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/26.jpg)
26
Locally Connected Layer
Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters
Ranzato
Note: This parameterization is good when the input image is registered (e.g., face recognition).
![Page 27: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/27.jpg)
27
STATIONARITY? Statistics are similar at different locations
Ranzato
Locally Connected Layer
Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters
![Page 28: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/28.jpg)
28
Convolutional Layer
Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels
Ranzato
![Page 29: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/29.jpg)
Convolution
![Page 30: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/30.jpg)
Convolutional Layer
Ranzato
![Page 46: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/46.jpg)
46
Learn multiple filters.
E.g.: 200x200 image, 100 filters, filter size 10x10 → 10K parameters
Ranzato
Convolutional Layer
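The parameter counts quoted for the three layer types follow from simple arithmetic. The sketch below reproduces them, assuming one weight per connection and ignoring biases; the function names are illustrative.

```python
def fully_connected_params(height, width, hidden_units):
    # Every hidden unit has its own weight for every pixel.
    return height * width * hidden_units

def locally_connected_params(hidden_units, filter_h, filter_w):
    # Every hidden unit has its own weights, but only over a small patch.
    return hidden_units * filter_h * filter_w

def convolutional_params(num_filters, filter_h, filter_w):
    # Each filter's weights are shared across all image locations.
    return num_filters * filter_h * filter_w

fc = fully_connected_params(200, 200, 40_000)
lc = locally_connected_params(40_000, 10, 10)
conv = convolutional_params(100, 10, 10)
print(fc, lc, conv)
```

The fully connected count comes to 1.6 billion, which the slide rounds to ~2B; the locally connected and convolutional counts are 4M and 10K.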
![Page 47: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/47.jpg)
[Figure: “before” and “now” network diagrams, with labels: input layer, hidden layer, output layer]
Convolutional Layers
![Page 48: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/48.jpg)
32x32x3 image (width 32, height 32, depth 3)
Convolution Layer
![Page 49: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/49.jpg)
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
Convolution Layer
![Page 50: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/50.jpg)
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
Filters always extend the full depth of the input volume
Convolution Layer
![Page 51: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/51.jpg)
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias)
Convolution Layer
![Page 52: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/52.jpg)
32x32x3 image, 5x5x3 filter
Convolve (slide) over all spatial locations
Activation map: 28x28x1
Convolution Layer
![Page 53: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/53.jpg)
32x32x3 image, 5x5x3 filter
Convolve (slide) over all spatial locations
Consider a second, green filter
Activation maps: 28x28x1 each
Convolution Layer
![Page 54: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/54.jpg)
32x32x3 image
Activation maps: 28x28x6
For example, if we had six 5x5 filters, we’ll get 6 separate activation maps.
We stack these up to get a “new image” of size 28x28x6!
Convolution Layer
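The sliding dot product can be written out directly. This is a deliberately naive sketch under assumptions the slides imply but do not state: stride 1, no padding, and no kernel flip (so strictly a cross-correlation, as is conventional in CNNs). The function name `conv_layer` and the random inputs are illustrative.

```python
import numpy as np

def conv_layer(image, filters, biases):
    # image: (H, W, C); filters: (K, fh, fw, C); biases: (K,)
    # Valid convolution (no padding), stride 1, as in the 32 -> 28 example.
    H, W, C = image.shape
    K, fh, fw, fc = filters.shape
    assert fc == C, "filters must extend the full depth of the input"
    out = np.zeros((H - fh + 1, W - fw + 1, K))
    for k in range(K):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # One number per location: a 5*5*3 = 75-dimensional dot product + bias.
                patch = image[i:i + fh, j:j + fw, :]
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))
filters = rng.normal(size=(6, 5, 5, 3))   # six 5x5x3 filters
biases = np.zeros(6)

activation_maps = conv_layer(image, filters, biases)
print(activation_maps.shape)
```

Each filter yields one 28x28 activation map, and stacking the six maps gives the 28x28x6 "new image". Real frameworks compute the same quantity with far more efficient routines.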
![Page 55: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/55.jpg)
Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions
32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6
CNN
![Page 56: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/56.jpg)
Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → …
CNN
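The shapes in this chain can be checked with a little bookkeeping. The sketch below assumes stride 1 and no padding (the slides do not mention either, but 32 → 28 → 24 with 5x5 filters is consistent with that) and counts one bias per filter. The helper names are illustrative.

```python
def conv_output_shape(input_shape, num_filters, filter_size, stride=1, padding=0):
    # input_shape is (height, width, depth); filters span the full input depth.
    h, w, _ = input_shape
    out_h = (h + 2 * padding - filter_size) // stride + 1
    out_w = (w + 2 * padding - filter_size) // stride + 1
    return (out_h, out_w, num_filters)

def conv_param_count(input_shape, num_filters, filter_size):
    # Weights per filter (filter_size^2 * depth) plus one bias per filter.
    depth = input_shape[2]
    return num_filters * (filter_size * filter_size * depth + 1)

shape0 = (32, 32, 3)
shape1 = conv_output_shape(shape0, num_filters=6, filter_size=5)
shape2 = conv_output_shape(shape1, num_filters=10, filter_size=5)
params1 = conv_param_count(shape0, 6, 5)
params2 = conv_param_count(shape1, 10, 5)
print(shape1, shape2, params1, params2)
```

Note that the second layer's filters are 5x5x6 because they must span the full depth of the 28x28x6 volume produced by the first layer.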
![Page 57: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/57.jpg)
57
Let us assume the filter is an “eye” detector. Q.: how can we make the detection robust to the exact location of the eye?
Ranzato
Pooling Layer
![Page 58: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/58.jpg)
58
By “pooling” (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features.
Ranzato
Pooling Layer
![Page 59: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/59.jpg)
- makes the representations smaller and more manageable
- operates over each activation map independently:
Pooling Layer
![Page 60: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/60.jpg)
Single depth slice (x horizontal, y vertical):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Max pool with 2x2 filters and stride 2:
6 8
3 4
Max Pooling
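The worked example above can be reproduced with a short NumPy sketch. The reshape trick assumes non-overlapping 2x2 windows and even height and width; `max_pool_2x2` is an illustrative name.

```python
import numpy as np

def max_pool_2x2(slice_2d):
    # Non-overlapping 2x2 windows with stride 2; height and width must be even.
    h, w = slice_2d.shape
    blocks = slice_2d.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

depth_slice = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
pooled = max_pool_2x2(depth_slice)
print(pooled)   # [[6 8]
                #  [3 4]]
```

In a full layer the same operation is applied to each activation map independently, so the depth of the volume is unchanged while its width and height are halved.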
![Page 61: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/61.jpg)
61
Convol. Pooling
One stage (zoom)
courtesy of K. Kavukcuoglu Ranzato
ConvNets: Typical Stage
![Page 62: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/62.jpg)
Digit classification
![Page 63: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/63.jpg)
ImageNet
• 1.2 million high-resolution images from the ImageNet LSVRC-2010 contest
• 1000 different classes (softmax layer)
• NN configuration
– NN contains 60 million parameters and 650,000 neurons
– 5 convolutional layers, some of which are followed by max-pooling layers
– 3 fully-connected layers
Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional
Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada
![Page 64: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/64.jpg)
ImageNet
Figure 3: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. See Section 6.1 for details.
Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional
Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada
![Page 65: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/65.jpg)
ImageNet
Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional
Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada
Eight ILSVRC-2010 test images and the five labels considered most probable by our model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5).
Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.
![Page 66: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/66.jpg)
CNN for Automatic Speech Recognition
• Convolution over frequencies
• Convolution over time
![Page 67: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/67.jpg)
• Neural network with specialized connectivity structure
• Feed-forward:
- Convolve input
- Non-linearity (rectified linear)
- Pooling (local max)
• Supervised training
• Train convolutional filters by back-propagating error
• Convolution over time
• Adding memory to classical MLP network
• Recurrent neural network
[Figure, top to bottom: Feature maps ← Pooling ← Non-linearity ← Convolution (Learned) ← Input image]
CNN-Recap
![Page 68: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/68.jpg)
Recurrent neural networks (RNNs) introduce cycles and a notion of time.
• They are designed to process sequences of data $x_1, \dots, x_n$ and can produce sequences of outputs $y_1, \dots, y_m$.
Recurrent Neural Networks (RNNs)
[Figure: input $x_t$ and previous state $h_{t-1}$ (via a one-step delay) produce the hidden state $h_t$ and output $y_t$]
Recurrent Neural Network
![Page 69: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/69.jpg)
Elman Nets (1990) – Simple Recurrent Neural Networks
• Elman nets are feed-forward networks with partial recurrence
• Unlike feed-forward nets, Elman nets have a memory or sense of time
• Can also be viewed as a “Markovian” NN
![Page 70: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/70.jpg)
(Vanilla) Recurrent Neural Network
The state consists of a single “hidden” vector h:
[Figure: Simple Recurrent Neural Network. Input 𝑥𝑡, output 𝑦𝑡, hidden state ℎ𝑡 fed back as ℎ𝑡−1 through a one-step delay]
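The update rule itself did not survive in this transcript, so the sketch below assumes the commonly used form h_t = tanh(W_xh x_t + W_hh h_{t-1}), y_t = W_hy h_t. The weight names and values are illustrative assumptions, not taken from the slides.

```python
import math


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One time step: the new hidden state depends on the input and the previous state."""
    pre = [a + b for a, b in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev))]
    h_t = [math.tanh(p) for p in pre]
    y_t = matvec(W_hy, h_t)
    return h_t, y_t


# Hypothetical parameters: 2 inputs, 2 hidden units, 1 output.
W_xh = [[0.5, -0.5], [0.25, 0.75]]
W_hh = [[0.0, 1.0], [1.0, 0.0]]
W_hy = [[1.0, 1.0]]

h0 = [0.0, 0.0]
h1, y1 = rnn_step([1.0, 1.0], h0, W_xh, W_hh, W_hy)
h2, y2 = rnn_step([0.0, 0.0], h1, W_xh, W_hh, W_hy)
```

In the second call the input is zero, so h2 depends only on h1: the hidden vector is the network's entire memory of the past.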
![Page 71: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/71.jpg)
Unrolling RNNs
RNNs can be unrolled across multiple time steps. This produces a DAG that supports backpropagation, but its size depends on the input sequence length.
[Figure: Recurrent Neural Network with a one-step delay (𝑥𝑡, ℎ𝑡−1 → ℎ𝑡, 𝑦𝑡), unrolled into three copies: (𝑥0, ℎ0, 𝑦0), (𝑥1, ℎ1, 𝑦1), (𝑥2, ℎ2, 𝑦2)]
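Unrolling amounts to applying the same cell once per input, so the computation graph gets one copy of the cell per time step. A minimal sketch, assuming a scalar state with the update h_t = tanh(w_x x_t + w_h h_{t-1}) and y_t = w_y h_t; the function name and weight values are illustrative.

```python
import math


def unroll(xs, h_init, w_x, w_h, w_y):
    """Apply the same scalar RNN cell once per input; return all states and outputs."""
    hs, ys = [], []
    h = h_init
    for x in xs:  # one copy of the cell per time step, all sharing w_x, w_h, w_y
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
        ys.append(w_y * h)
    return hs, ys


# Three inputs give three unrolled steps; a longer sequence gives a larger graph.
hs, ys = unroll([1.0, 0.0, 0.0], h_init=0.0, w_x=1.0, w_h=0.5, w_y=2.0)
```

Only the first input is non-zero here, yet every later state is non-zero too, because each state is computed from the one before it.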
![Page 72: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/72.jpg)
Learning time sequences
• Recurrent networks have one or more feedback loops
• Many tasks require learning a temporal sequence of events – speech, video, text, market
• These problems can be broken into 3 distinct types of tasks
1. Sequence Recognition: Produce a particular output pattern when a specific input sequence is seen. Applications: speech recognition
2. Sequence Reproduction: Generate the rest of a sequence when the network sees only part of the sequence. Applications: time series prediction (stock market, sunspots, etc.)
3. Temporal Association: Produce a particular output sequence in response to a specific input sequence. Applications: speech generation
![Page 73: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/73.jpg)
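RNN structure
Often layers are stacked vertically (deep RNNs):
[Figure: Recurrent Neural Network with stacked levels, unrolled over three time steps. Node labels: inputs 𝑥0, 𝑥1, 𝑥2 and 𝑥00, 𝑥01, 𝑥02; hidden states ℎ00, ℎ01, ℎ02 and ℎ10, ℎ11, ℎ12; outputs 𝑦00, 𝑦01, 𝑦02 and 𝑦10, 𝑦11, 𝑦12. Axes: Time; Abstraction (higher-level features). Each level is annotated "Same parameters at this level".]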
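Stacking can be sketched as feeding the output sequence of one recurrent level to the level above it, with each level reusing its own parameters at every time step. The scalar cell, the function names and the parameter values below are assumptions made for illustration.

```python
import math


def run_layer(xs, params, h_init=0.0):
    """Run one scalar recurrent level over a sequence, sharing params across time."""
    w_x, w_h, w_y = params
    h, ys = h_init, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        ys.append(w_y * h)
    return ys


def deep_rnn(xs, layers):
    """Feed each level's output sequence to the level above it."""
    seq = xs
    for params in layers:
        seq = run_layer(seq, params)
    return seq


layers = [
    (1.0, 0.5, 1.0),   # level 0 parameters (hypothetical values)
    (2.0, -0.5, 1.0),  # level 1 parameters (hypothetical values)
]
top = deep_rnn([1.0, 0.0, -1.0], layers)
```

The output sequence has the same length as the input; depth adds levels of abstraction, not time steps.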
![Page 74: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/74.jpg)
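Backprop still works (it is called Backpropagation Through Time):
RNN structure
[Figure: Recurrent Neural Network with stacked levels, unrolled over three time steps. Node labels: inputs 𝑥0, 𝑥1, 𝑥2 and 𝑥00, 𝑥01, 𝑥02; hidden states ℎ00, ℎ01, ℎ02 and ℎ10, ℎ11, ℎ12; outputs 𝑦00, 𝑦01, 𝑦02 and 𝑦10, 𝑦11, 𝑦12. Axes: Time; Abstraction (higher-level features). Highlighted quantity: Activations.]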
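Backpropagation Through Time can be sketched on a scalar RNN: run the unrolled network forward, then sweep backward over the time steps, accumulating the gradient of each shared weight. The squared-error loss, the function names and the numeric values are assumptions for this example. The finite-difference comparison at the end is a sanity check, not part of the algorithm.

```python
import math


def forward(xs, w_x, w_h, w_y):
    hs = [0.0]  # hs[0] is the initial state; hs[t + 1] is the state after input t
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    ys = [w_y * h for h in hs[1:]]
    return hs, ys


def loss(xs, targets, w_x, w_h, w_y):
    _, ys = forward(xs, w_x, w_h, w_y)
    return sum(0.5 * (y - t) ** 2 for y, t in zip(ys, targets))


def bptt(xs, targets, w_x, w_h, w_y):
    """Gradients of the summed squared error w.r.t. the three shared weights."""
    hs, ys = forward(xs, w_x, w_h, w_y)
    g_x = g_h = g_y = 0.0
    dh_next = 0.0  # gradient reaching the state from later time steps
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]
        g_y += dy * hs[t + 1]
        dh = dy * w_y + dh_next            # from this step's output and from the future
        da = dh * (1.0 - hs[t + 1] ** 2)   # back through tanh
        g_x += da * xs[t]
        g_h += da * hs[t]                  # hs[t] is the previous state
        dh_next = da * w_h
    return g_x, g_h, g_y


xs, targets = [1.0, -0.5, 0.25], [0.5, 0.0, -0.5]
w = (0.8, -0.6, 1.2)
grads = bptt(xs, targets, *w)

# Central finite differences for comparison.
eps = 1e-6
numeric = []
for i in range(3):
    plus, minus = list(w), list(w)
    plus[i] += eps
    minus[i] -= eps
    numeric.append((loss(xs, targets, *plus) - loss(xs, targets, *minus)) / (2 * eps))
```

Because the weights are shared across time, each weight's gradient is a sum of contributions from every time step, which is why the backward sweep accumulates with `+=`.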
![Page 75: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/75.jpg)
![Page 76: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/76.jpg)
![Page 77: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/77.jpg)
![Page 78: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/78.jpg)
![Page 79: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/79.jpg)
![Page 80: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/80.jpg)
![Page 81: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/81.jpg)
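Backprop still works:
RNN structure
[Figure: Recurrent Neural Network with stacked levels, unrolled over three time steps. Node labels: inputs 𝑥0, 𝑥1, 𝑥2 and 𝑥00, 𝑥01, 𝑥02; hidden states ℎ00, ℎ01, ℎ02 and ℎ10, ℎ11, ℎ12; outputs 𝑦00, 𝑦01, 𝑦02 and 𝑦10, 𝑦11, 𝑦12. Axes: Time; Abstraction (higher-level features). Highlighted quantity: Gradients.]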
![Page 82: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/82.jpg)
![Page 83: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/83.jpg)
![Page 84: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/84.jpg)
![Page 85: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/85.jpg)
![Page 86: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/86.jpg)
![Page 87: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/87.jpg)
![Page 88: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/88.jpg)
The memory problem with RNN
• RNN models signal context
• If a very long context is used, RNNs become unable to learn the context information
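One commonly cited reason for this memory problem is that the gradient linking a late state to an early state is a product of one factor per time step, so it can shrink geometrically with the length of the context. A minimal sketch, assuming a scalar state h_t = tanh(w_h h_{t-1}) with no input; the function name, the recurrent weight 0.9 and the starting state are illustrative choices.

```python
import math


def state_sensitivity(w_h, steps, h0=0.5):
    """d h_T / d h_0 for h_t = tanh(w_h * h_{t-1}), accumulated by the chain rule."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w_h * h)
        grad *= w_h * (1.0 - h * h)  # derivative of one time step
    return grad


grad_short = state_sensitivity(w_h=0.9, steps=5)
grad_long = state_sensitivity(w_h=0.9, steps=50)
```

Every factor is at most 0.9 in magnitude here, so after 50 steps the sensitivity is bounded by 0.9 to the power 50. An error signal from the end of a long sequence therefore barely changes what the network learns about its beginning.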
![Page 89: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/89.jpg)
Standard RNNs to LSTM
[Figure: two panels, labeled Standard and LSTM]
![Page 90: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/90.jpg)
LSTM illustrated: input and forming new memory
The LSTM cell takes the following inputs (all vectors):
• the input 𝑥𝑡
• past memory output ℎ𝑡−1
• past memory 𝐶𝑡−1
[Figure labels: Input gate, New memory, Forget gate, Cell state]
![Page 91: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/91.jpg)
LSTM illustrated: Output
• The output of the cell is formed using the output gate
Overall picture:
![Page 92: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/92.jpg)
LSTM Equations
• $i = \sigma(x_t U^i + s_{t-1} W^i)$
• $f = \sigma(x_t U^f + s_{t-1} W^f)$
• $o = \sigma(x_t U^o + s_{t-1} W^o)$
• $g = \tanh(x_t U^g + s_{t-1} W^g)$
• $c_t = c_{t-1} \circ f + g \circ i$
• $s_t = \tanh(c_t) \circ o$
• $y = \mathrm{softmax}(V s_t)$
• $i$: input gate, how much of the new information will be let through to the memory cell.
• $f$: forget gate, how much of the stored information should be thrown away from the memory cell.
• $o$: output gate, how much of the information will be exposed to the next time step.
• $g$: self-recurrent candidate, computed in the same way as the state of a standard RNN.
• $c_t$: internal memory of the memory cell
• $s_t$: hidden state
• $y$: final output
[Figure: LSTM Memory Cell]
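The equations on this slide map directly to one function per time step. The sketch below follows the slide's row-vector convention (x_t U, s_{t-1} W) and is dependency-free. The dictionary keys, helper names and all-zero weights are assumptions: zero weights are used only so that the result can be checked by hand (every gate equals 0.5 and the candidate g is 0).

```python
import math


def vecmat(v, M):
    """Row vector times matrix: (v M)_j = sum_k v_k * M[k][j]."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate(x, s_prev, U, W, act):
    """act(x U + s_prev W), applied element-wise."""
    return [act(a + b) for a, b in zip(vecmat(x, U), vecmat(s_prev, W))]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def lstm_step(x, s_prev, c_prev, p):
    i = gate(x, s_prev, p["Ui"], p["Wi"], sigmoid)    # input gate
    f = gate(x, s_prev, p["Uf"], p["Wf"], sigmoid)    # forget gate
    o = gate(x, s_prev, p["Uo"], p["Wo"], sigmoid)    # output gate
    g = gate(x, s_prev, p["Ug"], p["Wg"], math.tanh)  # candidate memory
    c = [cp * fk + gk * ik for cp, fk, gk, ik in zip(c_prev, f, g, i)]
    s = [math.tanh(ck) * ok for ck, ok in zip(c, o)]
    y = softmax([sum(vr * sk for vr, sk in zip(row, s)) for row in p["V"]])
    return s, c, y


# Hypothetical sizes: 3 inputs, 2 hidden units, 2 output classes.
zeros_u = [[0.0, 0.0] for _ in range(3)]
zeros_w = [[0.0, 0.0] for _ in range(2)]
params = {name: zeros_u for name in ("Ui", "Uf", "Uo", "Ug")}
params.update({name: zeros_w for name in ("Wi", "Wf", "Wo", "Wg")})
params["V"] = [[0.0, 0.0], [0.0, 0.0]]

s1, c1, y1 = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -2.0], params)
```

With the forget gate at 0.5 and no new memory, the cell state is simply halved: the memory decays only as fast as the forget gate allows, which is what lets a trained LSTM keep information over long spans.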
![Page 93: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/93.jpg)
LSTM output synchronization
![Page 94: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/94.jpg)
(NLP) Applications of RNNs
• Section overview
– Language Model
– Sentiment analysis / text classification
– Machine translation and conversation modeling
– Sentence skip-thought vectors
![Page 95: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/95.jpg)
RNN for
![Page 96: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/96.jpg)
Sentiment analysis / text classification
• A quick example, to see the idea.
• Given text collections and their labels, predict labels for unseen texts.
![Page 97: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/97.jpg)
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan, Huijun Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko. “Translating Videos to Natural Language Using Deep Recurrent Neural Networks,” North American Chapter of the Association for Computational Linguistics (NAACL), Denver, Colorado, June 2015.
![Page 98: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/98.jpg)
![Page 99: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/99.jpg)
Composing music with RNN
http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
![Page 100: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/100.jpg)
CNN-LSTM-DNN for speech recognition
• Ensembles of RNN/LSTM, DNN, & Conv Nets (CNN) give huge gains (state of the art):
• T. Sainath, O. Vinyals, A. Senior, H. Sak. “Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks,” ICASSP 2015.
![Page 101: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/101.jpg)
The Impact of deep learning in speech technologies
Cortana
![Page 102: Machine Learning for Signal Processing Neural Networks ...mlsp.cs.cmu.edu › courses › fall2016 › slides › Lecture23.DNN_part2_CMU.pdf– Training data are generally many orders](https://reader035.vdocuments.net/reader035/viewer/2022062603/5f1cf24cbc326d395f26d183/html5/thumbnails/102.jpg)
Conclusions
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary boundaries
• Perceptrons are correlation filters
– They detect patterns in the input
• MLPs are Boolean formulae over patterns detected by perceptrons
– Higher-level perceptrons may also be viewed as feature detectors
• MLPs are universal approximators
– Can model any function to arbitrary precision
– Non-linear PCA
• Convolutional NNs can handle shift invariance
– CNN
• Specialized NNs can model sequential data
– RNN, LSTM