Neural Networks and Backpropagation

10-601 Introduction to Machine Learning
Matt Gormley, Lecture 19
March 29, 2017
Machine Learning Department, School of Computer Science, Carnegie Mellon University

Neural Net Readings: Murphy --, Bishop Ch. 5, HTF Ch. 11, Mitchell Ch. 4
Reminders

• Homework 6: Unsupervised Learning
  – Release: Wed, Mar. 22
  – Due: Mon, Apr. 03 at 11:59pm
• Homework 5 (Part II): Peer Review
  – Release: Wed, Mar. 29
  – Due: Wed, Apr. 05 at 11:59pm
  – Expectation: You should spend at most 1 hour on your reviews
• Peer Tutoring
Neural Networks Outline

• Logistic Regression (Recap)
  – Data, Model, Learning, Prediction
• Neural Networks
  – A Recipe for Machine Learning
  – Visual Notation for Neural Networks
  – Example: Logistic Regression Output Surface
  – 2-Layer Neural Network
  – 3-Layer Neural Network
• Neural Net Architectures
  – Objective Functions
  – Activation Functions
• Backpropagation
  – Basic Chain Rule (of calculus)
  – Chain Rule for Arbitrary Computation Graph
  – Backpropagation Algorithm
  – Module-based Automatic Differentiation (Autodiff)
RECALL: LOGISTIC REGRESSION
Using gradient ascent for linear classifiers

Key idea behind today's lecture:
1. Define a linear classifier (logistic regression)
2. Define an objective function (likelihood)
3. Optimize it with gradient ascent (equivalently, gradient descent on the negative log-likelihood) to learn the parameters
4. Predict the class with the highest probability under the model
Using gradient ascent for linear classifiers

This decision function isn't differentiable:

$h(\mathbf{x}) = \mathrm{sign}(\theta^T \mathbf{x})$

Use a differentiable function instead:

$\mathrm{logistic}(u) \equiv \frac{1}{1 + e^{-u}}$

$p_\theta(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp(-\theta^T \mathbf{x})}$
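To see the contrast numerically, here is a minimal NumPy sketch (the sample points are chosen only for illustration): sign is piecewise constant, so its derivative carries no learning signal, while the logistic function has a nonzero derivative everywhere.

```python
import numpy as np

def logistic(u):
    # logistic(u) = 1 / (1 + exp(-u)): a smooth, differentiable surrogate for sign
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-4.0, 4.0, 9)
print(np.sign(u))    # piecewise constant: derivative is 0 wherever it is defined
print(logistic(u))   # smooth: derivative logistic(u) * (1 - logistic(u)) > 0 everywhere
```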
Logistic Regression

Data: Inputs are continuous vectors of length K. Outputs are discrete.

$\mathcal{D} = \{ (\mathbf{x}^{(i)}, y^{(i)}) \}_{i=1}^N$ where $\mathbf{x} \in \mathbb{R}^K$ and $y \in \{0, 1\}$

Model: Logistic function applied to the dot product of the parameters with the input vector.

$p_\theta(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp(-\theta^T \mathbf{x})}$

Learning: Find the parameters that minimize some objective function.

$\theta^* = \operatorname*{argmin}_\theta J(\theta)$

Prediction: Output is the most probable class.

$\hat{y} = \operatorname*{argmax}_{y \in \{0,1\}} p_\theta(y \mid \mathbf{x})$
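Putting the four pieces (data, model, learning, prediction) together, here is a minimal sketch of logistic regression trained by gradient descent on the mean negative log-likelihood. The synthetic data, step size, and iteration count are assumptions for illustration, not part of the lecture.

```python
import numpy as np

# Synthetic data (an assumption for illustration): N inputs in R^K, binary labels
rng = np.random.default_rng(0)
N, K = 100, 2
X = rng.normal(size=(N, K))
y = (X @ np.array([1.0, -2.0]) > 0).astype(float)

theta = np.zeros(K)
for step in range(1000):
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # Model: p_theta(y=1 | x) for every example
    grad = X.T @ (p - y) / N              # gradient of the mean negative log-likelihood J(theta)
    theta -= 0.5 * grad                   # Learning: step toward argmin_theta J(theta)

y_hat = (X @ theta > 0).astype(float)     # Prediction: most probable class (p > 0.5)
print("training accuracy:", (y_hat == y).mean())
```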
NEURAL NETWORKS
A Recipe for Machine Learning (Background)

1. Given training data (e.g., face images labeled "Face", "Face", "Not a face")
2. Choose each of these:
   – Decision function (examples: linear regression, logistic regression, neural network)
   – Loss function (examples: mean-squared error, cross-entropy)
A Recipe for Machine Learning (Background, continued)

1. Given training data
2. Choose each of these:
   – Decision function
   – Loss function
3. Define goal: find the parameters that minimize the loss over the training data
4. Train with SGD (take small steps opposite the gradient)
Gradients: Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!
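As a sketch of step 4, the generic SGD loop looks like the following; `grad_J` is a hypothetical stand-in for whatever procedure computes the per-example gradient (for neural networks, backpropagation):

```python
def sgd(theta, data, grad_J, lr=0.01, epochs=10):
    """Schematic SGD loop: grad_J(theta, example) is a hypothetical callback
    returning the gradient of the loss on one example (e.g., via backprop)."""
    for _ in range(epochs):
        for example in data:
            theta = theta - lr * grad_J(theta, example)  # small step opposite the gradient
    return theta
```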
Goals for Today’s Lecture
1. Explore a new class of decision functions (Neural Networks)
2. Consider variants of this recipe for training
Decision Functions: Linear Regression

[Figure: a single output unit y connected to inputs x_1 … x_M with weights θ_1, θ_2, θ_3, …, θ_M]

$y = h_\theta(\mathbf{x}) = \sigma(\theta^T \mathbf{x})$ where $\sigma(a) = a$ (identity activation)
Decision Functions: Logistic Regression

[Figure: the same diagram, now with a sigmoid output unit]

$y = h_\theta(\mathbf{x}) = \sigma(\theta^T \mathbf{x})$ where $\sigma(a) = \frac{1}{1 + \exp(-a)}$ (sigmoid activation)
Decision Functions: Logistic Regression (applied to face detection)

[Figure: the same logistic model applied to images labeled "Face", "Face", "Not a face"]
Decision Functions: Logistic Regression (In-Class Example)

[Figure: a logistic regression unit with inputs x_1, x_2 and output y; example labels 1, 1, 0]
Decision Functions: Perceptron

[Figure: the same diagram with weights θ_1, θ_2, θ_3, …, θ_M]

$y = h_\theta(\mathbf{x}) = \sigma(\theta^T \mathbf{x})$ where $\sigma(a) = \mathrm{sign}(a)$
From Biological to Artificial

The motivation for artificial neural networks comes from biology…

Biological "Model"
• Neuron: an excitable cell
• Synapse: connection between neurons
• A neuron sends an electrochemical pulse along its synapses when a sufficient voltage change occurs
• Biological Neural Network: collection of neurons along some pathway through the brain

Artificial Model
• Neuron: node in a directed acyclic graph (DAG)
• Weight: multiplier on each edge
• Activation Function: nonlinear thresholding function, which allows a neuron to "fire" when the input value is sufficiently high
• Artificial Neural Network: collection of neurons into a DAG, which defines some differentiable function

Biological "Computation"
• Neuron switching time: ~0.001 sec
• Number of neurons: ~10^10
• Connections per neuron: ~10^4 to 10^5
• Scene recognition time: ~0.1 sec

Artificial Computation
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processing

Slide adapted from Eric Xing
Decision Functions: Logistic Regression (recap)

[Figure: the single logistic unit, as above]

$y = h_\theta(\mathbf{x}) = \sigma(\theta^T \mathbf{x})$ where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
Neural Networks

Whiteboard:
– Example: Neural Network w/ 1 Hidden Layer
– Example: Neural Network w/ 2 Hidden Layers
– Example: Feed-Forward Neural Network
Neural Network Model

[Figure: a small feed-forward network. Inputs (independent variables): Age = 34, Gender = 2, Stage = 4. The inputs feed a sigmoid hidden layer, which feeds a sigmoid output unit; edge weights shown include .6, .5, .8, .2, .1, .3, .7, .2. Output (dependent variable / prediction): "Probability of being Alive" = 0.6.]

© Eric Xing @ CMU, 2006-2011
"Combined logistic models"

[Figure: the same network with one hidden unit's path highlighted (weights .6, .5, .8, .1, .7): each hidden unit is itself a logistic regression on the inputs. Output: "Probability of being Alive" = 0.6.]

© Eric Xing @ CMU, 2006-2011
[Figure: the same network with the other hidden unit's path highlighted (weights .5, .8, .2, .3, .2), another logistic model over the inputs. Output: "Probability of being Alive" = 0.6.]

© Eric Xing @ CMU, 2006-2011
[Figure: the full network with all weights shown (.6, .5, .8, .2, .1, .3, .7, .2) and inputs Age = 34, Gender = 1, Stage = 4. Output: "Probability of being Alive" = 0.6.]

© Eric Xing @ CMU, 2006-2011
So is this just a combination of logistic models? Not really, no target for hidden units: the training data never specifies what values the hidden units should take, so they cannot each be fit as a separate logistic regression.

[Figure: the same network, hidden layer emphasized; the hidden-unit values (.4, .2) are computed, not observed]

© Eric Xing @ CMU, 2006-2011
Decision Functions: Neural Network

[Figure: a network with an input layer, one hidden layer, and an output unit]
Decision Functions: Neural Network

[Figure: the same 1-hidden-layer network, annotated with the quantities below]

Forward computation, from input to loss:

(A) Input: given $x_i, \forall i$
(B) Hidden (linear): $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i, \forall j$
(C) Hidden (sigmoid): $z_j = \frac{1}{1 + \exp(-a_j)}, \forall j$
(D) Output (linear): $b = \sum_{j=0}^{D} \beta_j z_j$
(E) Output (sigmoid): $y = \frac{1}{1 + \exp(-b)}$
(F) Loss: $J = \frac{1}{2}\left(y - y^{(d)}\right)^2$
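The six quantities above translate directly into code. Below is a minimal sketch of the forward pass for this 1-hidden-layer network; the shapes and random weights are assumptions for illustration, and index 0 carries the bias term, matching the $i=0$ / $j=0$ terms in the sums.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, alpha, beta, y_target):
    """One forward pass, following (A)-(F).
    alpha: (D, M+1) hidden-layer weights; beta: (D+1,) output weights.
    The bias is handled by prepending a constant 1 (the index-0 entry)."""
    x = np.concatenate(([1.0], x))     # (A) input, with x_0 = 1 for the bias
    a = alpha @ x                      # (B) hidden (linear):  a_j = sum_i alpha_ji x_i
    z = sigmoid(a)                     # (C) hidden (sigmoid): z_j = 1/(1+exp(-a_j))
    z = np.concatenate(([1.0], z))     # z_0 = 1 for the output bias
    b = beta @ z                       # (D) output (linear):  b = sum_j beta_j z_j
    y = sigmoid(b)                     # (E) output (sigmoid): y = 1/(1+exp(-b))
    J = 0.5 * (y - y_target) ** 2      # (F) squared-error loss
    return y, J

# Hypothetical sizes for illustration: M = 3 inputs, D = 2 hidden units
rng = np.random.default_rng(0)
y, J = forward(rng.normal(size=3), rng.normal(size=(2, 4)), rng.normal(size=3), 1.0)
print(y, J)
```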
NEURAL NETWORKS: REPRESENTATIONAL POWER
Building a Neural Net

[Figure: an output unit connected directly to the features]

Q: How many hidden units, D, should we use?
Building a Neural Net

[Figure: input layer, hidden layer with D = M units (each hidden unit shown outputting 1), output]

Q: How many hidden units, D, should we use?
Building a Neural Net

[Figure: input layer, hidden layer with D = M units, output]

Q: How many hidden units, D, should we use?
Building a Neural Net

[Figure: input layer, hidden layer with D < M units, output]

Q: How many hidden units, D, should we use? What method(s) is this setting similar to?
Building a Neural Net

[Figure: input layer, hidden layer with D > M units, output]

Q: How many hidden units, D, should we use? What method(s) is this setting similar to?
Decision Functions: Deeper Networks

[Figure: input, Hidden Layer 1, output]

Q: How many layers should we use?
Decision Functions: Deeper Networks

[Figure: input, Hidden Layer 1, Hidden Layer 2, output]

Q: How many layers should we use?
Decision Functions: Deeper Networks

[Figure: input, Hidden Layer 1, Hidden Layer 2, Hidden Layer 3, output]

Q: How many layers should we use?
Decision Functions: Deeper Networks

Q: How many layers should we use?

• Theoretical answer:
  – A neural network with 1 hidden layer is a universal function approximator.
  – Cybenko (1989): For any continuous function g(x), there exists a 1-hidden-layer neural net h_θ(x) s.t. |h_θ(x) − g(x)| < ϵ for all x, assuming sigmoid activation functions.
• Empirical answer:
  – Before 2006: "Deep networks (e.g. 3 or more hidden layers) are too hard to train."
  – After 2006: "Deep networks are easier to train than shallow networks (e.g. 2 or fewer layers) for many problems."
  – Big caveat: You need to know and use the right tricks.
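To make the universal-approximation claim concrete, here is a hand-constructed sketch (weights chosen by hand rather than learned, purely for illustration): two steep sigmoid hidden units subtract to form a localized "bump", and sums of such bumps are the building blocks for approximating any continuous g(x) on an interval.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bump(x, lo=0.4, hi=0.6, steep=50.0):
    # A 1-hidden-layer net with two sigmoid units and output weights +1 and -1:
    # near 1 on [lo, hi], near 0 elsewhere, thanks to the steep slopes.
    return sigmoid(steep * (x - lo)) - sigmoid(steep * (x - hi))

x = np.linspace(0.0, 1.0, 11)
print(np.round(bump(x), 3))  # close to 1 inside [0.4, 0.6], close to 0 outside
```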
Decision Boundary

• 0 hidden layers: linear classifier
  – Hyperplanes

[Figure: network with inputs x_1, x_2 and output y; plot of a linear decision boundary]

Example from Eric Postma via Jason Eisner
Decision Boundary

• 1 hidden layer
  – Boundary of convex region (open or closed)

[Figure: network with inputs x_1, x_2, a hidden layer, and output y; plot of a convex decision region]

Example from Eric Postma via Jason Eisner
Decision Boundary

• 2 hidden layers
  – Combinations of convex regions

[Figure: network with inputs x_1, x_2, two hidden layers, and output y; plot of a non-convex region formed by combining convex regions]

Example from Eric Postma via Jason Eisner
Decision Functions: Different Levels of Abstraction

• We don't know the "right" levels of abstraction
• So let the model figure it out!

Example from Honglak Lee (NIPS 2010)
Decision Functions: Different Levels of Abstraction

Face Recognition: a deep network can build up increasingly higher levels of abstraction (lines, parts, regions).

Example from Honglak Lee (NIPS 2010)
Decision Functions: Different Levels of Abstraction

[Figure: the 3-hidden-layer network (input, Hidden Layer 1, Hidden Layer 2, Hidden Layer 3, output) aligned with the face-recognition feature hierarchy]

Example from Honglak Lee (NIPS 2010)