AN INTRODUCTION TO NEURAL NETWORKS
Scott Kuindersma
November 12, 2009
SUPERVISED LEARNING
• We are given some training data: a set of input–output pairs $(\mathbf{x}_i, y_i)$
• We must learn a function $f$ that maps inputs $\mathbf{x}$ to outputs $y$
• If $y$ is discrete, we call it classification
• If $y$ is continuous, we call it regression
ARTIFICIAL NEURAL NETWORKS
• Artificial neural networks are one technique that can be used to solve supervised learning problems
• Very loosely inspired by biological neural networks
• real neural networks are much more complicated, e.g. using spike timing to encode information
• Neural networks consist of layers of interconnected units
PERCEPTRON UNIT
• The simplest computational neural unit is called a perceptron
• The input of a perceptron is a real vector x
• The output is either 1 or -1
• Therefore, a perceptron can be applied to binary classification problems
• Whether or not it will be useful depends on the problem... more on this later...
PERCEPTRON UNIT [MITCHELL 1997]
SIGN FUNCTION
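The plot on this slide shows the perceptron's threshold activation. In Mitchell's convention, the unit computes $o(\mathbf{x}) = \text{sign}(w_0 x_0 + w_1 x_1 + \dots + w_n x_n)$ with $x_0 = 1$, where $\text{sign}(y) = 1$ if $y > 0$ and $-1$ otherwise.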
EXAMPLE
• Suppose we have a perceptron with 3 weights, $w_0$, $w_1$, and $w_2$
• On input $x_1 = 0.5$, $x_2 = 0.0$, the perceptron outputs $\text{sign}(w_0 x_0 + w_1 x_1 + w_2 x_2)$
• where $x_0 = 1$
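Taking illustrative weights for concreteness (these match the AND example later in the deck): with $w_0 = -0.3$, $w_1 = 0.5$, $w_2 = 0.5$, the weighted sum is $-0.3 + 0.5 \cdot 0.5 + 0.5 \cdot 0.0 = -0.05$, so the perceptron outputs $\text{sign}(-0.05) = -1$.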
LEARNING RULE

• Now that we know how to calculate the output of a perceptron, we would like a way to modify the weights to produce output that matches the training data
• This is accomplished via the perceptron learning rule: $w_i \leftarrow w_i + \alpha\,(t - o)\,x_i$
• for an input pair $(\mathbf{x}, t)$ where, again, $x_0 = 1$; here $\alpha$ is a small positive learning rate, $t$ the target output, and $o$ the perceptron's actual output
• Loop through the training data until (nearly) all examples are classified correctly
MATLAB EXAMPLE
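The MATLAB demo itself is not reproduced in these slides. As a stand-in, here is a minimal Python sketch of the perceptron learning rule on a toy linearly separable dataset; the data, learning rate, and epoch cap are illustrative choices, not taken from the original demo.

```python
import numpy as np

def train_perceptron(X, t, alpha=0.1, epochs=100):
    """Perceptron learning rule: w_i <- w_i + alpha * (t - o) * x_i."""
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend the constant input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else -1   # perceptron output: sign(w . x)
            if o != target:
                w += alpha * (target - o) * x   # update only on misclassification
                mistakes += 1
        if mistakes == 0:                       # every example classified correctly
            break
    return w

# Toy separable data: class +1 above the line x2 = x1, class -1 below it.
X = np.array([[0.0, 1.0], [1.0, 2.0], [1.0, 0.0], [2.0, 1.0]])
t = np.array([1, 1, -1, -1])
print(train_perceptron(X, t))  # one separating weight vector (many exist)
```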
LIMITATIONS OF THE PERCEPTRON MODEL
• Can only distinguish between linearly separable classes of inputs
• Consider the following data: (the slide shows a two-dimensional scatter plot of two classes that no single line can separate)
PERCEPTRONS AND BOOLEAN FUNCTIONS
• Suppose we let the values (1,-1) correspond to true and false, respectively
• Can we describe a perceptron capable of computing the AND function? What about OR? NAND? NOR? XOR?
• Let’s think about it geometrically
BOOLEAN FUNCS CONT'D

(Plots: linear decision boundaries realizing AND, OR, NAND, and NOR over the four input points.)
EXAMPLE: AND
• Let pAND(x1,x2) be the output of the perceptron with weights w0 = -0.3, w1 = 0.5, w2 = 0.5 on input x1, x2
| x1 | x2 | pAND(x1, x2) |
|----|----|--------------|
| -1 | -1 | -1 |
| -1 | 1  | -1 |
| 1  | -1 | -1 |
| 1  | 1  | 1  |
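For example, on input $(1, 1)$ the weighted sum is $-0.3 + 0.5 \cdot 1 + 0.5 \cdot 1 = 0.7$, so the output is $\text{sign}(0.7) = 1$; on $(-1, 1)$ it is $-0.3 - 0.5 + 0.5 = -0.3$, giving $-1$.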
XOR

(Plot: the four XOR input points; no single line separates the positive examples from the negative ones.)
XOR

• XOR cannot be represented by a perceptron, but it can be represented by a small network of perceptrons, e.g.,

(Diagram: $x_1$ and $x_2$ feed both an OR unit and a NAND unit, whose outputs feed an AND unit, so that $\text{XOR}(x_1, x_2) = \text{AND}(\text{OR}(x_1, x_2),\ \text{NAND}(x_1, x_2))$.)
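A tiny Python sketch makes the composition concrete. The OR and NAND weights below are illustrative choices (any weights realizing those gates would do); the AND weights match the earlier example.

```python
def unit(w, x1, x2):
    """A perceptron over +1/-1 inputs with bias weight w[0]."""
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1

OR   = ( 0.3,  0.5,  0.5)   # true when at least one input is 1
NAND = ( 0.3, -0.5, -0.5)   # negation of AND
AND  = (-0.3,  0.5,  0.5)   # weights from the earlier AND example

def xor(x1, x2):
    # XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2))
    return unit(AND, unit(OR, x1, x2), unit(NAND, x1, x2))

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, "->", xor(x1, x2))   # outputs 1 exactly when x1 != x2
```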
PERCEPTRON CONVERGENCE

• The perceptron learning rule is not guaranteed to converge if the data is not linearly separable
• We can remedy this situation by considering a linear unit and applying gradient descent
• The linear unit is equivalent to a perceptron without the sign function. That is, its output is given by $o = \mathbf{w} \cdot \mathbf{x} = w_0 x_0 + w_1 x_1 + \dots + w_n x_n$
• where $x_0 = 1$
LEARNING RULE DERIVATION
• Goal: a weight update rule of the form $w_i \leftarrow w_i + \Delta w_i$
• First we define a suitable measure of error over the training examples $d$: $E(\mathbf{w}) = \frac{1}{2} \sum_d (t_d - o_d)^2$
• Typically we choose a quadratic function like this one so the error surface has a single global minimum
ERROR SURFACE [MITCHELL 1997]
LEARNING RULE DERIVATION
• The learning algorithm should update each weight in the direction that minimizes the error according to our error function
• That is, the weight change should look something like $\Delta w_i = -\alpha\, \frac{\partial E}{\partial w_i}$
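Carrying the differentiation through gives the classic delta rule. Since $o_d = \mathbf{w} \cdot \mathbf{x}_d$,

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\, \frac{1}{2} \sum_d (t_d - o_d)^2 = -\sum_d (t_d - o_d)\, x_{id}$$

so the update is $\Delta w_i = \alpha \sum_d (t_d - o_d)\, x_{id}$, where $x_{id}$ is the $i$-th input of training example $d$.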
GRADIENT DESCENT
GRADIENT DESCENT
• Good: guaranteed to converge to the minimum error weight vector regardless of whether the training data are linearly separable (given that α is sufficiently small)
• Bad: still can only correctly classify linearly separable data
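To make the batch update concrete, here is a minimal Python sketch of gradient descent on a single linear unit; the function name and default parameters are illustrative, not from the original MATLAB demos.

```python
import numpy as np

def train_linear_unit(X, t, alpha=0.05, epochs=500):
    """Minimize E(w) = 1/2 * sum_d (t_d - o_d)^2 for a linear unit o = w . x."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend the constant input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                    # outputs on all training examples
        w += alpha * X.T @ (t - o)   # delta rule: alpha * sum_d (t_d - o_d) x_d
    return w
```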
NETWORKS
• In general, many-layered networks of threshold units are capable of representing a rich variety of nonlinear decision surfaces
• However, to use our gradient descent approach on multi-layered networks, we must avoid the non-differentiable sign function
• Multiple layers of linear units can still only represent linear functions
• Introducing the sigmoid function...
SIGMOID FUNCTION
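• The sigmoid (logistic) function is $\sigma(y) = 1 / (1 + e^{-y})$, a smooth, differentiable "squashing" of its input into the interval $(0, 1)$
• Its derivative has the convenient form $\sigma'(y) = \sigma(y)\,(1 - \sigma(y))$, which backpropagation exploits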
SIGMOID UNIT [MITCHELL 1997]
EXAMPLE
• Suppose we have a sigmoid unit $k$ with 3 weights, $w_0$, $w_1$, and $w_2$
• On input $x_1 = 0.5$, $x_2 = 0.0$, the unit outputs $o_k = \sigma(w_0 x_0 + w_1 x_1 + w_2 x_2)$, where $x_0 = 1$
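With illustrative weights chosen here for concreteness ($w_0 = -0.3$, $w_1 = 0.5$, $w_2 = 0.5$, as in the earlier perceptron example), the net input is $-0.3 + 0.5 \cdot 0.5 + 0.5 \cdot 0.0 = -0.05$, so $o_k = \sigma(-0.05) \approx 0.49$.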
NETWORK OF SIGMOID UNITS
(Diagram: inputs $x_0$–$x_3$ feed a hidden layer of sigmoid units 0 and 1, which in turn feed output units 2, 3, and 4 producing $o_2$, $o_3$, $o_4$; individual connection weights are labeled $w_{ij}$, e.g. $w_{02}$, $w_{31}$.)
EXAMPLE
(Diagram: a concrete network with inputs $x_0$, $x_1$, $x_2$, hidden sigmoid units 1 and 2, and output unit 3; each of the nine connections carries a numeric weight from the set 0.1, 0.2, 0.3, 0, −0.2, 3.2, 0.5, −0.5, 1.0.)
EXAMPLE CONT'D

(Plot: the same network's output as a surface over $x_1, x_2 \in [-2, 2]$; across this region the output varies smoothly between roughly 0.65 and 0.8.)
BACK-PROPAGATION
• Really just applying the same gradient descent approach to our network of sigmoid units
• We use the error function $E(\mathbf{w}) = \frac{1}{2} \sum_d \sum_{k \in \text{outputs}} (t_{kd} - o_{kd})^2$, summing the squared error over all output units $k$ and training examples $d$
BACKPROP ALGORITHM
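The algorithm itself appears on this slide as a figure. In outline, the standard stochastic backpropagation updates (as given in Mitchell, Chapter 4) are, for each training example: propagate the input forward, then propagate error terms backward:

• for each output unit $k$: $\delta_k = o_k (1 - o_k)(t_k - o_k)$
• for each hidden unit $h$: $\delta_h = o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh}\, \delta_k$
• update every weight: $w_{ji} \leftarrow w_{ji} + \alpha\, \delta_j\, x_{ji}$, where $x_{ji}$ is the input from unit $i$ to unit $j$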
BACKPROP CONVERGENCE
• Unfortunately, there may exist many local minima in the error function
• Therefore we cannot guarantee convergence to an optimal solution as in the single linear unit case
• Time to convergence is also a concern
• Nevertheless, backprop does reasonably well in many cases
MATLAB EXAMPLE
• Quadratic decision boundary
• Single linear unit vs. Three-sigmoid unit backprop network... GO!
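Since the MATLAB demo is not included, here is a rough Python stand-in: a backprop network with one hidden layer of three sigmoid units and a single sigmoid output, trained on a toy quadratic boundary. The dataset, architecture details, learning rate, and epoch count are illustrative guesses at what the demo might have looked like.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Toy problem with a quadratic boundary: label 1 when x2 > x1^2.
X = rng.uniform(-1, 1, size=(200, 2))
t = (X[:, 1] > X[:, 0] ** 2).astype(float)
Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend constant input x0 = 1

W1 = rng.normal(scale=0.5, size=(3, 3))     # hidden layer: 3 units x [bias, x1, x2]
W2 = rng.normal(scale=0.5, size=4)          # output unit: [bias, h1, h2, h3]

alpha = 0.5
for epoch in range(2000):
    for x, target in zip(Xb, t):            # stochastic (per-example) updates
        h = sigmoid(W1 @ x)                 # hidden activations
        hb = np.concatenate(([1.0], h))     # bias input for the output unit
        o = sigmoid(W2 @ hb)
        delta_o = o * (1 - o) * (target - o)        # output error term
        delta_h = h * (1 - h) * W2[1:] * delta_o    # back-propagated hidden terms
        W2 += alpha * delta_o * hb
        W1 += alpha * np.outer(delta_h, x)

# Training accuracy of the learned nonlinear decision boundary
H = sigmoid(Xb @ W1.T)
O = sigmoid(np.hstack([np.ones((len(X), 1)), H]) @ W2)
print("training accuracy:", np.mean((O > 0.5) == (t > 0.5)))
```

A single linear unit cannot do better than a straight-line boundary on this data, which is the point of the comparison.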
BACK TO ALVINN
• ALVINN was a 1989 project at CMU in which an autonomous vehicle learned to drive by watching a person drive
• ALVINN's architecture consists of a single-hidden-layer back-propagation network
• The input layer of the network is a 30×32-unit two-dimensional "retina" which receives input from the vehicle's video camera
• The output layer is a linear representation of the direction the vehicle should travel in order to keep it on the road
ALVINN
REPRESENTATIONAL POWER OF NEURAL NETWORKS
• Every boolean function can be represented by a network with two layers of units
• Every bounded continuous function can be approximated to arbitrary accuracy by a two-layer network of sigmoid hidden units and linear output units
• Any function can be approximated to arbitrary accuracy by a three-layer network of sigmoid hidden units and linear output units
READING SUGGESTIONS
• Mitchell, Machine Learning, Chapter 4
• Russell and Norvig, Artificial Intelligence: A Modern Approach, Chapter 20