AN INTRODUCTION TO NEURAL NETWORKS

Scott Kuindersma
November 12, 2009

SUPERVISED LEARNING

• We are given some training data: a set of input/output pairs (see the notation below)

• We must learn a function that maps each input to its output

• If y is discrete, we call it classification

• If it is continuous, we call it regression
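A plausible rendering of the notation these bullets refer to; the symbols D, m, x, and y follow common usage (e.g. Mitchell 1997) rather than the slides themselves:

D = \{ (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_m, y_m) \}, \qquad f : \mathbf{x} \mapsto y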

ARTIFICIAL NEURAL NETWORKS

• Artificial neural networks are one technique that can be used to solve supervised learning problems

• Very loosely inspired by biological neural networks

• real neural networks are much more complicated, e.g. using spike timing to encode information

• Neural networks consist of layers of interconnected units

PERCEPTRON UNIT

• The simplest computational neural unit is called a perceptron

• The input of a perceptron is a real vector x

• The output is either 1 or -1

• Therefore, a perceptron can be applied to binary classification problems

• Whether or not it will be useful depends on the problem... more on this later...

PERCEPTRON UNIT [MITCHELL 1997]
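The figure cited here is Mitchell's perceptron unit diagram; the unit it depicts takes a weighted sum of its inputs (with x0 = 1 as a constant bias input) and thresholds it:

o(\mathbf{x}) = \mathrm{sgn}(w_0 x_0 + w_1 x_1 + \dots + w_n x_n) = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})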

SIGN FUNCTION
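The sign (threshold) function used above, in the convention of these slides where outputs are 1 and -1:

\mathrm{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{otherwise} \end{cases}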

EXAMPLE

• Suppose we have a perceptron with 3 weights:

• On input x1 = 0.5, x2 = 0.0, the perceptron outputs:

• where x0 = 1
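The slide's specific weight values are not listed above, so the following is a minimal sketch with assumed weights w0 = 0.2, w1 = 0.6, w2 = -0.4, just to show the mechanics of the computation (Python rather than the MATLAB used in the lecture demos):

import numpy as np

w = np.array([0.2, 0.6, -0.4])   # assumed weights: w0 (bias), w1, w2
x = np.array([1.0, 0.5, 0.0])    # x0 = 1 (bias input), x1 = 0.5, x2 = 0.0

activation = w @ x               # 0.2*1 + 0.6*0.5 + (-0.4)*0.0 = 0.5
output = 1 if activation > 0 else -1
print(activation, output)        # 0.5  1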

LEARNING RULE

• Now that we know how to calculate the output of a perceptron, we would like to find a way to modify the weights to produce output that matches the training data

• This is accomplished via the perceptron learning rule

• For an input pair (x, t), the weights are updated as shown below, where, again, x0 = 1

• Loop through the training data until (nearly) all examples are classified correctly
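The perceptron learning rule referred to above, in its standard form (t is the target output for the example, o is the perceptron's current output, and α is a small positive learning rate):

w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \alpha \,(t - o)\, x_i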

MATLAB EXAMPLE
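A rough Python re-creation of the kind of demo this slide refers to; the toy data, learning rate, and stopping rule are all assumptions, not taken from the original MATLAB code:

import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: the label is which side of a line the point falls on
X = rng.uniform(-1, 1, size=(50, 2))
t = np.where(X[:, 0] + X[:, 1] > 0.2, 1, -1)

Xb = np.hstack([np.ones((50, 1)), X])    # prepend the constant bias input x0 = 1
w = np.zeros(3)                          # weights w0, w1, w2
alpha = 0.1                              # learning rate

for epoch in range(1000):
    mistakes = 0
    for x, target in zip(Xb, t):
        o = 1 if w @ x > 0 else -1
        if o != target:
            w += alpha * (target - o) * x   # perceptron learning rule
            mistakes += 1
    if mistakes == 0:                       # every example classified correctly
        break

print(epoch, w)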

LIMITATIONS OF THE PERCEPTRON MODEL

• Can only distinguish between linearly separable classes of inputs

• Consider the following data:

PERCEPTRONS AND BOOLEAN FUNCTIONS

• Suppose we let the values (1,-1) correspond to true and false, respectively

• Can we describe a perceptron capable of computing the AND function? What about OR? NAND? NOR? XOR?

• Let’s think about it geometrically

BOOLEAN FUNCS CONT'D

[Four plots showing the truth values of AND, OR, NAND, and NOR as points in the (x1, x2) plane]

EXAMPLE: AND

• Let pAND(x1,x2) be the output of the perceptron with weights w0 = -0.3, w1 = 0.5, w2 = 0.5 on input x1, x2

x1    x2    pAND(x1,x2)
-1    -1    -1
-1     1    -1
 1    -1    -1
 1     1     1
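A quick Python check of this table, using the weights given on the slide:

def p_and(x1, x2, w0=-0.3, w1=0.5, w2=0.5):
    """Perceptron for AND with the slide's weights (bias input x0 = 1)."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, p_and(x1, x2))   # reproduces the truth table above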

XOR

XOR

• XOR cannot be represented by a perceptron, but it can be represented by a small network of perceptrons, e.g.,

[Diagram: x1 and x2 feed an OR unit and a NAND unit; the outputs of OR and NAND feed an AND unit, whose output is XOR(x1, x2)]
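A sketch of this construction with perceptron units: the AND weights are the ones from the earlier slide, while the OR and NAND weights are plausible choices of my own, not values from the slides:

def unit(w0, w1, w2):
    """A perceptron unit with fixed weights (bias input x0 = 1)."""
    return lambda x1, x2: 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

p_or   = unit( 0.3,  0.5,  0.5)   # assumed weights for OR
p_nand = unit( 0.3, -0.5, -0.5)   # assumed weights for NAND
p_and  = unit(-0.3,  0.5,  0.5)   # AND weights from the earlier slide

def p_xor(x1, x2):
    # Hidden layer: OR and NAND; output layer: AND of their outputs
    return p_and(p_or(x1, x2), p_nand(x1, x2))

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, p_xor(x1, x2))   # 1 exactly when one input is 1 and the other is -1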

PERCEPTRON CONVERGENCE

• The perceptron learning rule is not guaranteed to converge if the data is not linearly separable

• We can remedy this situation by considering a linear unit and applying gradient descent

• The linear unit is equivalent to a perceptron without the sign function. That is, its output is given by:

• where x0 = 1
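Written out, the linear unit's output is just the weighted sum of the inputs, with no sign function applied:

o(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} = \sum_{i=0}^{n} w_i x_i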

LEARNING RULE DERIVATION

• Goal: a weight update rule of the form

• First we define a suitable measure of error

• Typically we choose a quadratic function so that we have a single global minimum (written out below)
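A standard choice, and the one used in Mitchell's derivation, is the sum of squared errors over the training set D (t_d is the target and o_d the linear unit's output on example d):

E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2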

ERROR SURFACE [MITCHELL 1997]

LEARNING RULE DERIVATION

• The learning algorithm should update each weight in the direction that minimizes the error according to our error function

• That is, the weight change should look something like
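In symbols, each weight takes a small step against the gradient of E; for the linear unit and the quadratic error above, the derivative works out to a simple sum over the training examples:

\Delta w_i = -\alpha \, \frac{\partial E}{\partial w_i} = \alpha \sum_{d \in D} (t_d - o_d)\, x_{id}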

GRADIENT DESCENT

GRADIENT DESCENT

• Good: guaranteed to converge to the minimum error weight vector regardless of whether the training data are linearly separable (given that α is sufficiently small)

• Bad: still can only correctly classify linearly separable data
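A minimal Python sketch of batch gradient descent for a single linear unit under the quadratic error above; the toy data and learning rate are assumptions:

import numpy as np

rng = np.random.default_rng(1)

# Toy data whose targets come from an underlying linear function (an assumption)
X = rng.uniform(-1, 1, size=(100, 2))
t = 0.7 * X[:, 0] - 0.4 * X[:, 1] + 0.2

Xb = np.hstack([np.ones((100, 1)), X])   # prepend the constant bias input x0 = 1
w = np.zeros(3)                          # weights w0, w1, w2
alpha = 0.1                              # learning rate

for step in range(1000):
    o = Xb @ w                           # linear unit outputs on all examples
    grad = -Xb.T @ (t - o) / len(t)      # dE/dw for E = 1/2 * sum (t - o)^2, averaged
    w -= alpha * grad                    # step against the gradient

print(w)                                 # approaches [0.2, 0.7, -0.4]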

NETWORKS

• In general, many-layered networks of threshold units are capable of representing a rich variety of nonlinear decision surfaces

• However, to use our gradient descent approach on multi-layered networks, we must avoid the non-differentiable sign function

• Multiple layers of linear units can still only represent linear functions

• Introducing the sigmoid function...

SIGMOID FUNCTION
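The sigmoid (logistic) function squashes its input into (0, 1) and, importantly for the gradient calculations, has a derivative that is easy to express in terms of its own output:

\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad \frac{d\sigma}{dy} = \sigma(y)\,(1 - \sigma(y))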

SIGMOID UNIT [MITCHELL 1997]

EXAMPLE

• Suppose we have a sigmoid unit k with 3 weights:

• On input x1 = 0.5, x2 = 0.0, the unit outputs:
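As with the earlier perceptron example, the slide's specific weights are not listed above; a worked computation with assumed weights (and x0 = 1) looks like this:

import math

w = [0.2, 0.6, -0.4]    # assumed weights w0, w1, w2
x = [1.0, 0.5, 0.0]     # x0 = 1, x1 = 0.5, x2 = 0.0

net = sum(wi * xi for wi, xi in zip(w, x))   # 0.2 + 0.3 + 0.0 = 0.5
output = 1.0 / (1.0 + math.exp(-net))        # sigmoid(0.5) ≈ 0.622
print(net, output)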

NETWORK OF SIGMOID UNITS

[Diagram: inputs x0, x1, x2, x3 feed a hidden layer of sigmoid units 0 and 1, which feed an output layer of sigmoid units 2, 3, and 4 producing outputs o2, o3, o4; individual connection weights are labeled, e.g. w02 and w31]

EXAMPLE

[Diagram: inputs x0, x1, x2 feed two hidden sigmoid units (1 and 2), which feed a single output unit (3); each connection is labeled with its numeric weight]

EXAMPLE

[The same network as on the previous slide, together with a surface plot of the output unit's value over x1 and x2 ranging from -2 to 2; the plotted output varies between roughly 0.65 and 0.8]

BACK-PROPAGATION

• Really just applying the same gradient descent approach to our network of sigmoid units

• We use the error function:
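Summed over both the training examples d and the network's output units k (Mitchell's form; t_kd is the target for output unit k on example d and o_kd the corresponding network output):

E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2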

BACKPROP ALGORITHM
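A compact Python sketch of stochastic backpropagation for a network with one hidden layer of sigmoid units; the network size, weight initialization range, and learning rate are reasonable defaults of my own rather than values from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=3, alpha=0.3, epochs=5000, seed=0):
    """Stochastic backprop for a network with one hidden layer of sigmoid units.

    X: (m, n) array of inputs; T: (m, k) array of targets in (0, 1).
    Bias terms are handled by prepending a constant 1 to each layer's input.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W_h = rng.uniform(-0.05, 0.05, size=(n_hidden, n_in + 1))    # hidden-layer weights
    W_o = rng.uniform(-0.05, 0.05, size=(n_out, n_hidden + 1))   # output-layer weights

    for _ in range(epochs):
        for x, t in zip(X, T):
            # Forward pass
            xb = np.concatenate(([1.0], x))
            h = sigmoid(W_h @ xb)
            hb = np.concatenate(([1.0], h))
            o = sigmoid(W_o @ hb)

            # Backward pass: error terms for output units, then hidden units
            delta_o = o * (1 - o) * (t - o)
            delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)

            # Gradient-descent weight updates
            W_o += alpha * np.outer(delta_o, hb)
            W_h += alpha * np.outer(delta_h, xb)
    return W_h, W_o

# Usage sketch: a tiny training set (XOR with 0/1 encodings)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W_h, W_o = train_backprop(X, T)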

BACKPROP CONVERGENCE

• Unfortunately, there may exist many local minima in the error function

• Therefore we cannot guarantee convergence to an optimal solution as in the single linear unit case

• Time to convergence is also a concern

• Nevertheless, backprop does reasonably well in many cases

MATLAB EXAMPLE

• Quadratic decision boundary

• Single linear unit vs. Three-sigmoid unit backprop network... GO!

BACK TO ALVINN

• ALVINN was a 1989 project at CMU in which an autonomous vehicle learned to drive by watching a person drive

• ALVINN's architecture consists of a single hidden layer back-propagation network

• The input layer of the network is a 30x32-unit two-dimensional "retina" which receives input from the vehicle's video camera

• The output layer is a linear representation of the direction the vehicle should travel in order to keep it on the road

ALVINN

REPRESENTATIONAL POWER OF NEURAL NETWORKS

• Every Boolean function can be represented by a network with two layers of units

• Every bounded continuous function can be approximated to arbitrary accuracy by a two-layer network of sigmoid hidden units and linear output units

• Any function can be approximated to arbitrary accuracy by a three-layer network of sigmoid hidden units and linear output units

READING SUGGESTIONS

• Mitchell, Machine Learning, Chapter 4

• Russell and Norvig, Artificial Intelligence: A Modern Approach, Chapter 20