An Introduction To Neural Networks

Upload: duongnguyet

Post on 24-Mar-2018

Page 1: An Introduction To Neural Networks - · PDF fileNeural Networks In order to combine ... ALVINN Drives 70 mph on a public highway Camera image 30x32 pixels ... 151-16-NeuralNetworks.ppt

An Introduction To Neural Networks

Neural Networks

To combine the strengths of the machine and the human brain, neural networks try to mimic the structure and function of our nervous system.

Biological Motivation #1

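[Figure: a biological neuron (dendrites, axon, synapses) next to an artificial network, with synapses mapped to positive (+) or negative (-) weights and neurons mapped to nodes]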

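Biological Neural Systems

- Neuron switching time: > $10^{-3}$ s
- Connections (synapses) per neuron: ~$10^{4}$ to $10^{5}$
- Number of neurons in the human brain: ~$10^{10}$
- Face recognition: 0.1 s
- High degree of parallel computation
- Distributed representations

Because recognizing a face takes about 0.1 s while each neuron needs more than $10^{-3}$ s to switch, the brain can perform at most about 100 sequential neuron steps; its speed must therefore come from massively parallel computation.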

Biological Motivation #2

Appropriate Problem Domains for Neural Network (backprop) Learning

- Input is high-dimensional, discrete or real-valued (e.g., raw sensor input)
- Output is discrete or real-valued
- Output is a vector of values
- Possibly noisy data
- Form of target function is unknown
- Fast evaluation may be required
- Humans do not need to interpret the results (black-box model)

A Single Perceptron

[Figure: inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feeding a unit with threshold T and output y]

If $w_1x_1 + w_2x_2 + \dots + w_nx_n \ge T$, then the output y is 1. Otherwise, the output y is 0.
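The threshold rule above fits in a few lines of Python. This is a minimal sketch; the function name `perceptron_output` is our own, and the example weights (w1 = w2 = 1, T = 2) are the ones used on the AND slide of this deck.

```python
def perceptron_output(weights, inputs, threshold):
    """Return 1 if w1*x1 + ... + wn*xn >= T, otherwise 0."""
    # Weighted sum of the inputs
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0


# AND with w1 = w2 = 1 and T = 2: only the input (1, 1) reaches the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_output([1, 1], [x1, x2], threshold=2))
```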

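Linearly Separable

x1  x2  x1 AND x2
0   0   0
0   1   0
1   0   0
1   1   1

x1  x2  x1 OR x2
0   0   0
0   1   1
1   0   1
1   1   1

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: each truth table plotted in the (x1, x2) plane. A single straight line separates the inputs with output 1 from those with output 0 for AND and OR, but no such line exists for XOR.]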
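A brute-force check makes the picture concrete. The sketch below looks for a single threshold perceptron that reproduces each truth table; the function names and the search grid (weights and thresholds from -2 to 2 in steps of 0.5) are our own illustrative choices. It finds one for AND and OR but none for XOR. A finite grid search alone cannot prove impossibility, but for XOR no real weights work either: the table requires $0 < T$, $w_1 \ge T$, $w_2 \ge T$ and $w_1 + w_2 < T$, yet the middle two conditions give $w_1 + w_2 \ge 2T > T$.

```python
import itertools


def perceptron_output(weights, inputs, threshold):
    # Output 1 if w1*x1 + w2*x2 >= T, else 0
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0


def find_perceptron(truth_table, grid):
    """Return the first (w1, w2, T) on the grid that reproduces every row, or None."""
    for w1, w2, t in itertools.product(grid, repeat=3):
        if all(perceptron_output((w1, w2), x, t) == y for x, y in truth_table.items()):
            return (w1, w2, t)
    return None


GRID = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

for name, table in (("AND", AND), ("OR", OR), ("XOR", XOR)):
    print(name, find_perceptron(table, GRID))
```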

Perceptrons

- 1969 book by Marvin Minsky and Seymour Papert
- The problem is that perceptrons only work for classification problems that are linearly separable
- Insufficiently expressive
- Minsky and Papert called multilayer networks an "important research problem" to investigate, although they were pessimistic about their value

Perceptrons: another view

AND

[Figure: inputs x1 and x2 with weights W1 = 1 and W2 = 1 feeding a unit with threshold T = 2 and output y]

Inputs are either 0 or 1.

Output is 1 only if all inputs are 1.

AND

[Figure: inputs x0, x1 and x2 with unknown weights W0 = ?, W1 = ?, W2 = ? feeding a unit with output y]

Inputs are either 0 or 1.

AND

[Figure: inputs x0, x1 and x2 with weights W0 = -0.8, W1 = 0.5, W2 = 0.5 feeding a unit with output y]

Inputs are either 0 or 1.
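With the extra input x0, the threshold becomes an ordinary weight: testing $w_1x_1 + w_2x_2 \ge T$ is the same as testing $w_0x_0 + w_1x_1 + w_2x_2 \ge 0$ with $w_0 = -T$ and $x_0 = 1$. The slide does not state the value of x0, so the sketch below assumes x0 = 1 and, following the perceptron formula used later in this deck, outputs 1 when the weighted sum is positive. The function name is our own. With W0 = -0.8 and W1 = W2 = 0.5 it reproduces AND.

```python
def perceptron_with_bias(weights, inputs):
    """Output 1 if w0*x0 + w1*x1 + ... + wn*xn > 0, with x0 fixed at 1 (assumption)."""
    x = [1] + list(inputs)  # prepend the constant bias input x0 = 1
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0 else 0


AND_WEIGHTS = [-0.8, 0.5, 0.5]  # W0, W1, W2 from the slide
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_with_bias(AND_WEIGHTS, (x1, x2)))
```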

Training

- Train a perceptron to respond to certain inputs with certain desired outputs
- After training, the perceptron should give reasonable outputs for any input
  - If it wasn't trained for that input, it should try to find the best possible output depending on how it was trained

Perceptron Training Rule

- Begin with random weights
- Apply the perceptron to each training example (each pass through the examples is called an epoch)
- If it misclassifies an example, modify the weights
- Continue until the perceptron classifies all training examples correctly

Modifying the Weights

$w_i \leftarrow w_i + \Delta w_i$

$\Delta w_i = \text{LearningRate} \times (\text{DesiredOutput} - \text{ActualOutput}) \times x_i$

LearningRate: usually set to some small value such as 0.1. It moderates the degree to which the weights are changed at each step, which keeps them from overshooting.

Modifying the Weights

$\text{DesiredOutput} - \text{ActualOutput}$: the difference between what we wanted the output to be and what it actually was. If the desired and actual outputs are equal, this term is 0 and the weight does not change.

Modifying the Weights

$x_i$: the value of the input itself. If this value was 0, the input had no impact on the error, so its weight should not be adjusted.
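A single application of the update rule, as a minimal Python sketch. The function name and the numbers in the example (weights 0.2 and -0.4, inputs 1 and 0, desired output 1, actual output 0) are hypothetical; LearningRate defaults to the 0.1 suggested for this rule.

```python
def update_weights(weights, inputs, desired, actual, learning_rate=0.1):
    """One application of w_i <- w_i + LearningRate * (DesiredOutput - ActualOutput) * x_i."""
    error = desired - actual  # 0 if the output was already correct
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Hypothetical step: the unit output 0 but should have output 1, and the second input was 0.
print(update_weights([0.2, -0.4], [1, 0], desired=1, actual=0))
```

The first weight grows by 0.1 × 1 × 1 = 0.1, while the second stays at -0.4 because its input was 0; a correct output (error 0) leaves every weight unchanged.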

EXAMPLE

- Begin with random weights
- Apply the perceptron to each training example (each pass through the examples is called an epoch)
- If it misclassifies an example, modify the weights: $w_i \leftarrow w_i + \text{LearningRate} \times (\text{DesiredOutput} - \text{ActualOutput}) \times x_i$
- Continue until the perceptron classifies all training examples correctly

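The training procedure above, as a minimal Python sketch. Assumptions not taken from the slides: the threshold is folded into a bias weight w0 on a constant input x0 = 1 (the output is 1 when the weighted sum is positive), initial weights are drawn uniformly from [-0.5, 0.5], and training gives up after `max_epochs` passes if the examples are never all classified correctly. The function names are our own. On AND, which is linearly separable, the loop converges; on XOR it never does.

```python
import random


def predict(weights, x):
    # x[0] is the constant bias input x0 = 1; output 1 if the weighted sum is positive
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0 else 0


def train_perceptron(examples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Perceptron training rule; returns (weights, epochs_run, converged)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in examples[0][0]]  # random initial weights
    for epoch in range(1, max_epochs + 1):  # one epoch = one pass through the examples
        mistakes = 0
        for x, desired in examples:
            actual = predict(weights, x)
            if actual != desired:  # only misclassified examples change the weights
                mistakes += 1
                weights = [w + learning_rate * (desired - actual) * xi
                           for w, xi in zip(weights, x)]
        if mistakes == 0:  # all training examples classified correctly
            return weights, epoch, True
    return weights, max_epochs, False


# Each example is ((x0, x1, x2), desired output), with x0 = 1.
AND = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
XOR = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]

print("AND:", train_perceptron(AND))
print("XOR:", train_perceptron(XOR))
```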
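Gradient Descent Learning Rule

- Consider a linear unit without a threshold and with continuous output o (not just 0 or 1): $o = w_0 + w_1 x_1 + \dots + w_n x_n$
- Train the $w_i$'s so that they minimize the squared error $E[w_0, \dots, w_n] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$, where D is the set of training examples, $t_d$ is the target output for example d, and $o_d$ is the unit's output for d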
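Differentiating E gives $\partial E / \partial w_i = -\sum_{d \in D} (t_d - o_d)\, x_{id}$, where $x_{id}$ is input i of example d (with $x_{0d} = 1$ for the $w_0$ term), so each gradient descent step adds $\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$. A minimal Python sketch, assuming a small hypothetical noise-free data set generated from t = 1 + 2x1 and the illustrative values η = 0.05 and 500 epochs; the function names are our own.

```python
def linear_output(weights, x):
    # o = w0*x0 + w1*x1 + ... + wn*xn, where x[0] = 1 supplies the w0 term
    return sum(w * xi for w, xi in zip(weights, x))


def squared_error(weights, examples):
    # E = 1/2 * sum over d in D of (t_d - o_d)^2
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)


def gradient_descent(examples, eta=0.05, epochs=500):
    """Batch gradient descent for a linear unit: w_i += eta * sum_d (t_d - o_d) * x_id."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        delta = [0.0] * len(weights)
        for x, t in examples:  # accumulate over the whole training set D
            error = t - linear_output(weights, x)
            for i, xi in enumerate(x):
                delta[i] += eta * error * xi
        weights = [w + dw for w, dw in zip(weights, delta)]
    return weights


# Hypothetical, noise-free examples of t = 1 + 2*x1, written as ((x0, x1), t).
D = [((1, 0.0), 1.0), ((1, 1.0), 3.0), ((1, 2.0), 5.0), ((1, 3.0), 7.0)]
w = gradient_descent(D)
print(w, squared_error(w, D))
```

Because the update uses the sum over all of D, this is the batch form of gradient descent.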

Perceptrons: another view

Gradient Descent

[Figure: one step on the error surface, moving the weights from (w1, w2) to (w1 + Δw1, w2 + Δw2)]

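Incremental Stochastic Gradient Descent

- Batch mode: gradient descent $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla E_D[\mathbf{w}]$ over the entire data D, where $E_D[\mathbf{w}] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$
- Incremental mode: gradient descent $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla E_d[\mathbf{w}]$ over individual training examples d, where $E_d[\mathbf{w}] = \frac{1}{2} (t_d - o_d)^2$

Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.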
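The two modes differ only in when the weights change. In the minimal sketch below, the batch version sums the corrections over all of D before updating, while the incremental version updates after every example using the negative gradient of $E_d$, $\Delta \mathbf{w} = \eta (t_d - o_d)\, \mathbf{x}_d$. The data (hypothetical noisy values near t = 1 + 2x1), learning rates and epoch counts are illustrative assumptions. On this data the batch run settles on the least-squares weights (1.09, 1.94), and the incremental runs end close to them, closer for the smaller η.

```python
def linear_output(w, x):
    # o = w0*x0 + w1*x1, with x0 = 1
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_epoch(w, data, eta):
    # One step on E_D: sum the per-example corrections, then apply them together
    step = [0.0] * len(w)
    for x, t in data:
        err = t - linear_output(w, x)
        for i, xi in enumerate(x):
            step[i] += eta * err * xi
    return [wi + s for wi, s in zip(w, step)]


def incremental_epoch(w, data, eta):
    # One step on E_d per example: the weights change after every example
    for x, t in data:
        err = t - linear_output(w, x)
        w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w


def train(epoch_fn, data, eta, epochs):
    w = [0.0, 0.0]
    for _ in range(epochs):
        w = epoch_fn(w, data, eta)
    return w


# Hypothetical noisy data near t = 1 + 2*x1, written as ((x0, x1), t).
DATA = [((1, 0.0), 1.1), ((1, 1.0), 2.9), ((1, 2.0), 5.2), ((1, 3.0), 6.8)]

batch = train(batch_epoch, DATA, eta=0.02, epochs=2000)
inc_large = train(incremental_epoch, DATA, eta=0.02, epochs=2000)
inc_small = train(incremental_epoch, DATA, eta=0.002, epochs=20000)
print("batch:", batch)
print("incremental, eta=0.02:", inc_large)
print("incremental, eta=0.002:", inc_small)
```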

Comparison: Perceptron and Gradient Descent Rule

The perceptron learning rule is guaranteed to succeed (perfectly classify the training examples) if
- the training examples are linearly separable, and
- the learning rate η is sufficiently small.

Linear unit training rules using gradient descent are
- guaranteed to converge to the hypothesis with minimum squared error,
- given a sufficiently small learning rate η,
- even when the training data contains noise, and
- even when the training data is not separable by H (the hypothesis space).

Restaurant Problem: Will I wait for a table?

- Alternate: whether there is a suitable alternative restaurant nearby
- Bar: whether the restaurant has a comfortable bar area to wait in
- Fri/Sat: true on Fridays and Saturdays
- Hungry: whether we are hungry
- Patrons: how many people are in the restaurant (None, Some, or Full)
- Price: the restaurant's price range ($, $$, $$$)
- Raining: whether it is raining outside
- Reservation: whether we made a reservation
- Type: the kind of restaurant (French, Italian, Thai, or Burger)
- WaitEstimate: the wait estimate by the host (0-10 minutes, 10-30, 30-60, > 60)

Multilayer Network

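A compromise function

- Perceptron: $\text{output} = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, and $0$ otherwise
- Linear: $\text{output} = \text{net} = \sum_{i=0}^{n} w_i x_i$
- Sigmoid (logistic): $\text{output} = \sigma(\text{net}) = \dfrac{1}{1 + e^{-\text{net}}}$

The sigmoid is a compromise between the other two: like the perceptron it squashes its output into a bounded range (between 0 and 1), and like the linear unit it is smooth and differentiable, so it can be trained by gradient descent.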
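The three output functions side by side, as a minimal Python sketch. It assumes the input vector x carries a constant x0 = 1 for the w0 term; the function names and the example weights are our own. The derivative helper uses the standard identity $\sigma'(\text{net}) = \sigma(\text{net})\,(1 - \sigma(\text{net}))$, which is what makes the sigmoid convenient for gradient descent.

```python
import math


def net_input(weights, x):
    # net = sum_{i=0}^{n} w_i * x_i, with x[0] = 1 for the w0 term
    return sum(w * xi for w, xi in zip(weights, x))


def perceptron_unit(weights, x):
    return 1 if net_input(weights, x) > 0 else 0  # hard threshold, not differentiable


def linear_unit(weights, x):
    return net_input(weights, x)  # differentiable, but unbounded


def sigmoid_unit(weights, x):
    return 1.0 / (1.0 + math.exp(-net_input(weights, x)))  # smooth, between 0 and 1


def sigmoid_derivative(output):
    # d sigma / d net, written in terms of the unit's output: sigma * (1 - sigma)
    return output * (1.0 - output)


w = [-1.0, 2.0]  # hypothetical weights (w0, w1)
for x1 in (-2.0, 0.0, 0.5, 2.0):
    x = [1.0, x1]
    print(x1, perceptron_unit(w, x), linear_unit(w, x), round(sigmoid_unit(w, x), 3))
```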

Learning in Multilayer Networks

- Same method as for perceptrons
  - Example inputs are presented to the network
  - If the network computes an output that matches the desired output, nothing is done
  - If there is an error, the weights are adjusted to reduce the error
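In a network with a hidden layer, the weights are adjusted by backpropagation: an error term is computed for the output unit, passed backwards to each hidden unit through its outgoing weight, and each weight then moves by η × (error term of the unit it feeds) × (its input). A minimal pure-Python sketch, assuming sigmoid units, the XOR truth table from earlier in this deck, 3 hidden units, η = 0.5, 5000 epochs of incremental updates and a fixed random seed (all illustrative choices, not values from the slides). Depending on the random initial weights, training can settle in a local minimum instead of reproducing XOR exactly.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train_xor(hidden=3, eta=0.5, epochs=5000, seed=1):
    """Backpropagation for a 2-input, `hidden`-unit, 1-output sigmoid network on XOR."""
    rng = random.Random(seed)
    # w_h[j] = [bias, weight from x1, weight from x2] for hidden unit j
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w_o = [bias, weight from hidden unit 1, ..., weight from hidden unit H]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
        o = sigmoid(w_o[0] + sum(v * hj for v, hj in zip(w_o[1:], h)))
        return h, o

    def total_error():
        # E = 1/2 * sum over examples of (t - o)^2
        return 0.5 * sum((t - forward(x)[1]) ** 2 for x, t in XOR_DATA)

    start = total_error()
    for _ in range(epochs):
        for x, t in XOR_DATA:  # incremental updates, one example at a time
            h, o = forward(x)
            delta_o = o * (1 - o) * (t - o)  # error term of the output unit
            # error terms of the hidden units, using the output weights before this update
            delta_h = [hj * (1 - hj) * w_o[j + 1] * delta_o for j, hj in enumerate(h)]
            w_o[0] += eta * delta_o
            for j, hj in enumerate(h):
                w_o[j + 1] += eta * delta_o * hj
            for j, w in enumerate(w_h):
                w[0] += eta * delta_h[j]
                w[1] += eta * delta_h[j] * x[0]
                w[2] += eta * delta_h[j] * x[1]
    outputs = [forward(x)[1] for x, _ in XOR_DATA]
    return start, total_error(), outputs


start, end, outputs = train_xor()
print(f"error before: {start:.3f}, after: {end:.3f}")
print("outputs:", [round(o, 2) for o in outputs])
```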

BackPropagation Learning

Alternative Error Measures

Neural Network Model

[Figure: inputs (independent variables) Age = 34, Gender = 2 and Stage = 4 connected by weights to a hidden layer of summing (Σ) units, which are weighted and summed into the output (dependent variable): the prediction "probability of being alive", shown as 0.6]

Getting an answer from a NN

[Figure: the same network with inputs Age = 34, Gender = 2 and Stage = 4, tracing the weighted sums through the hidden layer to the output "probability of being alive" (0.6)]

Getting an answer from a NN

[Figure: a further step of the same computation for Age = 34, Gender = 2 and Stage = 4, ending in the output "probability of being alive" (0.6)]

Getting an answer from a NN

[Figure: the same network with Gender = 1 instead of 2 (Age = 34, Stage = 4); the output "probability of being alive" is labeled 0.6]

Minimizing the Error

[Figure: an error surface over a weight w. Training moves from w initial (initial error) to w trained (final error) at a local minimum; where the derivative is negative, the weight change is positive.]
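The figure's point, that gradient descent stops at whichever minimum lies downhill from the initial weight, can be reproduced on a one-weight error curve. A minimal sketch, assuming the hypothetical surface $E(w) = w^4 - 3w^2 + w$, which has a global minimum near w ≈ -1.30 and a shallower local minimum near w ≈ 1.13, and an illustrative η = 0.01. Each step moves w opposite to the sign of the derivative, so a negative derivative gives a positive change, as in the figure.

```python
def error(w):
    # Hypothetical one-weight error surface with two minima
    return w ** 4 - 3 * w ** 2 + w


def derivative(w):
    # dE/dw
    return 4 * w ** 3 - 6 * w + 1


def descend(w, eta=0.01, steps=2000):
    for _ in range(steps):
        w -= eta * derivative(w)  # step against the slope
    return w


left = descend(-2.0)   # starts downhill of the global minimum
right = descend(2.0)   # starts downhill of the local minimum
print(f"start -2.0 -> w = {left:.4f}, E = {error(left):.4f}")
print(f"start  2.0 -> w = {right:.4f}, E = {error(right):.4f}")
```

Starting from w = 2.0, the run stops in the local minimum even though a lower error exists elsewhere on the curve.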

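Representational Power (FFNN)

- Boolean functions: every Boolean function can be represented with 2 layers of units
- Continuous functions: every bounded continuous function can be approximated with arbitrarily small error by 2 layers of units (sigmoid then linear)
- Arbitrary functions: any function can be approximated to arbitrary accuracy by 3 layers of units (sigmoids then linear)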

Hypothesis Space and Inductive Bias

Hidden Layer Representations

Hidden Layer Representations

Overfitting

Handwritten Character Recognition

- Le Cun et al. (1989) implemented a neural network to read ZIP codes on hand-addressed envelopes, for sorting purposes
- To identify the digits, it uses a 16x16 array of pixels as input, 3 hidden layers, and a distributed output encoding with 10 output units for digits 0-9
- 256 input nodes and 10 output units (1 for the likelihood of each digit)

ALVINN drives 70 mph on a public highway

[Figure: a 30x32-pixel camera image as input, 4 hidden units, and 30 outputs for steering; an inset shows the 30x32 weights into one of the four hidden units]

Neural Nets for Face Recognition

Learning Hidden Unit Weights

Interpreting Satellite Imagery for Automated Weather Forecasting

Summary

- Perceptrons (one-layer networks) are insufficiently expressive
- Multilayer networks are sufficiently expressive and can be trained by error backpropagation
- Many applications, including speech, driving, handwritten character recognition, and fraud detection