chapter 4 artificial neural networks. questions: what is anns? how to learn an ann? (algorithm) the...

Chapter 4 Artificial Neural Networks

Upload: nickolas-lawson

Post on 27-Dec-2015


Page 1: Chapter 4 Artificial Neural Networks. Questions: What is ANNs? How to learn an ANN? (algorithm) The presentational power of ANNs(advantage and disadvantage)

Chapter 4

Artificial Neural Networks

Page 2:

Questions:

• What are ANNs?

• How is an ANN learned? (algorithm)

• The representational power of ANNs (advantages and disadvantages)

Page 3:

What are ANNs? Background

Consider humans:

• Neuron switching time: about 0.001 second

• Number of neurons: about $10^{10}$

• Connections per neuron: about $10^{4}$ to $10^{5}$

• Scene recognition time: about 0.1 second

At most about 100 sequential switching steps fit into 0.1 second, so recognition must rely on much parallel computation.

• Property of a neuron: thresholded unit

One motivation for ANN systems is to capture this kind of highly parallel computation based on distributed representations.

Page 4:

What are ANNs? Problems related to ANNs

• Classification

• Voice recognition

• Others

Page 5:

Another example:

Properties of artificial neural nets (ANNs)

• Many neuron-like threshold switching units

• Many weighted interconnections among units

• Highly parallel, distributed processing

• Emphasis on tuning weights automatically

Page 6:

4.1 Perceptrons

$$
o(x_1, \ldots, x_n) =
\begin{cases}
1 & \text{if } \omega_0 + \omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_n x_n > 0 \\
-1 & \text{otherwise}
\end{cases}
$$

Page 7:

To simplify notation, set $x_0 = 1$:

$$o(\vec{x}) = \operatorname{sgn}(\vec{\omega} \cdot \vec{x}) = \operatorname{sgn}\left( \sum_{i=0}^{n} \omega_i x_i \right)$$

where $\vec{\omega} = (\omega_0, \omega_1, \ldots, \omega_n)$ and $\vec{x} = (x_0, x_1, \ldots, x_n)$.

Learning a perceptron involves choosing values for the weights $\omega_0, \ldots, \omega_n$. Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors:

$$H = \{ \vec{\omega} \mid \vec{\omega} \in \mathbb{R}^{n+1} \}$$

Page 8:

We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances:

$$o(\vec{x}) = \operatorname{sgn}(\vec{\omega} \cdot \vec{x}) = \operatorname{sgn}\left( \sum_{i=0}^{n} \omega_i x_i \right)$$

Two ways to train a perceptron:

the Perceptron Training Rule and Gradient Descent

Page 9:

(1). Perceptron Training Rule

$$\omega_i \leftarrow \omega_i + \Delta\omega_i, \qquad \Delta\omega_i = \eta\,(t - o)\,x_i$$

where

• $t$ is the target value for the input $\vec{x}$

• $o$ is the perceptron output

• $\eta$ is a small constant called the learning rate

Procedure:

• Initialize each ωi with a random value in the given interval

• Update each ωi according to the rule above for every training example
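The perceptron training rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the initialization interval [-0.05, 0.05], the learning rate 0.1, the epoch limit, and the stopping test (one full pass without mistakes) are all assumptions chosen for the example. The training set is the AND function with true encoded as 1 and false as -1.

```python
import random

def sgn(value):
    """Perceptron threshold: 1 if value > 0, otherwise -1."""
    return 1 if value > 0 else -1

def perceptron_output(weights, x):
    """Compute sgn(w . x) after prepending the constant input x0 = 1."""
    inputs = [1.0] + list(x)
    return sgn(sum(w * xi for w, xi in zip(weights, inputs)))

def train_perceptron(examples, eta=0.1, max_epochs=100, seed=0):
    """Apply w_i <- w_i + eta * (t - o) * x_i for every training example."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Assumed interval for the random initial weights.
    weights = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            o = perceptron_output(weights, x)
            if o != t:
                mistakes += 1
            inputs = [1.0] + list(x)
            for i, xi in enumerate(inputs):
                # The update is zero whenever the example is classified correctly.
                weights[i] += eta * (t - o) * xi
        if mistakes == 0:  # assumed stopping test: a clean pass over the data
            break
    return weights

# AND with true = 1 and false = -1 (linearly separable, so the rule converges).
and_examples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
and_weights = train_perceptron(and_examples)
```

After training, `perceptron_output(and_weights, x)` should reproduce the target for each of the four examples.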

Page 10:

Representation Power of Perceptrons

• A single perceptron can represent many boolean functions, such as AND, OR, NAND, and NOR, but fails to represent XOR.

• E.g.: g(x1, x2) = AND(x1, x2), with true encoded as 1 and false as -1:

o(x1, x2) = sgn(-0.8 + 0.5x1 + 0.5x2)

x1    x2    -0.8 + 0.5x1 + 0.5x2    o
-1    -1    -1.8                    -1
-1     1    -0.8                    -1
 1    -1    -0.8                    -1
 1     1     0.2                     1
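The AND example can be checked directly. The short Python sketch below evaluates the weighted sum and the thresholded output for all four input combinations; the function names are illustrative only, and the weights -0.8, 0.5, 0.5 are the ones given in the slide.

```python
def sgn(value):
    """Perceptron threshold: 1 if value > 0, otherwise -1."""
    return 1 if value > 0 else -1

def weighted_sum(x1, x2):
    """Weighted sum of the AND perceptron from the slide."""
    return -0.8 + 0.5 * x1 + 0.5 * x2

def and_perceptron(x1, x2):
    """Thresholded output o(x1, x2) = sgn(-0.8 + 0.5*x1 + 0.5*x2)."""
    return sgn(weighted_sum(x1, x2))

# One row per input combination: (x1, x2, weighted sum, output).
rows = [(x1, x2, weighted_sum(x1, x2), and_perceptron(x1, x2))
        for x1 in (-1, 1) for x2 in (-1, 1)]
for row in rows:
    print(row)
```

Only the input (1, 1) gives a positive weighted sum, so it is the only case with output 1.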

Page 11:

Representation Power of Perceptrons

(a) The perceptron training rule can be proved to converge

• if the training data are linearly separable

• and η is sufficiently small

(b) But some functions are not representable, e.g. those that are not linearly separable

(c) Every boolean function can be represented by some network of perceptrons only two levels deep
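As an illustration of point (c), XOR, which no single perceptron can represent, can be built from perceptrons arranged in two levels. The sketch below uses the decomposition XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)) with true encoded as 1 and false as -1. The AND weights come from the slides; the OR and NAND weights and all function names are assumptions picked for this example.

```python
def sgn(value):
    """Perceptron threshold: 1 if value > 0, otherwise -1."""
    return 1 if value > 0 else -1

def perceptron(w0, w1, w2):
    """Build a two-input perceptron with fixed weights."""
    return lambda x1, x2: sgn(w0 + w1 * x1 + w2 * x2)

# Booleans are encoded as 1 (true) and -1 (false).
and_unit = perceptron(-0.8, 0.5, 0.5)    # weights from the slides
or_unit = perceptron(0.8, 0.5, 0.5)      # assumed weights for OR
nand_unit = perceptron(0.8, -0.5, -0.5)  # assumed weights for NAND

def xor_network(x1, x2):
    """Level 1: OR and NAND units. Level 2: an AND unit over their outputs."""
    return and_unit(or_unit(x1, x2), nand_unit(x1, x2))

xor_table = {(x1, x2): xor_network(x1, x2)
             for x1 in (-1, 1) for x2 in (-1, 1)}
```

The network outputs 1 exactly when the two inputs differ.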

Page 12:

(2). Gradient Descent

Key idea: search the hypothesis space to find the weights that best fit the training examples.

Best fit: minimize the squared error

$$E(\vec{\omega}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

where $D$ is the set of training examples, $t_d$ is the target value for example $d$, and $o_d$ is the output of the unit for example $d$.

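Page 13:

Gradient Descent

Gradient:

$$\nabla E(\vec{\omega}) = \left[ \frac{\partial E}{\partial \omega_0}, \frac{\partial E}{\partial \omega_1}, \ldots, \frac{\partial E}{\partial \omega_n} \right]$$

Training rule:

$$\Delta\vec{\omega} = -\eta\, \nabla E(\vec{\omega})$$

or, for each weight,

$$\omega_i \leftarrow \omega_i + \Delta\omega_i, \qquad \Delta\omega_i = -\eta\, \frac{\partial E}{\partial \omega_i}$$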

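Page 14:

Gradient Descent

For a linear unit with output $o_d = \vec{\omega} \cdot \vec{x}_d$:

$$
\begin{aligned}
\frac{\partial E}{\partial \omega_i}
&= \frac{\partial}{\partial \omega_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \\
&= \sum_{d \in D} (t_d - o_d)\, \frac{\partial}{\partial \omega_i} (t_d - \vec{\omega} \cdot \vec{x}_d) \\
&= \sum_{d \in D} (t_d - o_d)\, (-x_{id})
\end{aligned}
$$

$$\Delta\omega_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$$

$$\omega_i \leftarrow \omega_i + \Delta\omega_i, \qquad \Delta\omega_i = -\eta\, \frac{\partial E}{\partial \omega_i}$$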

Page 15:

Gradient Descent Algorithm

• Initialize each ωi to some small random value

• Until the termination condition is met, Do

– Initialize each Δωi to zero.

– For each <x, t> in training examples, Do

• Input the instance x to the unit and compute the output o

• For each linear unit weight ωi, Do

$$\Delta\omega_i \leftarrow \Delta\omega_i + \eta\,(t - o)\,x_i$$

– For each linear unit weight ωi, Do

$$\omega_i \leftarrow \omega_i + \Delta\omega_i$$
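The batch gradient descent algorithm for a single linear unit can be sketched as follows. This is an illustrative version under stated assumptions: the function names, the learning rate 0.05, the fixed number of epochs used as the termination condition, the initialization interval, and the small noise-free data set generated from t = 1 + 2x are all chosen for the example and do not come from the slides.

```python
import random

def linear_output(weights, x):
    """Unthresholded output o = w . x, with the constant input x0 = 1."""
    inputs = [1.0] + list(x)
    return sum(w * xi for w, xi in zip(weights, inputs))

def squared_error(weights, examples):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)

def gradient_descent(examples, eta=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Small random initial weights (interval is an assumption).
    weights = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):  # assumed termination condition: fixed epoch count
        delta = [0.0] * (n + 1)  # initialize each delta_w_i to zero
        for x, t in examples:
            o = linear_output(weights, x)
            inputs = [1.0] + list(x)
            for i, xi in enumerate(inputs):
                # Accumulate the contribution of this example.
                delta[i] += eta * (t - o) * xi
        for i in range(n + 1):
            # Apply the summed update once per pass over the data.
            weights[i] += delta[i]
    return weights

# Hypothetical training data generated from t = 1 + 2x.
examples = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
weights = gradient_descent(examples)
final_error = squared_error(weights, examples)
```

With these settings the learned weights should approach (1, 2) and the squared error should approach zero; a noticeably larger learning rate would make the updates diverge on this data.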

Page 16:

When to use gradient descent

• The hypothesis is continuously parameterized

• The error is differentiable with respect to these parameters

Page 17:

Advantage vs Disadvantage

Advantage

• Guaranteed to converge to a hypothesis with locally minimum error, given a sufficiently small learning rate η,

• even when the training data contain noise,

• and even when the training data are not linearly separable.

• For a linear unit the error surface has a single global minimum, so this local minimum is the global one.

Disadvantage

• Convergence can sometimes be very slow;

• There is no guarantee of converging to the global minimum when there are multiple local minima.

Page 18:

Incremental (Stochastic) Gradient Descent

Standard gradient descent

Do until satisfied:

• Compute the gradient $\nabla E_D(\vec{\omega})$

• $\vec{\omega} \leftarrow \vec{\omega} - \eta\, \nabla E_D(\vec{\omega})$

$$E_D(\vec{\omega}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

vs.

Stochastic gradient descent

Do until satisfied:

• For each training example $d$ in $D$

– Compute the gradient $\nabla E_d(\vec{\omega})$

– $\vec{\omega} \leftarrow \vec{\omega} - \eta\, \nabla E_d(\vec{\omega})$

$$E_d(\vec{\omega}) = \frac{1}{2} (t_d - o_d)^2$$

Page 19:

Standard Gradient Descent vs. Stochastic Gradient Descent

• Stochastic gradient descent can approximate standard gradient descent arbitrarily closely if η is made small enough;

• Stochastic mode can converge faster;

• Stochastic gradient descent can sometimes avoid falling into local minima.
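For comparison with the batch version, stochastic (incremental) gradient descent updates the weights after every single example instead of once per pass. The sketch below is a minimal illustration for a linear unit; the function names, the learning rate 0.02, the epoch count, the per-epoch shuffling of the examples, and the data generated from t = 1 + 2x are assumptions made for the example.

```python
import random

def linear_output(weights, x):
    """Unthresholded output o = w . x, with the constant input x0 = 1."""
    inputs = [1.0] + list(x)
    return sum(w * xi for w, xi in zip(weights, inputs))

def stochastic_gradient_descent(examples, eta=0.02, epochs=2000, seed=0):
    rng = random.Random(seed)
    n = len(examples[0][0])
    weights = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    order = list(examples)
    for _ in range(epochs):
        # Visiting examples in random order is a common choice, assumed here.
        rng.shuffle(order)
        for x, t in order:
            o = linear_output(weights, x)
            inputs = [1.0] + list(x)
            for i, xi in enumerate(inputs):
                # Update immediately, using the gradient of E_d only.
                weights[i] += eta * (t - o) * xi
    return weights

# Hypothetical training data generated from t = 1 + 2x.
examples = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
sgd_weights = stochastic_gradient_descent(examples)
```

The only structural difference from the batch algorithm is that no update is accumulated across examples: each weight changes as soon as an example has been processed.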

Page 20:

(3). Perceptron training rule vs. gradient descent

Perceptron training rule

• Thresholded perceptron output: $o(\vec{x}) = \operatorname{sgn}(\vec{\omega} \cdot \vec{x})$

• Provided the examples are linearly separable,

• converges to a hypothesis that perfectly classifies the training data

Gradient descent

• Unthresholded linear output: $o(\vec{x}) = \vec{\omega} \cdot \vec{x}$

• Regardless of whether the training data are linearly separable,

• converges asymptotically toward the minimum-error hypothesis

Page 21:

4.2 Multilayer Networks

[Figure: a single perceptron compared with a multilayer network]

Perceptrons can only express linear decision surfaces; we need networks that can express a rich variety of nonlinear decision surfaces.

Page 22:

Sigmoid unit: a differentiable threshold unit

Sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-kx}} \qquad (\text{here } k = 1)$$

Property:

$$\frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$$

Output:

$$o = \sigma(net) = \frac{1}{1 + e^{-net}} \qquad (net = \vec{\omega} \cdot \vec{x})$$

Why do we use the sigmoid instead of a linear unit or sgn(x)?
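The derivative property of the sigmoid is easy to check numerically. The sketch below compares the closed form σ(x)(1 − σ(x)) with a central finite difference at a few points; the function names, the test points, and the step size 1e-6 are assumptions made for this illustration.

```python
import math

def sigmoid(x, k=1.0):
    """Sigmoid function 1 / (1 + exp(-k * x)); the slides use k = 1."""
    return 1.0 / (1.0 + math.exp(-k * x))

def sigmoid_derivative(x):
    """Closed-form derivative for k = 1: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

points = [-2.0, -0.5, 0.0, 0.5, 2.0]
# Largest disagreement between the closed form and the numerical estimate.
max_gap = max(abs(sigmoid_derivative(x) - numerical_derivative(sigmoid, x))
              for x in points)
```

Because the derivative can be written in terms of the unit's own output, backpropagation only needs the outputs from the forward pass to compute gradients. Neither sgn(x) (not differentiable at the threshold, zero gradient elsewhere) nor a purely linear unit (compositions of linear units stay linear) offers that combination.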

Page 23:

The Backpropagation Algorithm

The main idea of the backpropagation algorithm:

• compute the input and output of each unit in a forward pass;

• modify the weights between pairs of units in a backward pass, according to the errors

Page 24:

Error definition:

Batch mode:

$$E_D(\vec{\omega}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$$

Individual mode:

$$E_d(\vec{\omega}) = \frac{1}{2} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$$

Page 25:

• $x_{ji}$ = the ith input to unit j

• $\omega_{ji}$ = the weight associated with the ith input to unit j

• $net_j = \sum_i \omega_{ji} x_{ji}$ (the weighted sum of inputs for unit j)

• $o_j$ = the output computed by unit j

• $t_j$ = the target output for unit j

• outputs = the set of units in the final layer

• Ds(j) = the set of units whose immediate inputs include the output of j

[Figure: unit j receives the input $x_{ji} = o_i$ from unit i through the weight $\omega_{ji}$, computes $net_j = \sum_i \omega_{ji} x_{ji}$, and produces the output $o_j$]

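Page 26:

Training rule for Output Unit weights

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \frac{\partial o_j}{\partial net_j}$$

$$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2} (t_j - o_j)^2 = -(t_j - o_j)$$

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$$

$$\frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)$$

$$\Delta\omega_{ji} = -\eta\, \frac{\partial E_d}{\partial \omega_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}$$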

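Page 27:

Training Rule for Hidden Unit Weights

Denote the error term of unit j by

$$\delta_j = -\frac{\partial E_d}{\partial net_j}$$

Then

$$
\begin{aligned}
\frac{\partial E_d}{\partial net_j}
&= \sum_{k \in Ds(j)} \frac{\partial E_d}{\partial net_k} \frac{\partial net_k}{\partial net_j} \\
&= \sum_{k \in Ds(j)} -\delta_k\, \frac{\partial net_k}{\partial net_j} \\
&= \sum_{k \in Ds(j)} -\delta_k\, \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j} \\
&= \sum_{k \in Ds(j)} -\delta_k\, \omega_{kj}\, \frac{\partial o_j}{\partial net_j} \\
&= \sum_{k \in Ds(j)} -\delta_k\, \omega_{kj}\, o_j (1 - o_j)
\end{aligned}
$$

so we have

$$\delta_j = o_j (1 - o_j) \sum_{k \in Ds(j)} \delta_k\, \omega_{kj}$$

and

$$\Delta\omega_{ji} = \eta\, \delta_j\, x_{ji}$$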

Page 28:

Backpropagation Algorithm

• Initialize all weights to small random numbers

• Until the termination condition is met, Do

For each training example, Do

//Propagate the input forward

1. Input the training example to the network and compute the network outputs

//Propagate the errors backward

2. For each output unit k

$$\delta_k \leftarrow (t_k - o_k)\, o_k (1 - o_k)$$

3. For each hidden unit h

$$\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} \omega_{kh}\, \delta_k$$

4. Update each network weight

$$\omega_{ji} \leftarrow \omega_{ji} + \Delta\omega_{ji}$$

where

$$\Delta\omega_{ji} = \eta\, \delta_j\, x_{ji}$$
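The backpropagation steps can be sketched for a network with one hidden layer of sigmoid units. This is a minimal illustration, not a definitive implementation: the dictionary layout of the network, the function names, the initialization interval, the learning rate 0.3, and the fixed epoch count used as the termination condition are all assumptions made for the example. Each unit stores its bias weight at index 0, paired with the constant input 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random weights; index 0 of every unit is the bias weight."""
    rng = random.Random(seed)
    def layer(n_units, n_inputs):
        return [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
                for _ in range(n_units)]
    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}

def forward(net, x):
    """Step 1: propagate the input forward and return every unit's inputs."""
    x_in = [1.0] + list(x)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x_in)))
              for ws in net["hidden"]]
    h_in = [1.0] + hidden
    out = [sigmoid(sum(w * hi for w, hi in zip(ws, h_in)))
           for ws in net["output"]]
    return x_in, h_in, out

def example_error(net, x, t):
    """E_d = 1/2 * sum over output units of (t_k - o_k)^2."""
    _, _, out = forward(net, x)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, out))

def error_terms(net, x, t):
    """Steps 2 and 3: error terms for the output units, then the hidden units."""
    x_in, h_in, out = forward(net, x)
    delta_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, out)]
    delta_hidden = []
    for h in range(len(net["hidden"])):
        o_h = h_in[h + 1]  # skip the constant bias input at index 0
        back = sum(net["output"][k][h + 1] * delta_out[k]
                   for k in range(len(out)))
        delta_hidden.append(o_h * (1.0 - o_h) * back)
    return x_in, h_in, delta_hidden, delta_out

def backprop_step(net, x, t, eta=0.3):
    """Step 4: w_ji <- w_ji + eta * delta_j * x_ji for every weight."""
    x_in, h_in, delta_hidden, delta_out = error_terms(net, x, t)
    for k, ws in enumerate(net["output"]):
        for i in range(len(ws)):
            ws[i] += eta * delta_out[k] * h_in[i]
    for h, ws in enumerate(net["hidden"]):
        for i in range(len(ws)):
            ws[i] += eta * delta_hidden[h] * x_in[i]

def train(net, examples, eta=0.3, epochs=1000):
    """Assumed termination condition: a fixed number of passes over the data."""
    for _ in range(epochs):
        for x, t in examples:
            backprop_step(net, x, t, eta)
    return net
```

A call such as `train(init_network(2, 3, 1), examples)` expects `examples` to be a list of `(inputs, targets)` pairs with targets in the open interval (0, 1), since sigmoid outputs never reach 0 or 1 exactly. All error terms are computed before any weight changes, so the hidden-unit terms use the output weights from the forward pass.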

Pages 29-33:

Hidden layer Representations

[Figures on these slides are not preserved in the transcript]

Page 34:

Convergence and local minima

• Backpropagation converges to some local minimum, not necessarily to the global minimum error

• Use stochastic gradient descent rather than standard gradient descent to reduce the risk of getting stuck in a local minimum

• Initialization influences convergence: train multiple networks on the same data with different random initial weights, then select the best one

• Training can take thousands of iterations, so it is slow

• Initialize weights near zero, so that the initial network is nearly linear; increasingly nonlinear functions become possible as training progresses

• Add a momentum term to speed convergence:

$$\Delta\omega_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta\omega_{ji}(n-1) \qquad (0 \le \alpha < 1)$$

where $n$ is the iteration index and $\alpha$ is the momentum constant.
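The effect of the momentum term can be seen in a small numerical sketch. It assumes, purely for illustration, that the gradient part η δ_j x_ji stays constant at 0.1 over 50 iterations and that α = 0.5; the function and variable names are hypothetical.

```python
def momentum_delta(eta, delta_j, x_ji, previous_delta, alpha):
    """Weight change at iteration n: eta * delta_j * x_ji + alpha * delta(n - 1)."""
    return eta * delta_j * x_ji + alpha * previous_delta

steps = []
previous = 0.0  # no previous update before the first iteration
for n in range(50):
    previous = momentum_delta(eta=0.1, delta_j=1.0, x_ji=1.0,
                              previous_delta=previous, alpha=0.5)
    steps.append(previous)
```

When the gradient keeps pointing in the same direction, the step grows from 0.1 toward 0.1 / (1 − 0.5) = 0.2, which is why momentum speeds up progress along flat, consistent stretches of the error surface.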

Page 35:

Expressive Capabilities of ANNs

• Every boolean function can be represented by a network with a single hidden layer

• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer

• Any function can be approximated to arbitrary accuracy by a network with two hidden layers

• A network with more hidden layers may achieve higher precision; however, the possibility of converging to a local minimum increases as well.

Page 36:

When to Consider Neural Networks

• Input is high-dimensional, discrete or real-valued

• Output is discrete or real-valued

• Output is a vector of values

• Possibly noisy data

• Form of target function is unknown

• Human readability of result is unimportant

Page 37:

Overfitting in ANNs

Page 38:

Strategy applied to avoid overfitting

• Poor strategy: continue training until the error on the training examples falls below some threshold

• A good indicator: the number of iterations that produces the lowest error over the validation set

• Keep a copy of the weights with the lowest validation error so far; once the trained weights reach a significantly higher error over the validation set than the stored weights, terminate!
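The validation-based stopping strategy can be sketched generically. In the code below, `train_one_epoch` and `validation_error` are hypothetical callables supplied by the caller, and "significantly higher" is interpreted, as an assumption, as exceeding the best validation error by more than a relative `tolerance` of 5%. The toy usage replaces real training with a single weight that moves by 0.1 per epoch and a validation error that is lowest at weight 1.0.

```python
import copy

def train_with_early_stopping(weights, train_one_epoch, validation_error,
                              max_epochs=1000, tolerance=0.05):
    """Return the stored weights with the lowest validation error seen."""
    best_weights = copy.deepcopy(weights)
    best_error = validation_error(weights)
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        weights = train_one_epoch(weights)
        error = validation_error(weights)
        if error < best_error:
            # New best: store a copy of the weights.
            best_weights = copy.deepcopy(weights)
            best_error, best_epoch = error, epoch
        elif error > best_error * (1.0 + tolerance):
            # Validation error is significantly worse than the stored weights.
            break
    return best_weights, best_error, best_epoch

# Toy stand-ins for a real training loop (assumptions for illustration only).
def toy_epoch(w):
    return [w[0] + 0.1]

def toy_validation_error(w):
    return (w[0] - 1.0) ** 2 + 0.1

best_weights, best_error, best_epoch = train_with_early_stopping(
    [0.0], toy_epoch, toy_validation_error)
```

In this toy run the validation error falls until the weight reaches 1.0 and then rises again, so training stops shortly after the minimum and the stored weights are returned instead of the last ones.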

Page 39:

Alternative Error Functions

Pages 40-41:

Recurrent Networks

Page 42:

Thank you!