
Classification III

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

Overview

Artificial Neural Networks

2

Biological Motivation

3

10^11: the number of neurons in the human brain

10^4: the average number of connections of each neuron

10^-3 seconds: the fastest switching time of neurons

10^-10 seconds: the switching speed of computers

10^-1 seconds: the time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A

Using ANN to study and model biological learning processes

Group B

Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes

4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

[Perceptron diagram: inputs x1, x2, …, xn with weights w1, w2, …, wn, plus a constant input x0 = 1 with bias weight w0, feeding a summation unit Σ followed by a threshold.]

o = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}

o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ 0 & \text{otherwise} \end{cases}

Power of Perceptrons

8

AND (weights: w0 = -0.8, w1 = 0.5, w2 = 0.5)

x1 x2 | sum  | output
0  0  | -0.8 | 0
0  1  | -0.3 | 0
1  0  | -0.3 | 0
1  1  |  0.2 | 1

OR (weights: w0 = -0.3, w1 = 0.5, w2 = 0.5)

x1 x2 | sum  | output
0  0  | -0.3 | 0
0  1  |  0.2 | 1
1  0  |  0.2 | 1
1  1  |  0.7 | 1
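To make the tables concrete, here is a small Python sketch (not from the slides) of the threshold perceptron with the AND and OR weights given above; it reproduces the output columns of both tables.

```python
# Threshold perceptron: output 1 if w0*x0 + w1*x1 + w2*x2 > 0, else 0 (x0 = 1).
def perceptron(weights, x1, x2):
    s = weights[0] * 1 + weights[1] * x1 + weights[2] * x2
    return 1 if s > 0 else 0

AND_W = (-0.8, 0.5, 0.5)   # weights from the AND example above
OR_W = (-0.3, 0.5, 0.5)    # weights from the OR example above

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(AND_W, x1, x2), perceptron(OR_W, x1, x2))
# prints the AND and OR columns: (0,0,0,0), (0,1,0,1), (1,0,0,1), (1,1,1,1)
```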

Error Surface

9

[Figure: error surface, with the error E plotted as a function of the weights w1 and w2.]

Gradient Descent

10

E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]

\Delta \vec{w} = -\eta \nabla E(\vec{w})

w_i \leftarrow w_i + \Delta w_i, \quad \text{where } \Delta w_i = -\eta \frac{\partial E}{\partial w_i}

Learning Rate

Batch Learning

Delta Rule

11

\frac{\partial E}{\partial w_i}
= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
= \frac{1}{2} \sum_{d \in D} \frac{\partial}{\partial w_i} (t_d - o_d)^2
= \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d)
= \sum_{d \in D} (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)
= \sum_{d \in D} (t_d - o_d)(-x_{id})

Hence

\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}

where o(\vec{x}) = \vec{w} \cdot \vec{x} for a linear unit.

Batch Learning

GRADIENT_DESCENT (training_examples, η)

Initialize each wi to some small random value

Until the termination condition is met, Do

  Initialize each Δwi to zero

  For each <x, t> in training_examples, Do

    • Input the instance x to the unit and compute the output o

    • For each linear unit weight wi, Do

      – Δwi ← Δwi + η(t − o)xi

  For each linear unit weight wi, Do

    • wi ← wi + Δwi

12
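A minimal Python sketch of the GRADIENT_DESCENT procedure above for a single linear unit; the toy training set, learning rate and epoch count are assumptions for illustration only.

```python
import random

def gradient_descent(training_examples, eta=0.1, epochs=500):
    n = len(training_examples[0][0])                 # number of inputs, incl. x0 = 1
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):                          # "until the termination condition is met"
        dw = [0.0] * n                               # initialize each delta-w_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x)) # linear unit output o = w . x
            for i in range(n):
                dw[i] += eta * (t - o) * x[i]        # accumulate eta * (t - o) * x_i
        w = [wi + dwi for wi, dwi in zip(w, dw)]     # batch update after one pass
    return w

# Example (assumed data): targets generated by t = 1 - x1 + 2*x2, with x0 = 1 as bias input.
data = [((1, 0, 0), 1), ((1, 1, 0), 0), ((1, 0, 1), 3), ((1, 1, 1), 2)]
print([round(wi, 2) for wi in gradient_descent(data)])   # approaches [1.0, -1.0, 2.0]
```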

Stochastic Learning

13

w_i \leftarrow w_i + \Delta w_i, \quad \text{where } \Delta w_i = \eta (t - o)\, x_i

For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:

\Delta w_i = \eta (t - o) x_i = 0.1 \times (1 - 0) \times 0.8 = 0.08

[Figure: positive (+) and negative (−) training examples in the input space.]

Stochastic Learning NAND

14

(threshold = 0.5, learning rate = 0.1; Ci = wi·xi, S = C0 + C1 + C2, o = 1 if S > threshold else 0, E = t − o, correction R = LR × E)

Input      Target  Initial Weights   Individual Ci      Sum   Output  Error  Correction  Final Weights
x0 x1 x2   t       w0   w1   w2      C0   C1   C2       S     o       E      R           w0   w1   w2
1  0  0    1       0.0  0.0  0.0     0.0  0.0  0.0      0.0   0       1      +0.1        0.1  0.0  0.0
1  0  1    1       0.1  0.0  0.0     0.1  0.0  0.0      0.1   0       1      +0.1        0.2  0.0  0.1
1  1  0    1       0.2  0.0  0.1     0.2  0.0  0.0      0.2   0       1      +0.1        0.3  0.1  0.1
1  1  1    0       0.3  0.1  0.1     0.3  0.1  0.1      0.5   0       0       0          0.3  0.1  0.1
1  0  0    1       0.3  0.1  0.1     0.3  0.0  0.0      0.3   0       1      +0.1        0.4  0.1  0.1
1  0  1    1       0.4  0.1  0.1     0.4  0.0  0.1      0.5   0       1      +0.1        0.5  0.1  0.2
1  1  0    1       0.5  0.1  0.2     0.5  0.1  0.0      0.6   1       0       0          0.5  0.1  0.2
1  1  1    0       0.5  0.1  0.2     0.5  0.1  0.2      0.8   1      -1      -0.1        0.4  0.0  0.1
1  0  0    1       0.4  0.0  0.1     0.4  0.0  0.0      0.4   0       1      +0.1        0.5  0.0  0.1
…  (intermediate iterations omitted)  …
1  1  0    1       0.8 -0.2 -0.1     0.8 -0.2  0.0      0.6   1       0       0          0.8 -0.2 -0.1
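The table can be reproduced with a few lines of Python. The sketch below applies the stochastic perceptron rule with the stated threshold (0.5) and learning rate (0.1), starting from zero weights; the number of sweeps is an assumed value that is more than enough for convergence.

```python
# Stochastic perceptron learning of NAND, as in the table above.
THRESHOLD, LR = 0.5, 0.1
nand_data = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]  # ((x0, x1, x2), t)

w = [0.0, 0.0, 0.0]
for epoch in range(20):                                   # repeated sweeps over the examples
    for x, t in nand_data:
        s = sum(wi * xi for wi, xi in zip(w, x))          # S = C0 + C1 + C2
        o = 1 if s > THRESHOLD else 0                     # thresholded output
        w = [wi + LR * (t - o) * xi for wi, xi in zip(w, x)]  # correction R = LR * (t - o)

print([round(wi, 1) for wi in w])   # prints [0.8, -0.2, -0.1], matching the last row of the table
```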

Multilayer Perceptron

15

XOR

16

p ⊕ q = (p ∨ q) ∧ ¬(p ∧ q) = (p ∧ ¬q) ∨ (¬p ∧ q)

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line


XOR

17

p ⊕ q = (p ∨ q) ∧ ¬(p ∧ q), i.e. XOR(p, q) = AND(OR(p, q), NAND(p, q))

[Figure: the XOR points in the p-q plane, and a two-layer network in which hidden units compute OR(p, q) and NAND(p, q) and the output unit computes their AND.]

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

[Figure: network for XOR: input units p and q; hidden units OR and NAND; output unit AND.]
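A small Python sketch of this construction: two threshold hidden units compute OR and NAND (the OR and AND weights are taken from the earlier Power of Perceptrons slide; the NAND weights are an assumed choice), and a threshold output unit takes their AND, yielding XOR.

```python
# One threshold unit: output 1 if w0 + w1*a + w2*b > 0, else 0.
def unit(w0, w1, w2, a, b):
    return 1 if w0 + w1 * a + w2 * b > 0 else 0

def xor(p, q):
    h_or = unit(-0.3, 0.5, 0.5, p, q)       # OR unit (weights from slide 8)
    h_nand = unit(0.8, -0.5, -0.5, p, q)    # NAND unit (assumed weights)
    return unit(-0.8, 0.5, 0.5, h_or, h_nand)  # AND of the two hidden outputs

for p in (0, 1):
    for q in (0, 1):
        print(p, q, xor(p, q))   # prints 0, 1, 1, 0: the XOR truth table
```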

The Sigmoid Threshold Unit

19

[Sigmoid unit diagram: inputs x1, x2, …, xn with weights w1, w2, …, wn, plus a constant input x0 = 1 with bias weight w0, a summation unit, and a sigmoid output.]

net = \sum_{i=0}^{n} w_i x_i

o = \sigma(net) = \frac{1}{1 + e^{-net}}

Sigmoid function: \sigma(y) = \frac{1}{1 + e^{-y}}, \quad \frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))
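A short Python sketch of the sigmoid unit and the derivative identity above; the test point y = 0.3 is an arbitrary choice.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_output(w, x):          # x[0] is the constant bias input x0 = 1
    net = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(net)

# The identity d sigma/dy = sigma(y) * (1 - sigma(y)) is what makes backpropagation cheap.
y = 0.3
numeric = (sigmoid(y + 1e-6) - sigmoid(y - 1e-6)) / 2e-6
print(numeric, sigmoid(y) * (1 - sigmoid(y)))   # the two values agree up to floating-point error
```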

Backpropagation Rule

20

\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}}

E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2

• x_{ji} = the i-th input to unit j
• w_{ji} = the weight associated with the i-th input to unit j
• net_j = \sum_i w_{ji} x_{ji} (the weighted sum of inputs for unit j)
• o_j = the output of unit j
• t_j = the target output of unit j
• σ = the sigmoid function
• outputs = the set of units in the final layer
• Downstream(j) = the set of units directly taking the output of unit j as inputs

\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\, x_{ji}

Training Rule for Output Units

21

\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j}

\frac{\partial E_d}{\partial o_j}
= \frac{\partial}{\partial o_j} \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2
= \frac{\partial}{\partial o_j} \frac{1}{2} (t_j - o_j)^2
= \frac{1}{2} \cdot 2 (t_j - o_j) \frac{\partial (t_j - o_j)}{\partial o_j}
= -(t_j - o_j)

\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)

\frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)

\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}

Training Rule for Hidden Units

22

\frac{\partial E_d}{\partial net_j}
= \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \cdot \frac{\partial net_k}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k \frac{\partial net_k}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k \frac{\partial net_k}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k\, w_{kj} \frac{\partial o_j}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k\, w_{kj}\, o_j (1 - o_j)

where \delta_k = -\frac{\partial E_d}{\partial net_k}

\delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}

\Delta w_{ji} = \eta\, \delta_j\, x_{ji}

BP Framework

BACKPROPAGATION (training_examples, η, n_in, n_out, n_hidden)

Create a network with n_in inputs, n_hidden hidden units and n_out output units

Initialize all network weights to small random numbers

Until the termination condition is met, Do

  For each <x, t> in training_examples, Do

    • Input the instance x to the network and compute the output o of every unit

    • For each output unit k, calculate its error term δ_k:
      \delta_k = o_k (1 - o_k)(t_k - o_k)

    • For each hidden unit h, calculate its error term δ_h:
      \delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k

    • Update each network weight w_{ji}:
      w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, \quad \text{where } \Delta w_{ji} = \eta\, \delta_j\, x_{ji}

23
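The update rules above translate directly into code. Below is a minimal, self-contained Python sketch of the BACKPROPAGATION procedure for a small feedforward network trained on XOR; the network size (three hidden units), learning rate, epoch count and random seed are assumed values for illustration, not taken from the slides.

```python
import math, random

random.seed(0)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

n_in, n_hidden = 2, 3
# hidden weights: one list per hidden unit; index 0 is the bias weight (x0 = 1)
w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
# output weights: index 0 is the bias weight on the hidden layer's constant 1
w_o = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
eta = 0.5
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for epoch in range(20000):
    for x, t in data:
        xb = (1,) + x                                                  # add bias input x0 = 1
        h = [sigmoid(sum(w * xi for w, xi in zip(wh, xb))) for wh in w_h]
        hb = [1.0] + h
        o = sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))
        delta_o = o * (1 - o) * (t - o)                                # output error term
        delta_h = [h[j] * (1 - h[j]) * w_o[j + 1] * delta_o            # hidden error terms
                   for j in range(n_hidden)]
        w_o = [w + eta * delta_o * hi for w, hi in zip(w_o, hb)]       # w <- w + eta*delta*x
        for j in range(n_hidden):
            w_h[j] = [w + eta * delta_h[j] * xi for w, xi in zip(w_h[j], xb)]

for x, t in data:
    xb = (1,) + x
    hb = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(wh, xb))) for wh in w_h]
    print(x, t, round(sigmoid(sum(w * hi for w, hi in zip(w_o, hb))), 2))
# the outputs are typically close to 0, 1, 1, 0 after training; backprop can
# occasionally stall in a local minimum, as discussed on the next slide
```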

More about BP Networks …

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (e.g., Genetic Algorithms)

Usually better accuracy

Can do some advanced training (e.g., structure + parameters)

Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE,

pp. 1423-1447

Representational Power

Deep Learning

24

More about BP Networks …

Overfitting

Tends to occur during later iterations

Use a validation dataset to terminate training when necessary

Practical Considerations

Momentum

Adaptive learning rate
• Small: slow convergence, easy to get stuck
• Large: fast convergence, unstable

25

\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n - 1)

[Figures: training and validation error as a function of time; error as a function of a weight.]
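As a small illustration of the momentum update above, the sketch below (with assumed values η = 0.1 and α = 0.9) shows how repeated steps in the same direction build up an increasingly large weight change.

```python
# Momentum: the current change adds a fraction alpha of the previous change.
def momentum_step(delta_j, x_ji, prev_dw, eta=0.1, alpha=0.9):
    return eta * delta_j * x_ji + alpha * prev_dw   # delta-w_ji(n)

dw = 0.0
for step in range(3):
    dw = momentum_step(delta_j=0.2, x_ji=1.0, prev_dw=dw)
    print(round(dw, 4))    # 0.02, 0.038, 0.0542: the step size keeps growing
```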

Beyond BP Networks

26

Elman Network

XOR

In: 0 1 1 0 0 0 1 1 0 1 0 1 …

Out: 1 0 0 1 … (each output is the XOR of the preceding pair of inputs)

Beyond BP Networks

27

[Figures: a Hopfield network, and the energy landscape of a Hopfield network.]

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs; input values can be any real values

The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable; training can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important; weights are difficult for humans to interpret

29

Reading Materials

Textbook

Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill
http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo

http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are examples of other types of ANN?

31

Next Week's Class Talk

Volunteers are required for next week's class talk

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem

4 input samples:
• 0 0 → 0
• 1 0 → 1

Task 2: Identity Function

8 input samples:
• 10000000 → 10000000
• 00010000 → 00010000
• …

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due: Sunday, 14 December

Credit: 15

33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks …
  • More about BP Networks … (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Week's Class Talk
  • Assignment

Overview

Artificial Neural Networks

2

Biological Motivation

3

1011 The number of neurons in the human brain

104 The average number of connections of each neuron

10-3 The fastest switching times of neurons

10-10 The switching speeds of computers

10-1 The time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A

Using ANN to study and model biological learning processes

Group B

Obtaining highly effective machine learning algorithms regardless of how

closely these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Biological Motivation

3

1011 The number of neurons in the human brain

104 The average number of connections of each neuron

10-3 The fastest switching times of neurons

10-10 The switching speeds of computers

10-1 The time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A

Using ANN to study and model biological learning processes

Group B

Obtaining highly effective machine learning algorithms regardless of how

closely these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A

Using ANN to study and model biological learning processes

Group B

Obtaining highly effective machine learning algorithms regardless of how

closely these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifo

n

iii

0

01)( 110

1 otherwise

xwxwwifxxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs. Input values can be any real values.

The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.

The training samples may contain errors.

Long training times are acceptable, ranging from a few seconds to several hours.

Fast evaluation of the learned target function may be required.

The ability to understand the learned function is not important; weights are difficult for humans to interpret.

29

Reading Materials

Text Book

Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.

Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill

http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo

http://neuron.eng.wayne.edu/software.html

http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf

http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are the examples of other types of ANN?

Next Week's Class Talk

Volunteers are required for next week's class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samples

• 0 0 → 0

• 1 0 → 1

Task 2 Identity Function

8 input samples

• 10000000 → 10000000

• 00010000 → 00010000

• …

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit: 15
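For Task 2 the eight training pairs are the one-hot patterns mapped to themselves; a small setup sketch (my own reading of the sample patterns listed above):

# Task 2 data: each 8-bit one-hot pattern is both the input and the target,
# so the 3 hidden units have to learn a compact code for the 8 patterns.
identity_samples = []
for i in range(8):
    pattern = [1 if j == i else 0 for j in range(8)]   # 10000000, 01000000, ...
    identity_samples.append((pattern, pattern))        # input == target

for x, t in identity_samples[:2]:
    print("".join(map(str, x)), "->", "".join(map(str, t)))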

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks …
  • More about BP Networks … (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Week's Class Talk
  • Assignment

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output

0 0 -08 0

0 1 -03 0

1 0 -03 0

1 1 03 1

Input sum Output

0 0 -03 0

0 1 02 1

1 0 02 1

1 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Gradient Descent

10

2)(

2

1)(

Dd

dd otwE

nw

E

w

E

w

EwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Delta Rule

11

)()(

)()(

)()(22

1

)(2

1

)(2

1

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Do

ndash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Do

bull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Stochastic Learning NAND

14

InputTarget

Initial

Weights

Output

Error CorrectionFinal

WeightsIndividual SumFinal

Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 0

1 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 01

1 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 01

1 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 01

1 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 01

1 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 02

1 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 02

1 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 01

1 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks …

Overfitting

Tends to occur during later iterations

Use a validation dataset to terminate the training when necessary

Practical Considerations

Momentum: Δw_ji(n) = η δ_j x_ji + α Δw_ji(n − 1)

Adaptive learning rate
• Small: slow convergence, easy to get stuck
• Large: fast convergence, unstable

[Figures: error vs. training time for the training and validation sets; error as a function of a weight]

25
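A hedged sketch of how the momentum term above would change the weight update in the inner loop of the earlier BP sketch; the eta and alpha values are assumptions.

# Weight update with momentum: dw(n) = eta * delta_j * x_ji + alpha * dw(n-1)
# prev_dw holds the update applied on the previous iteration for each weight of the unit.
def momentum_update(w, x, delta_j, prev_dw, eta=0.1, alpha=0.9):
    dw = [eta * delta_j * x_i + alpha * p for x_i, p in zip(x, prev_dw)]
    return [w_i + d for w_i, d in zip(w, dw)], dw   # return dw to reuse as prev_dw next time

w, prev = [0.2, -0.1, 0.4], [0.0, 0.0, 0.0]
w, prev = momentum_update(w, [1.0, 0.0, 1.0], 0.032, prev)
print([round(v, 4) for v in w])   # [0.2032, -0.1, 0.4032]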

Beyond BP Networks

26

Elman Network

XOR (presented as a temporal sequence)

In:  0 1 1 0 0 0 1 1 0 1 0 1 …

Out: 1 0 0 1 …

[Figure: Elman network, a recurrent network whose hidden-layer activations are fed back as context inputs]

Beyond BP Networks

27

[Figures: a Hopfield network; the energy landscape of a Hopfield network]

Beyond BP Networks

28

When does ANN work?

Instances are represented by attribute-value pairs. Input values can be any real values.

The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.

The training samples may contain errors.

Long training times are acceptable. Training can range from a few seconds to several hours.

Fast evaluation of the learned target function may be required.

The ability to understand the learned function is not important. Weights are difficult for humans to interpret.

29

Reading Materials

Text Book

Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill
http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo

http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are the examples of other types of ANN?

31

Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints:

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length: 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem

4 input samples:
• 0 0 → 0
• 1 0 → 1
• …

Task 2: Identity Function

8 input samples (see the sketch after this slide):
• 10000000 → 10000000
• 00010000 → 00010000
• …

Use 3 hidden units

Deliverables:

Report

Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15

33
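For Task 2, the training set is simply the eight one-hot vectors mapped to themselves (the classic 8-3-8 encoder). A small Python sketch of how the data might be set up, assuming the BP routine from the earlier sketch is reused with 8 inputs, 3 hidden units, and 8 outputs:

# The eight training pairs for Task 2: each one-hot vector of length 8 maps to itself,
# so the 3 hidden units must learn a compact code for the 8 patterns.
identity_data = [([1 if j == i else 0 for j in range(8)],
                  [1 if j == i else 0 for j in range(8)]) for i in range(8)]

for x, t in identity_data[:2]:
    print(x, "->", t)
# [1, 0, 0, 0, 0, 0, 0, 0] -> [1, 0, 0, 0, 0, 0, 0, 0]
# [0, 1, 0, 0, 0, 0, 0, 0] -> [0, 1, 0, 0, 0, 0, 0, 0]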

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

XOR

16

-

-

))(( pqqpqpqpqp

Input Output

0 0 0

0 1 1

1 0 1

1 1 0

Cannot be separated by a single line

+

+

p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Hidden Layer Representations

18

p q OR NAND AND

0 0 0 1 0

0 1 1 1 1

1 0 1 1 1

1 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(

1

1)( yy

dy

yd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(2

1)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnet

E

w

net

net

E

w

E

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Training Rule for Output Units

21

j

j

j

d

j

d

net

o

o

E

net

E

outputskkk

jj

d otoo

E 2)(2

1

)(

)()(2

2

1

)(2

1 2

jj

j

jjjj

jjjj

d

ot

o

otot

otoo

E

)1()(

jjj

j

j

j oonet

net

net

o

)1()( jjjjj

d oootnet

E

jijjjjji

dji xooot

w

Ew )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oownet

ow

net

o

o

net

net

net

net

E

net

E

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

More about BP Networks hellip

Convergence and Local Minima

The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks

Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning

24

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

More about BP Networks hellip

Overfitting

Tend to occur during later iterations

Use validation dataset to terminate the training when necessary

Practical Considerations

Momentum

Adaptive learning ratebull Small slow convergence easy to get stuck

bull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Beyond BP Networks

26

Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem

4 input samplesbull 0 0 0

bull 1 0 1

Task 2 Identity Function

8 input samplesbull 10000000 10000000

bull 00010000 00010000

bull hellip

Use 3 hidden units

Deliverables

Report

Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are the examples of other types of ANN?

31

Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints:

Robot Driving

Character Recognition

Face Recognition

Hopfield Network

Length: 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem (a minimal training sketch follows this slide)

4 input samples:

• 0 0 → 0

• 1 0 → 1

Task 2: Identity Function

8 input samples:

• 10000000 → 10000000

• 00010000 → 00010000

• …

Use 3 hidden units.

Deliverables:

Report

Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15

33
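A minimal training sketch for Task 1 is given below, assuming a 2-2-1 feedforward network of sigmoid units trained by plain backpropagation with gradient descent; the layer sizes, learning rate, random seed, and number of epochs are illustrative choices rather than requirements of the assignment.

import numpy as np

# Backpropagation sketch for XOR with a 2-2-1 network of sigmoid units.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # 4 input samples
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(2, 2)), np.zeros(2)      # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(2, 1)), np.zeros(1)      # hidden -> output
eta = 0.5                                                     # learning rate

for epoch in range(20000):
    H = sigmoid(X @ W1 + b1)                     # forward pass: hidden layer
    O = sigmoid(H @ W2 + b2)                     # forward pass: output layer
    delta_o = (O - T) * O * (1 - O)              # output error term (delta rule)
    delta_h = (delta_o @ W2.T) * H * (1 - H)     # error back-propagated to hidden layer
    W2 -= eta * (H.T @ delta_o)                  # gradient-descent weight updates
    b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * (X.T @ delta_h)
    b1 -= eta * delta_h.sum(axis=0)

print(np.round(O, 3))   # outputs should approach [0, 1, 1, 0] for most initializations

# Task 2 (the 8-3-8 identity function) can reuse the same loop with 8 inputs,
# 3 hidden units, 8 outputs, and targets equal to the inputs.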

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks …
  • More about BP Networks … (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work?
  • Reading Materials
  • Review
  • Next Week's Class Talk
  • Assignment
