
Radial Basis-Function Networks

Back-Propagation
Stochastic Back-Propagation Algorithm
Step by Step Example

Radial Basis-Function Networks
Gaussian response function
Location of center u
Determining σ
Why does an RBF network work?

Back-propagation

The algorithm gives a prescription for changing the weights w_ij in any feed-forward network so that it learns a training set of input-output pairs {x^d, t^d}

We consider a simple two-layer network

[Figure: simple two-layer feed-forward network with inputs x1 ... x5 feeding the hidden units, which feed the output units]

Given the pattern x^d, the hidden unit j receives a net input

$\mathrm{net}_j^d = \sum_{k=1}^{5} w_{jk} x_k^d$

and produces the output

$V_j^d = f(\mathrm{net}_j^d) = f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)$

Output unit i thus receives

$\mathrm{net}_i^d = \sum_{j=1}^{3} W_{ij} V_j^d = \sum_{j=1}^{3} W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)$

and produces the final output

$o_i^d = f(\mathrm{net}_i^d) = f\left(\sum_{j=1}^{3} W_{ij} V_j^d\right) = f\left(\sum_{j=1}^{3} W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)$
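As a minimal sketch (not part of the original slides), this forward pass for the 5-3-2 network can be written in a few lines of Python/NumPy; the names sigmoid, forward, w, W are illustrative and assume the sigmoid f used later in the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, W):
    """x: 5 inputs, w: 3x5 hidden weights w_jk, W: 2x3 output weights W_ij."""
    V = sigmoid(w @ x)   # hidden outputs V_j = f(sum_k w_jk x_k)
    o = sigmoid(W @ V)   # final outputs  o_i = f(sum_j W_ij V_j)
    return V, o
```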

In our example E becomes

E[w] is differentiable provided f is differentiable, so gradient descent can be applied

$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left(t_i^d - o_i^d\right)^2$

$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left(t_i^d - f\left(\sum_{j=1}^{3} W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)\right)^2$
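A sketch of evaluating this error over the training set, reusing the hypothetical forward helper from the previous snippet (an illustration, not from the slides):

```python
def error(patterns, targets, w, W):
    # E[w] = 1/2 * sum_d sum_i (t_i^d - o_i^d)^2
    E = 0.0
    for x, t in zip(patterns, targets):
        _, o = forward(np.asarray(x, dtype=float), w, W)
        E += 0.5 * np.sum((np.asarray(t, dtype=float) - o) ** 2)
    return E
```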

Consider a network with M layers, m = 1, 2, .., M

$V_i^m$ denotes the output of the ith unit of the mth layer; $V_i^0$ is a synonym for $x_i$, the ith input

The index m refers to layers, not to patterns

$w_{ij}^m$ denotes the connection from $V_j^{m-1}$ to $V_i^m$

Stochastic Back-Propagation Algorithm (mostly used)

1. Initialize the weights to small random values

2. Choose a pattern x^d and apply it to the input layer: V_k^0 = x_k^d for all k

3. Propagate the signal through the network

4. Compute the deltas for the output layer

5. Compute the deltas for the preceding layer for m=M,M-1,..2

6. Update all connections

7. Goto 2 and repeat for the next pattern

$V_i^m = f(\mathrm{net}_i^m) = f\left(\sum_j w_{ij}^m V_j^{m-1}\right)$

$\delta_i^M = f'(\mathrm{net}_i^M)\,(t_i^d - V_i^M)$

$\delta_i^{m-1} = f'(\mathrm{net}_i^{m-1}) \sum_j w_{ji}^m \delta_j^m$

$\Delta w_{ij}^m = \eta\,\delta_i^m V_j^{m-1}$

$w_{ij}^{\mathrm{new}} = w_{ij}^{\mathrm{old}} + \Delta w_{ij}$
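A minimal sketch of steps 1 to 7 for the two-layer example network, again reusing the hypothetical forward and sigmoid helpers above; the learning rate eta and the weight shapes are assumptions matching the example that follows:

```python
def train_stochastic(patterns, targets, eta=1.0, epochs=1):
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.1, 0.1, size=(3, 5))   # step 1: small random hidden weights w_jk
    W = rng.uniform(-0.1, 0.1, size=(2, 3))   # step 1: small random output weights W_ij
    for _ in range(epochs):
        for x, t in zip(patterns, targets):   # step 2: choose a pattern and apply it
            x, t = np.asarray(x, dtype=float), np.asarray(t, dtype=float)
            V, o = forward(x, w, W)           # step 3: propagate the signal
            delta_out = (t - o) * o * (1 - o)            # step 4: output deltas (f' = o(1-o))
            delta_hid = V * (1 - V) * (W.T @ delta_out)  # step 5: deltas of the preceding layer
            W += eta * np.outer(delta_out, V)  # step 6: update output connections
            w += eta * np.outer(delta_hid, x)  # step 6: update hidden connections
    return w, W                                # step 7: repeat with the next pattern
```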

Example

w1 = {w11=0.1, w12=0.1, w13=0.1, w14=0.1, w15=0.1}
w2 = {w21=0.1, w22=0.1, w23=0.1, w24=0.1, w25=0.1}
w3 = {w31=0.1, w32=0.1, w33=0.1, w34=0.1, w35=0.1}

W1 = {W11=0.1, W12=0.1, W13=0.1}
W2 = {W21=0.1, W22=0.1, W23=0.1}

x1 = {1,1,0,0,0}; t1 = {1,0}
x2 = {0,0,0,1,1}; t2 = {0,1}

$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$

$f'(x) = \sigma'(x) = \sigma(x)\,(1 - \sigma(x))$

$\mathrm{net}_1^1 = \sum_{k=1}^{5} w_{1k} x_k^1 \qquad V_1^1 = f(\mathrm{net}_1^1) = \frac{1}{1 + e^{-\mathrm{net}_1^1}}$

$\mathrm{net}_2^1 = \sum_{k=1}^{5} w_{2k} x_k^1 \qquad V_2^1 = f(\mathrm{net}_2^1) = \frac{1}{1 + e^{-\mathrm{net}_2^1}}$

$\mathrm{net}_3^1 = \sum_{k=1}^{5} w_{3k} x_k^1 \qquad V_3^1 = f(\mathrm{net}_3^1) = \frac{1}{1 + e^{-\mathrm{net}_3^1}}$

net_1^1 = 1*0.1 + 1*0.1 + 0*0.1 + 0*0.1 + 0*0.1 = 0.2

V_1^1 = f(net_1^1) = 1/(1+exp(-0.2)) = 0.54983
V_2^1 = f(net_2^1) = 1/(1+exp(-0.2)) = 0.54983
V_3^1 = f(net_3^1) = 1/(1+exp(-0.2)) = 0.54983

$\mathrm{net}_1^1 = \sum_{j=1}^{3} W_{1j} V_j^1 \qquad o_1^1 = f(\mathrm{net}_1^1) = \frac{1}{1 + e^{-\mathrm{net}_1^1}}$

$\mathrm{net}_2^1 = \sum_{j=1}^{3} W_{2j} V_j^1 \qquad o_2^1 = f(\mathrm{net}_2^1) = \frac{1}{1 + e^{-\mathrm{net}_2^1}}$

net_1^1 = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
o_1^1 = f(net_1^1) = 1/(1+exp(-0.16495)) = 0.54114

net_2^1 = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
o_2^1 = f(net_2^1) = 1/(1+exp(-0.16495)) = 0.54114
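These forward-pass values can be checked with a short NumPy sketch (illustrative names; the 0.1 weights and pattern x1 are the ones listed above):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w = np.full((3, 5), 0.1)                  # hidden weights w_jk
W = np.full((2, 3), 0.1)                  # output weights W_ij
x1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

V = sigmoid(w @ x1)                       # -> [0.54983 0.54983 0.54983]
o = sigmoid(W @ V)                        # -> [0.54114 0.54114]
```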

We will use stochastic gradient descent with η = 1

$\Delta W_{ij} = \eta \sum_{d=1}^{m} (t_i^d - o_i^d)\, f'(\mathrm{net}_i^d) \cdot V_j^d$

$\Delta W_{ij} = (t_i - o_i)\, f'(\mathrm{net}_i)\, V_j$

$f'(x) = \sigma'(x) = \sigma(x)\,(1 - \sigma(x))$

$\Delta W_{ij} = (t_i - o_i)\,\sigma(\mathrm{net}_i)\,(1 - \sigma(\mathrm{net}_i))\, V_j$

$\delta_i = (t_i - o_i)\,\sigma(\mathrm{net}_i)\,(1 - \sigma(\mathrm{net}_i))$

$\Delta W_{ij} = \delta_i V_j$

δ1 = (1 - 0.54114) * (1/(1+exp(-0.16495))) * (1 - 1/(1+exp(-0.16495))) = 0.11394

δ2 = (0 - 0.54114) * (1/(1+exp(-0.16495))) * (1 - 1/(1+exp(-0.16495))) = -0.13437

$\delta_1 = (t_1 - o_1)\,\sigma(\mathrm{net}_1)\,(1 - \sigma(\mathrm{net}_1)) \qquad \Delta W_{1j} = \delta_1 V_j$

$\delta_2 = (t_2 - o_2)\,\sigma(\mathrm{net}_2)\,(1 - \sigma(\mathrm{net}_2)) \qquad \Delta W_{2j} = \delta_2 V_j$

$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \cdot W_{ij}\, f'(\mathrm{net}_j) \cdot x_k$

$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \cdot W_{ij}\,\sigma(\mathrm{net}_j)\,(1 - \sigma(\mathrm{net}_j)) \cdot x_k$

$\delta_j = \sigma(\mathrm{net}_j)\,(1 - \sigma(\mathrm{net}_j)) \sum_{i=1}^{2} W_{ij}\,\delta_i$

$\Delta w_{jk} = \delta_j \cdot x_k$

δ1 = 1/(1+exp(-0.2)) * (1 - 1/(1+exp(-0.2))) * (0.1*0.11394 + 0.1*(-0.13437))

δ1 = -5.0568e-04
δ2 = -5.0568e-04
δ3 = -5.0568e-04

$\delta_1 = \sigma(\mathrm{net}_1)\,(1 - \sigma(\mathrm{net}_1)) \sum_{i=1}^{2} W_{i1}\,\delta_i$

$\delta_2 = \sigma(\mathrm{net}_2)\,(1 - \sigma(\mathrm{net}_2)) \sum_{i=1}^{2} W_{i2}\,\delta_i$

$\delta_3 = \sigma(\mathrm{net}_3)\,(1 - \sigma(\mathrm{net}_3)) \sum_{i=1}^{2} W_{i3}\,\delta_i$

First Adaptation for x1

(one epoch = adaptation over all training patterns, in our case x1 and x2)

Hidden layer deltas: δ1 = -5.0568e-04, δ2 = -5.0568e-04, δ3 = -5.0568e-04
Output layer deltas: δ1 = 0.11394, δ2 = -0.13437

Inputs: x1 = 1, x2 = 1, x3 = 0, x4 = 0, x5 = 0
Hidden outputs: V1 = 0.54983, V2 = 0.54983, V3 = 0.54983

$\Delta W_{ij} = \delta_i V_j \qquad \Delta w_{jk} = \delta_j \cdot x_k$
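The whole first adaptation step for x1 can be reproduced with the following self-contained sketch (η = 1; the variable names are illustrative, not from the slides):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w, W = np.full((3, 5), 0.1), np.full((2, 3), 0.1)
x1, t1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0])

V = sigmoid(w @ x1)                           # hidden outputs, all 0.54983
o = sigmoid(W @ V)                            # network outputs, both 0.54114
delta_out = (t1 - o) * o * (1 - o)            # -> [ 0.11394 -0.13437]
delta_hid = V * (1 - V) * (W.T @ delta_out)   # -> all -5.0568e-04
W_new = W + np.outer(delta_out, V)            # ΔW_ij = δ_i · V_j
w_new = w + np.outer(delta_hid, x1)           # Δw_jk = δ_j · x_k
```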

Radial Basis-Function Networks

RBF networks train rapidly
No local minima problems
No oscillation
Universal approximators: they can approximate any continuous function
They share this property with feed-forward networks that have a hidden layer of nonlinear neurons (units)

Disadvantage: after training they are generally slower to use

Gaussian response function

Each hidden layer unit computes

$h_i = e^{-\frac{D_i^2}{2\sigma^2}} \qquad D_i^2 = (\vec{x} - \vec{u}_i)^T(\vec{x} - \vec{u}_i)$

where $\vec{x}$ is an input vector and $\vec{u}_i$ is the weight vector (center) of hidden layer neuron i

The output neuron produces the linear weighted sum

$o = \sum_{i=0}^{n} w_i h_i$

The weights have to be adapted (LMS):

$\Delta w_i = \eta\,(t - o)\,x_i$
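A minimal sketch of one RBF forward pass and one LMS update for a single linear output, assuming Gaussian hidden units with centers u_i and a shared σ; here the input seen by an output weight is the hidden activation h_i (the names are illustrative):

```python
import numpy as np

def rbf_forward(x, centers, sigma, weights):
    # h_i = exp(-D_i^2 / (2 sigma^2)),  D_i^2 = (x - u_i)^T (x - u_i)
    d2 = np.sum((centers - x) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    return h, weights @ h                      # o = sum_i w_i h_i

def lms_update(weights, h, t, o, eta=0.1):
    # delta rule on the output weights: Δw_i = η (t - o) h_i
    return weights + eta * (t - o) * h
```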

The operation of the hidden layer

One dimensional input:

$h = e^{-\frac{(x-u)^2}{2\sigma^2}}$

Two dimensional input: [figure: two-dimensional Gaussian response surface over the input plane]

Every hidden neuron has a receptive field defined by the basis function
At x = u the output is maximal; the output drops off as x deviates from u
The output has a significant response to the input x only over a range of values of x called the receptive field
The size of the receptive field is defined by σ; u may be called the mean and σ the standard deviation
The function is radially symmetric around the mean u

Location of centers u

The location of the receptive field is critical
Apply clustering to the training set; each determined cluster center then corresponds to a center u of a receptive field of a hidden neuron, as sketched below
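A common way to obtain such centers in practice is k-means clustering; a sketch with scikit-learn (an assumption, since the slides do not prescribe a particular clustering algorithm; X_train stands for a hypothetical array of training inputs):

```python
from sklearn.cluster import KMeans

# X_train: (num_patterns, num_inputs) array of training inputs (assumed to exist)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)
centers = kmeans.cluster_centers_   # each row becomes a center u_i of a hidden neuron
```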

Determining σ

The objective is to cover the input space with receptive fields as uniformly as possible
If the spacing between centers is not uniform, it may be necessary for each hidden layer neuron to have its own σ
For hidden layer neurons whose centers are widely separated from the others, σ must be large enough to cover the gap

The following heuristic performs well in practice:
For each hidden layer neuron, find the RMS distance between u_i and the centers of its N nearest neighbors c_j
Assign this value to σ_i

$\sigma_i = \mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\frac{1}{N}\sum_{l=1}^{N}\left(u_k - c_{lk}\right)\right)^{2}}$
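A sketch of this heuristic, using the Euclidean RMS distance from each center to its N nearest neighboring centers as σ_i (the exact vector form is an assumption, since the reconstructed formula above is ambiguous in the source):

```python
import numpy as np

def rms_sigmas(centers, N=2):
    # centers: (num_hidden, num_inputs) array of receptive-field centers u_i
    sigmas = np.empty(len(centers))
    for i, u in enumerate(centers):
        dists = np.linalg.norm(centers - u, axis=1)
        nearest = np.sort(dists)[1:N + 1]           # skip the zero distance to u_i itself
        sigmas[i] = np.sqrt(np.mean(nearest ** 2))  # RMS distance to the N nearest neighbors
    return sigmas
```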


Why does an RBF network work?

The hidden layer applies a nonlinear transformation from the input space to the hidden space

In the hidden space a linear discrimination can be performed



Back-Propagation
Stochastic Back-Propagation Algorithm
Step by Step Example

Radial Basis-Function Networks
Gaussian response function
Location of center u
Determining σ
Why does an RBF network work?

Bibliography

Wasserman, P. D., Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993

Simon Haykin, Neural Networks, Second edition, Prentice Hall, 1999

