
Page 1:

A note about gradient descent:

Consider the function $f(x) = (x - x_0)^2$

Its derivative is:

$\frac{df(x)}{dx} = 2\,(x - x_0)$

By gradient descent:

$\frac{dx}{dt} = -\eta\,\frac{df(x)}{dx} = -2\eta\,(x - x_0)$

[Figure: the parabola $f(x) = (x - x_0)^2$ with its minimum at $x_0$; the derivative is positive to the right of $x_0$ and negative to the left.]

Page 2:

$\frac{dx}{dt} = -\eta\,\frac{df(x)}{dx} = -2\eta\,(x - x_0) = -2\eta x + 2\eta x_0$

Solving the differential equation:

$\frac{dx}{dt} = -2\eta x + 2\eta x_0$

or in the general form:

$\frac{dx}{dt} = -kx + c$

What is the solution of this type of equation?

Try: $x(t) = A\exp(Bt) + C$

$x(t) = \frac{c}{k} + \left(x(0) - \frac{c}{k}\right)\exp(-kt)$

With $k = 2\eta$ and $c = 2\eta x_0$ this gives $x(t) = x_0 + (x(0) - x_0)\exp(-2\eta t)$: $x$ relaxes exponentially to the minimum $x_0$.
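As a quick numerical check of this exponential relaxation (a minimal sketch of my own, not from the slides; the values $x_0 = 3$, $\eta = 0.1$ and the 50 steps are arbitrary), discrete gradient descent on $f(x) = (x - x_0)^2$ behaves the same way:

```python
# Gradient descent on f(x) = (x - x0)^2: the iterate decays geometrically toward x0,
# the discrete analogue of x(t) = x0 + (x(0) - x0) * exp(-2*eta*t).
x0 = 3.0    # location of the minimum (example value)
eta = 0.1   # learning rate (example value)
x = 0.0     # starting point x(0)

for step in range(50):
    grad = 2.0 * (x - x0)   # df/dx
    x = x - eta * grad      # gradient-descent update

print(x)  # very close to 3.0
```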

Page 3:

THE PERCEPTRON: (Classification)

Threshold unit:

$O = \Theta\!\left(\sum_i w_i x_i - w_0\right)$, where $\Theta(x) = 1$ if $x \ge 0$ and $\Theta(x) = 0$ if $x < 0$

where $o$ is the output for input pattern $x$, the $w_i$ are the synaptic weights and $y$ is the desired output.

[Diagram: a single threshold unit with inputs $x_1, \dots, x_5$, weights $w_1, \dots, w_5$ and output $o$.]

AND
x1  x2  y
1   1   1
1   0   0
0   1   0
0   0   0

Page 4:

[Diagram: a two-input threshold unit computing AND; inputs $x_1, x_2$ both have weight 1 and the bias weight is $-1.5$. A plot of the input plane shows the four input points separated by the line $x_1 + x_2 - 1.5 = 0$.]

AND
x1  x2  y
1   1   1
1   0   0
0   1   0
0   0   0

Decision boundary: $x_1 + x_2 - 1.5 = 0$

Linearly separable
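A minimal check (my own sketch; the weights 1, 1 and threshold 1.5 are the values shown on the slide) that this threshold unit reproduces the AND truth table:

```python
def threshold_unit(x1, x2, w1=1.0, w2=1.0, w0=1.5):
    """Threshold unit: 1 if w1*x1 + w2*x2 - w0 >= 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 - w0 >= 0 else 0

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, threshold_unit(x1, x2))  # matches the y column of AND
```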

Page 5:

[Diagram: a two-input threshold unit computing OR; inputs $x_1, x_2$ both have weight 1 and the bias weight is $-0.5$. The four input points are separated by the line $x_1 + x_2 - 0.5 = 0$.]

OR
x1  x2  y
1   1   1
1   0   1
0   1   1
0   0   0

Decision boundary: $x_1 + x_2 - 0.5 = 0$

Linearly separable

Page 6:

Perceptron learning rule:

$\Delta W_i = \eta\,(y - o)\,x_i$

[Diagram: a threshold unit with inputs $x_1, \dots, x_5$, weights $w_1, \dots, w_5$ and output $o$.]

Convergence proof: Hertz, Krogh, Palmer (HKP) - did you receive the email?

Assignment 3a: Program in MATLAB a perceptron with the perceptron learning rule and solve the OR, AND and XOR problems. (Due before Feb 27)

Show Demo
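For orientation, a minimal perceptron-learning-rule sketch in Python rather than MATLAB (my own illustration, not the assignment solution; the learning rate, epoch count and zero initialization are arbitrary choices). It learns AND and OR; as the later slides show, no single perceptron can learn XOR:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=50):
    """Perceptron learning rule: W_i <- W_i + eta * (y - o) * x_i, bias as an extra weight."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # constant input 1 for the bias weight
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_mu, y_mu in zip(Xb, y):
            o = 1 if w @ x_mu >= 0 else 0       # threshold unit output
            w += eta * (y_mu - o) * x_mu        # perceptron update
    return w

X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
w = train_perceptron(X, np.array([1, 0, 0, 0]))             # AND targets
print([1 if w @ np.append(x, 1) >= 0 else 0 for x in X])    # [1, 0, 0, 0]
```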

Page 7:

Summary – what can perceptrons do and how?

Page 8:

Linear single layer network: (approximation, curve fitting)

Linear unit:

$O = \sum_i w_i x_i - w_0$

where $o$ is the output for input pattern $x$, the $w_i$ are the synaptic weights and $y$ is the desired output.

[Diagram: a linear unit with inputs $x_1, \dots, x_5$, weights $w_1, \dots, w_5$ and output $o$.]

Minimize mean square error:

$E = \sum_{\mu=1}^{P} (y^\mu - o^\mu)^2$, with $o^\mu = \mathbf{w} \cdot \mathbf{x}^\mu$

Page 9:

(Repeat of Page 8: the linear single-layer unit and the mean square error to be minimized.)

Page 10:

The best solution is obtained when E is minimal.

For linear neurons there is an exact solution for this, called the pseudo-inverse (see HKP).

Looking for a solution by gradient descent:

$E = \sum_{\mu=1}^{P} (y^\mu - o^\mu)^2$

[Figure: $E$ plotted as a function of a weight $w$; the weight is moved along the negative gradient toward the minimum.]

$\frac{dW_i}{dt} = -\eta\,\frac{\partial E}{\partial W_i} = -\eta\,\frac{\partial E}{\partial O}\frac{\partial O}{\partial W_i}$ (chain rule)

Page 11:

$E = \sum_{\mu=1}^{P} (y^\mu - o^\mu)^2$

$\frac{dW_i}{dt} = -\eta\,\frac{\partial E}{\partial W_i} = -\eta\,\frac{\partial E}{\partial O}\frac{\partial O}{\partial W_i}$

Since:

$O = \sum_i W_i x_i - W_0$ and $\frac{\partial O}{\partial W_i} = \frac{\partial}{\partial W_i}\!\left(\sum_j W_j x_j\right) = x_i$

Error:

$\frac{\partial E}{\partial O} = -2\,(y - o)$

Therefore:

$\Delta W_i = \eta\,(y - o)\,x_i$ (the constant factor of 2 is absorbed into the learning rate $\eta$)

Which types of problems can a linear network solve?
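A minimal NumPy sketch of this delta rule for a single linear unit (my own illustration; the synthetic data, learning rate and epoch count are made-up example values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))        # P = 100 input patterns x^mu
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5      # targets from a known linear rule (example)

Xb = np.hstack([X, np.ones((len(X), 1))])    # constant input 1 so the bias is learned too
W = np.zeros(3)
eta = 0.05

for _ in range(200):
    for x_mu, y_mu in zip(Xb, y):
        o = W @ x_mu                         # linear unit output
        W += eta * (y_mu - o) * x_mu         # Delta W_i = eta * (y - o) * x_i

print(W)  # approaches [2.0, -1.0, 0.5]
```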

Page 12:

Sigmoidal neurons:

$O = \sigma\!\left(\sum_i W_i x_i - W_0\right)$

$\frac{dW_i}{dt} = 2\eta\,(y - o)\,\sigma'\!\left(\sum_j W_j x_j - W_0\right) x_i$

Which types of problems can a sigmoidal network solve?

Assignment 3b – Implement a one-layer linear network and a one-layer sigmoidal network; fit a 1D linear, sigmoid and quadratic function with both networks.

For example:

$\sigma(x, \beta) = \frac{\exp(\beta x)}{1 + \exp(\beta x)}$
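The same update for a sigmoidal unit, as a sketch (my own illustration; $\beta = 1$, the OR targets and the epoch count are arbitrary choices), using $\sigma'(h) = \beta\,\sigma(h)\,(1 - \sigma(h))$ for the logistic function above:

```python
import numpy as np

def sigma(h, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * h))   # equals exp(beta*h) / (1 + exp(beta*h))

X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
y = np.array([1, 1, 1, 0], dtype=float)      # OR targets as a toy example
Xb = np.hstack([X, np.ones((4, 1))])         # constant input for the bias weight
W = np.zeros(3)
eta, beta = 0.5, 1.0

for _ in range(2000):
    for x_mu, y_mu in zip(Xb, y):
        o = sigma(W @ x_mu, beta)
        W += 2 * eta * (y_mu - o) * beta * o * (1 - o) * x_mu   # dW_i/dt rule above

print(np.round(sigma(Xb @ W), 2))  # should be close to [1, 1, 1, 0]
```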

Page 13:

Multi-layer networks:

$o = \sum_k w_k^2\,\sigma\!\left(\sum_j w_{k,j}^1 x_j\right)$

• Can solve non-linearly separable classification problems.

• Can approximate any arbitrary function, given 'enough' units in the hidden layer.

[Diagram: input layer $x_1, \dots, x_5$; hidden layer with units $o_{1,1}$ and $o_{2,1}$; output layer with the single unit $o$.]

Page 14:

$o = \sum_k w_k^2\,\sigma\!\left(\sum_j w_{k,j}^1 x_j\right)$

[Diagram: a two-layer network with inputs $x_1, \dots, x_i, \dots, x_N$, two hidden units $o_{1,1}$ and $o_{2,1}$ connected to the inputs by the first-layer weights $w_{i,j}^1$, and an output $o$ connected to the hidden units by the second-layer weights $w_1^2, w_2^2$.]

Note: $w_{i,j}^1$ is not a vector but a matrix.
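A small NumPy sketch of this forward pass (my own illustration; the sizes and random weights are arbitrary), making explicit that $w^1$ is a matrix while $w^2$ is a vector:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 2                        # N inputs, K hidden units (arbitrary sizes)
w1 = rng.normal(size=(K, N))       # first-layer weights: a K x N matrix
w2 = rng.normal(size=K)            # second-layer weights: a length-K vector

def sigma(h):
    return 1.0 / (1.0 + np.exp(-h))

def forward(x):
    hidden = sigma(w1 @ x)         # o_{k,1} = sigma(sum_j w1[k, j] * x[j])
    return w2 @ hidden             # o = sum_k w2[k] * o_{k,1}

print(forward(rng.uniform(size=N)))
```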

Page 15:

Solving linearly inseparable problems

$o = \sum_k w_k^2\,\sigma\!\left(\sum_j w_{k,j}^1 x_j\right)$

[Diagram: a two-layer network with inputs $x_1, x_2$, hidden units $o_{1,1}$ and $o_{2,1}$, and output $o$.]

XOR
x1  x2  y
1   1   0
1   0   1
0   1   1
0   0   0

Hint: XOR = (x1 OR x2) AND NOT (x1 AND x2)

Page 16:

How do we learn a multi-layer network? The credit assignment problem!

$o = \sum_k w_k^2\,\sigma\!\left(\sum_j w_{k,j}^1 x_j\right)$

[Diagram: a hand-crafted two-layer threshold network with inputs $x_1, x_2$, hidden units $o_{1,1}$ and $o_{2,1}$, and output $o$ that implements XOR; the weights and thresholds shown on the slide include the values 0.5, -0.5, 1 and -1.]

XOR
x1  x2  y
1   1   0
1   0   1
0   1   1
0   0   0
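Following the hint from the previous slide, here is one hand-crafted set of weights and thresholds (my own choice of values, not necessarily the ones drawn on the slide) for which a two-layer threshold network computes XOR:

```python
def theta(h):
    return 1 if h >= 0 else 0                      # threshold unit

def xor_net(x1, x2):
    h_or  = theta(1 * x1 + 1 * x2 - 0.5)           # hidden unit 1: x1 OR x2
    h_and = theta(1 * x1 + 1 * x2 - 1.5)           # hidden unit 2: x1 AND x2
    return theta(1 * h_or - 1 * h_and - 0.5)       # output: OR and not AND

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, xor_net(x1, x2))                 # 0, 1, 1, 0
```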

Page 17:

Gradient descent / Back Propagation, the solution to the credit assignment problem:

$o = \sum_k w_k^2\,\sigma\!\left(\sum_j w_{k,j}^1 x_j\right) = \sum_k w_k^2\,\sigma(h_k)$

Where:

$E = \frac{1}{2}\sum_{\mu=1}^{P} (y^\mu - o^\mu)^2$ and $h_k = \sum_j w_{k,j}^1 x_j$

From hidden layer to output weights:

$\Delta w_k^2 = -\eta\,\frac{\partial E}{\partial w_k^2} = \eta\,(y - o)\,\sigma(h_k)$

Page 18:

$o = \sum_k w_k^2\,\sigma(h_k)$

Where:

$E = \frac{1}{2}\sum_{\mu=1}^{P} (y^\mu - o^\mu)^2$ and $h_k = \sum_j w_{k,j}^1 x_j$

For input to hidden layer weights:

$\Delta w_{i,j}^1 = -\eta\,\frac{\partial E}{\partial w_{i,j}^1} = -\eta\,\frac{\partial E}{\partial o}\frac{\partial o}{\partial w_{i,j}^1}$

Since:

$\frac{\partial E}{\partial o} = -(y - o)$ and $\frac{\partial o}{\partial w_{i,j}^1} = w_i^2\,\sigma'(h_i)\,x_j$

Therefore:

$\Delta w_{i,j}^1 = \eta\,(y - o)\,w_i^2\,\sigma'(h_i)\,x_j$

or, writing $\delta_i = (y - o)\,w_i^2\,\sigma'(h_i)$:

$\Delta w_{i,j}^1 = \eta\,\delta_i\,x_j$

Page 19:

For input to hidden layer:

$\Delta w_{i,j}^1 = \eta\,\delta_i\,x_j$ with $\delta_i = (y - o)\,w_i^2\,\sigma'(h_i)$

[Diagram: the two-layer network with inputs $x_1, \dots, x_N$, hidden units $o_{1,1}$ and $o_{2,1}$, first-layer weights $w_{i,j}^1$, second-layer weights $w_1^2, w_2^2$ and output $o$.]

Assignment 3c: Program a two-layer network in MATLAB and solve the XOR problem. Fit the curve x(x-1) between 0 and 1; how many hidden units did you need?
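For orientation, a compact NumPy sketch of these two update rules training a two-layer network on XOR (my own illustration, not the assignment solution; the four hidden units, the bias terms added to the hidden layer, the learning rate and the epoch count are my own choices):

```python
import numpy as np

def sigma(h):
    return 1.0 / (1.0 + np.exp(-h))

rng = np.random.default_rng(0)
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)         # XOR targets

K = 4                                           # hidden units (arbitrary)
w1 = rng.normal(size=(K, 2))                    # input-to-hidden matrix w^1
b1 = np.zeros(K)                                # hidden biases (added for robustness)
w2 = rng.normal(size=K)                         # hidden-to-output vector w^2
eta = 0.5

for _ in range(5000):
    for x_mu, y_mu in zip(X, y):
        h = w1 @ x_mu + b1                      # h_k
        hid = sigma(h)                          # o_{k,1}
        o = w2 @ hid                            # linear output unit
        err = y_mu - o
        delta = err * w2 * hid * (1 - hid)      # delta_i = (y - o) w_i^2 sigma'(h_i)
        w2 += eta * err * hid                   # Delta w_k^2 = eta (y - o) sigma(h_k)
        w1 += eta * np.outer(delta, x_mu)       # Delta w_ij^1 = eta delta_i x_j
        b1 += eta * delta

print(np.round([w2 @ sigma(w1 @ x + b1) for x in X], 2))  # close to [0, 1, 1, 0]
```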

Page 20:

Formal neural networks can accomplish many tasks, for example:

• Perform complex classification

• Learn arbitrary functions

• Account for associative memory

Some applications: robotics, character recognition, speech recognition, medical diagnostics.

This is not Neuroscience, but is motivated loosely by neuroscience and carries important information for neuroscience as well.

For example: Memory, learning and some aspects of development are assumed to be based on synaptic plasticity.

Page 21:

What did we learn today?

Is BackProp biologically realistic?