CS623: Introduction to Computing with Neural Nets (lecture-2) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay


Page 1: CS623: Introduction to Computing with Neural Nets (lecture-2)

CS623: Introduction to Computing with Neural Nets

(lecture-2)

Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay

Page 2: CS623: Introduction to Computing with Neural Nets (lecture-2)

Recapitulation

Page 3: CS623: Introduction to Computing with Neural Nets (lecture-2)
Page 4: CS623: Introduction to Computing with Neural Nets (lecture-2)
Page 5: CS623: Introduction to Computing with Neural Nets (lecture-2)

The human brain

Pre-frontal cortex: Self control

Insula: Intuition & empathy

Anterior Cingulate Cortex: Anxiety

Hippocampus: Emotional memory

Hypothalamus: Control of hormones

Pituitary gland: Mother instincts

Amygdala: Aggression

Page 6: CS623: Introduction to Computing with Neural Nets (lecture-2)

Functional Map of the Brain

Page 7: CS623: Introduction to Computing with Neural Nets (lecture-2)

The Perceptron Model

A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron.

[Figure: a perceptron with inputs x1, …, xn-1, xn, weights w1, …, wn-1, wn, threshold θ, and output y]

Page 8: CS623: Introduction to Computing with Neural Nets (lecture-2)

Step function / Threshold function:

y = 1 for Σ wixi ≥ θ
y = 0 otherwise

[Figure: plot of y against Σ wixi — a step from 0 to 1 at Σ wixi = θ]
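A minimal sketch of such a computing element in Python (the function name perceptron_output and the AND example weights are illustrative choices, not from the slides):

# A perceptron: weighted sum of the inputs compared against a threshold.
def perceptron_output(weights, inputs, theta):
    """Return 1 if sum_i w_i * x_i >= theta, else 0 (step / threshold function)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# Example: a 2-input perceptron computing Boolean AND with w = (1, 1), theta = 1.5.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_output((1, 1), (x1, x2), 1.5))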

Page 9: CS623: Introduction to Computing with Neural Nets (lecture-2)

Perceptron Training Algorithm (PTA)

Preprocessing:

1. The computation law is modified to

y = 1 if ∑ wixi > θ
y = 0 if ∑ wixi ≤ θ

[Figure: the same perceptron (inputs x1, x2, x3, …, xn; weights w1, w2, w3, …, wn) before and after the modification, with the threshold conditions (θ, ≤) and (θ, <) marked]

Page 10: CS623: Introduction to Computing with Neural Nets (lecture-2)

PTA – preprocessing cont…

2. Absorb θ as a weight

3. Negate all the zero-class examples

[Figure: the perceptron redrawn with the threshold absorbed as a weight — an extra input x0 = -1 with weight w0 = θ alongside inputs x1, x2, x3, …, xn and weights w1, w2, w3, …, wn]

Page 11: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example to demonstrate preprocessing

• OR perceptron
1-class: <1,1>, <1,0>, <0,1>
0-class: <0,0>

Augmented x vectors:
1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>
0-class: <-1,0,0>

Negate 0-class: <1,0,0>

Page 12: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example to demonstrate preprocessing cont..

Now the vectors are

x0 x1 x2

X1 -1 0 1

X2 -1 1 0

X3 -1 1 1

X4 1 0 0
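A small Python sketch of this preprocessing (the function name augment_and_negate is illustrative); it reproduces the table above:

# PTA preprocessing: absorb the threshold as w0 on x0 = -1, then negate 0-class examples.
def augment_and_negate(vectors, labels):
    processed = []
    for x, label in zip(vectors, labels):
        v = [-1] + list(x)            # prepend x0 = -1 (theta becomes weight w0)
        if label == 0:
            v = [-xi for xi in v]     # negate zero-class examples
        processed.append(v)
    return processed

# OR function: 1-class <1,1>, <1,0>, <0,1>; 0-class <0,0>.
vectors = [(0, 1), (1, 0), (1, 1), (0, 0)]
labels = [1, 1, 1, 0]
print(augment_and_negate(vectors, labels))
# -> [[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]   (X1, X2, X3, X4 above)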

Page 13: CS623: Introduction to Computing with Neural Nets (lecture-2)

Perceptron Training Algorithm

1. Start with a random value of w, e.g. <0,0,0,…>

2. Test for w·xi > 0. If the test succeeds for every i = 1, 2, …, n, then return w.

3. Otherwise modify w: wnext = wprev + xfail, where xfail is a vector that failed the test; go to step 2.
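A minimal Python sketch of the algorithm over the preprocessed vectors (the function name pta, the zero initialisation, and the fixed scanning order are assumptions of this sketch, not prescribed by the slides):

# Perceptron Training Algorithm on preprocessed (augmented, negated) vectors.
def pta(vectors, w=None, max_iters=10000):
    w = list(w) if w is not None else [0] * len(vectors[0])
    for _ in range(max_iters):
        # Find a vector that fails the test w.x > 0.
        failed = next((x for x in vectors
                       if sum(wi * xi for wi, xi in zip(w, x)) <= 0), None)
        if failed is None:
            return w                                 # all vectors pass: done
        w = [wi + xi for wi, xi in zip(w, failed)]   # w_next = w_prev + x_fail
    raise RuntimeError("no convergence; data may not be linearly separable")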

Page 14: CS623: Introduction to Computing with Neural Nets (lecture-2)

Tracing PTA on OR-example

w = <0,0,0>    w·x1 fails
w = <-1,0,1>   w·x4 fails
w = <0,0,1>    w·x2 fails
w = <-1,1,1>   w·x1 fails
w = <0,1,2>    w·x4 fails
w = <1,1,2>    w·x2 fails
w = <0,2,2>    w·x4 fails
w = <1,2,2>    success
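Running the sketch above on the preprocessed OR vectors yields a separating weight vector; the exact sequence of intermediate weights depends on which failing vector is picked at each step, so it need not match this trace line by line.

or_vectors = [[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]
print(pta(or_vectors))    # a weight vector w with w.x > 0 for all four vectors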

Page 15: CS623: Introduction to Computing with Neural Nets (lecture-2)

Theorems on PTA

1. The process will terminate.

2. The order of selection of xi for testing (and hence the particular sequence of wnext) does not affect termination.

Page 16: CS623: Introduction to Computing with Neural Nets (lecture-2)

End of recap

Page 17: CS623: Introduction to Computing with Neural Nets (lecture-2)

Proof of Convergence of PTA

• Perceptron Training Algorithm (PTA)

• Statement:

Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Page 18: CS623: Introduction to Computing with Neural Nets (lecture-2)

Proof of Convergence of PTA

• Suppose wn is the weight vector at the nth step of the algorithm.

• At the beginning, the weight vector is w0.

• Go from wi to wi+1 when a vector Xj fails the test wi·Xj > 0; update wi as

wi+1 = wi + Xj

• Since the Xj's come from a linearly separable function, there exists w* such that w*·Xj > 0 for all j.

Page 19: CS623: Introduction to Computing with Neural Nets (lecture-2)

Proof of Convergence of PTA

• Consider the expression

G(wn) = (wn · w*) / |wn|

where wn = weight at the nth iteration.

• G(wn) = (|wn| · |w*| · cos φ) / |wn|, where φ = angle between wn and w*

• G(wn) = |w*| · cos φ

• G(wn) ≤ |w*|   (as -1 ≤ cos φ ≤ 1)

Page 20: CS623: Introduction to Computing with Neural Nets (lecture-2)

Behavior of Numerator of G

wn · w* = (wn-1 + Xn-1fail) · w*
        = wn-1 · w* + Xn-1fail · w*
        = (wn-2 + Xn-2fail) · w* + Xn-1fail · w*
        = …
        = w0 · w* + (X0fail + X1fail + … + Xn-1fail) · w*

w* · Xifail is always positive: note carefully.

• Suppose |Xj| ≥ δ, where δ is the minimum magnitude.

• Numerator of G ≥ |w0 · w*| + n · δ · |w*|

• So, the numerator of G grows with n.

Page 21: CS623: Introduction to Computing with Neural Nets (lecture-2)

Behavior of Denominator of G

• |wn|² = wn · wn
       = (wn-1 + Xn-1fail)²
       = (wn-1)² + 2 · wn-1 · Xn-1fail + (Xn-1fail)²
       ≤ (wn-1)² + (Xn-1fail)²        (as wn-1 · Xn-1fail ≤ 0)
       ≤ (w0)² + (X0fail)² + (X1fail)² + … + (Xn-1fail)²

• Suppose |Xj| ≤ Δ, where Δ is the maximum magnitude.

• So, |wn|² ≤ (w0)² + n · Δ², i.e. the denominator |wn| ≤ √((w0)² + n · Δ²).

Page 22: CS623: Introduction to Computing with Neural Nets (lecture-2)

Some Observations

• Numerator of G grows as n

• Denominator of G grows as √n

⇒ Numerator grows faster than denominator

• If PTA does not terminate, G(wn) values will become unbounded.

Page 23: CS623: Introduction to Computing with Neural Nets (lecture-2)

Some Observations contd.

• But, as |G(wn)| ≤ |w*| which is finite, this is impossible!

• Hence, PTA has to converge.

• Proof is due to Marvin Minsky.

Page 24: CS623: Introduction to Computing with Neural Nets (lecture-2)

Convergence of PTA

•Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Page 25: CS623: Introduction to Computing with Neural Nets (lecture-2)

Study of Linear Separability

[Figure: a perceptron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, threshold θ, and output y]

• W·Xj = 0 defines a hyperplane in the (n+1)-dimensional space ⇒ the W vector is perpendicular to each Xj vector lying on that hyperplane.

Page 26: CS623: Introduction to Computing with Neural Nets (lecture-2)

Linear Separability

[Figure: positive points X1, X2, …, Xk (marked +) and negative points Xk+1, Xk+2, …, Xm (marked -) separated by a hyperplane]

Positive set: w·Xj > 0 for all j ≤ k

Negative set: w·Xj < 0 for all j > k

Separating hyperplane

Page 27: CS623: Introduction to Computing with Neural Nets (lecture-2)

Test for Linear Separability (LS)

• Theorem:

A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).

• PLC – both a necessary and sufficient condition.

• X1, X2, …, Xm – vectors of the function

• Y1, Y2, …, Ym – augmented negated set

• Each Yi is obtained by prepending -1 to Xi; for a 0-class vector, the augmented vector is additionally negated.

Page 28: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example (1) - XNOR

• The set {Yi} has a PLC if ∑ Pi Yi = 0, 1 ≤ i ≤ m,

– where each Pi is a non-negative scalar, and

– at least one Pi > 0

• Example: 2-bit even parity (XNOR function)

Xi          class    Yi
X1  <0,0>     +      <-1,0,0>
X2  <0,1>     -      <1,0,-1>
X3  <1,0>     -      <1,-1,0>
X4  <1,1>     +      <-1,1,1>

Page 29: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example (1) - XNOR

• P1 [-1 0 0]T + P2 [1 0 -1]T + P3 [1 -1 0]T + P4 [-1 1 1]T = [0 0 0]T

• All Pi = 1 gives the result.

• For the parity function, a PLC exists ⇒ not linearly separable.
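The PLC test can be phrased as a small linear program. Below is a sketch using scipy's linprog; the function name has_plc and the normalisation ∑ Pi = 1 (which rules out the trivial all-zero solution) are choices of this sketch, not from the slides:

import numpy as np
from scipy.optimize import linprog

def has_plc(Y):
    # PLC exists iff there is P >= 0, not all zero, with sum_i P_i * Y_i = 0.
    Y = np.asarray(Y, dtype=float)
    m, d = Y.shape
    # Feasibility LP: minimise 0 subject to Y^T P = 0, sum(P) = 1, P >= 0.
    A_eq = np.vstack([Y.T, np.ones(m)])
    b_eq = np.append(np.zeros(d), 1.0)
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success

# XNOR (2-bit even parity): a PLC exists, so it is not linearly separable.
Y_xnor = [[-1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]]
print(has_plc(Y_xnor))    # expected: True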

Page 30: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example (2) – Majority function

Xi          class    Yi
<1 1 1>       +      <-1 1 1 1>
<1 1 0>       +      <-1 1 1 0>
<1 0 1>       +      <-1 1 0 1>
<1 0 0>       -      <1 -1 0 0>
<0 1 1>       +      <-1 0 1 1>
<0 1 0>       -      <1 0 -1 0>
<0 0 1>       -      <1 0 0 -1>
<0 0 0>       -      <1 0 0 0>

• 3-bit majority function

• Suppose a PLC exists. The equations obtained are:

P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0

-P5 + P6 + P7 + P8 = 0

-P3 + P4 + P7 + P8 = 0

-P2 + P4 + P6 + P8 = 0

• On solving, all Pi are forced to 0.

• 3-bit majority function ⇒ no PLC ⇒ linearly separable (LS)
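Reusing the has_plc sketch from the XNOR example on these eight Yi vectors should report that no PLC exists, consistent with the conclusion above:

Y_majority = [[-1, 1, 1, 1], [-1, 1, 1, 0], [-1, 1, 0, 1], [1, -1, 0, 0],
              [-1, 0, 1, 1], [1, 0, -1, 0], [1, 0, 0, -1], [1, 0, 0, 0]]
print(has_plc(Y_majority))    # expected: False -> linearly separable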

Page 31: CS623: Introduction to Computing with Neural Nets (lecture-2)

Limitations of perceptron

• Non-linear separability is all-pervading.

• A single perceptron does not have enough computing power.

• E.g., XOR cannot be computed by a perceptron.

Page 32: CS623: Introduction to Computing with Neural Nets (lecture-2)

Solutions

• Tolerate error (e.g. the pocket algorithm used by connectionist expert systems).

– Try to get the best possible hyperplane using only perceptrons.

• Use higher-dimensional surfaces, e.g. degree-2 surfaces like the parabola.

• Use a layered network.

Page 33: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example - XOR

[Figure: building XOR from perceptrons.

Truth table of x1·¬x2 over inputs x1, x2: it is 1 only for x1 = 1, x2 = 0.

Calculation of x1·¬x2: a perceptron over x1 and x2 with weights 1.5 (on x1) and -1 (on x2), threshold θ = 1.

Calculation of XOR: a perceptron over x1·¬x2 and ¬x1·x2 with weights w1 = 1, w2 = 1, threshold θ = 0.5.]

Page 34: CS623: Introduction to Computing with Neural Nets (lecture-2)

Example - XOR

[Figure: the complete two-layer XOR network. Two hidden perceptrons over x1 and x2 (weights 1.5 and -1, and -1 and 1.5) compute x1·¬x2 and ¬x1·x2; the output perceptron combines them with weights w1 = 1, w2 = 1 and threshold θ = 0.5.]
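A minimal Python sketch of this two-layer construction, assuming threshold 1 for both hidden units as on the previous slide (the function names step and xor_net are illustrative):

def step(net, theta):
    # Threshold unit: 1 if net >= theta, else 0.
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden layer: the two AND terms of XOR.
    h1 = step(1.5 * x1 - 1.0 * x2, 1)     # x1 AND (NOT x2)
    h2 = step(-1.0 * x1 + 1.5 * x2, 1)    # (NOT x1) AND x2
    # Output layer: OR of the two hidden units.
    return step(1 * h1 + 1 * h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))        # prints the XOR truth table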

Page 35: CS623: Introduction to Computing with Neural Nets (lecture-2)

Multi Layer Perceptron (MLP)

• Question:- How to find weights for the hidden layers when no target output is available?

• Credit assignment problem – to be solved by “Gradient Descent”

Page 36: CS623: Introduction to Computing with Neural Nets (lecture-2)

Assignment

• Find, by solving linear inequalities, the perceptron that computes the majority Boolean function.

• Then feed it to your implementation of the perceptron training algorithm and study its behaviour.

• Download the SNNS and WEKA packages and try your hand at the perceptron algorithm built into them.