
Gradient Descent Rule Tuning

See pp. 207–210 in the textbook

Rules

Consider a rule base with
• M rules; the r-th rule has the form
• IF $x_1$ is $T_{r,1}$ AND … AND $x_n$ is $T_{r,n}$ THEN $y$ is $\bar{y}_r$ (or $y$ is $\bar{y}_r$ plus additional consequent terms)
• The TSK fuzzy system has the mathematical form

$$ f(x) = \frac{\displaystyle\sum_{r=1}^{M} \bar{y}_r \prod_{i=1}^{n} \mu_{r,i}\!\left(x_i;\, c_{r,i},\, L_{r,i},\, R_{r,i}\right)}{\displaystyle\sum_{r=1}^{M} \prod_{i=1}^{n} \mu_{r,i}\!\left(x_i;\, c_{r,i},\, L_{r,i},\, R_{r,i}\right)} $$

• Membership function parameters: center, right-width, left-width
• Consequent parameters
• 3-level (layer) structure of f(x) (see the sketch after the next equation):
  – Level (layer) 1: for each rule, compute all membership values for each term, compute their product, and store it as $z_r$
  – Level (layer) 2: compute the products of membership values and consequents and sum them: $n$; sum the membership values: $d$
  – Level (layer) 3: compute the quotient $f = n/d$

$$ f(x) = \frac{n(x;\, \bar{y},\, c,\, L,\, R)}{d(x;\, c,\, L,\, R)} $$
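To make the three-layer computation concrete, here is a minimal numerical sketch of evaluating f(x) layer by layer, assuming the Gaussian premise form z_r introduced on the following slides; the names tsk_output, xbar, sigma, and ybar are illustrative, not from the textbook.

```python
import numpy as np

def tsk_output(x, xbar, sigma, ybar):
    """Three-layer evaluation of the TSK output f(x) = n/d.

    x:     input vector, shape (n,)
    xbar:  membership centers, shape (M, n)  -- one row per rule
    sigma: membership widths,  shape (M, n)
    ybar:  rule consequents,   shape (M,)
    """
    # Layer 1: premise certainties z_r = product of membership values per rule
    z = np.prod(np.exp(-((x - xbar) / sigma) ** 2), axis=1)
    # Layer 2: weighted-consequent sum n and membership sum d
    n = np.dot(ybar, z)
    d = np.sum(z)
    # Layer 3: the quotient
    return n / d

# Example: 3 rules, 1 input (the one-LV example used later in these slides)
x = np.array([1.0])
xbar = np.array([[-5.0], [0.0], [5.0]])
sigma = np.array([[2.0], [2.0], [2.0]])
ybar = np.array([25.0, 0.0, 25.0])
print(tsk_output(x, xbar, sigma, ybar))
```

Layer 1 produces z, layer 2 the pair (n, d), and layer 3 their quotient.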

Rule parameters
• Membership function parameters: center, right-width, left-width
• Consequent parameters
• Why not s, z, and triangular membership functions?
• Why Gaussian membership functions?

$$ z_r(x) = \prod_{i=1}^{n} e^{-\left(\frac{x_i - \bar{x}_{i,r}}{\sigma_{i,r}}\right)^{2}}, \qquad f(x) = \frac{\displaystyle\sum_{r=1}^{M} \bar{y}_r\, z_r(x)}{\displaystyle\sum_{r=1}^{M} z_r(x)} $$

Gradient Descent

• Choose parameters to minimize the error

• Corresponds to a blind person descending a mountain by finding the steepest descending slope and moving in that direction

• Slope is determined by differentiation (computing the “gradient”)

• Chain rule helps tremendously.

Gradient Descent Math
• Consider a sequence of input/output measurements $(x_0^p, y_0^p)$.
• As each input/output measurement pair arrives (and before the next pair arrives), we want to adjust our model parameters to reduce the error $e_p = \tfrac{1}{2}\left[f(x_0^p) - y_0^p\right]^{2}$.
• Dropping the sub- and super-scripts, $e = \tfrac{1}{2}\left[f(x) - y\right]^{2}$.
• The gradient descent algorithm for any vector-valued parameter $s$ is

$$ s_{\text{new}} = s_{\text{old}} - \lambda_s \left.\frac{\partial e}{\partial s}\right|_{s = s_{\text{old}}}, \qquad \lambda_s = \text{step size for } s $$
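As a rough illustration of this update rule (not yet tied to the fuzzy system), the sketch below applies $s_{\text{new}} = s_{\text{old}} - \lambda_s\, \partial e/\partial s$ to a simple scalar model $f = s\,x$; the names gradient_step and lam are made up for the example.

```python
import numpy as np

def gradient_step(s_old, de_ds, lam_s):
    """One gradient-descent step for a (possibly vector-valued) parameter s:
    s_new = s_old - lam_s * de/ds, with de/ds evaluated at s_old."""
    return s_old - lam_s * de_ds

# Example with squared error e = (f - y)^2 / 2 and a linear model f = s * x,
# so de/ds = (f - y) * x.  (The linear model is only for illustration.)
s, x, y, lam = 0.5, 2.0, 3.0, 0.1
for _ in range(5):
    f = s * x
    s = gradient_step(s, (f - y) * x, lam)
print(s)  # moves toward y / x = 1.5
```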

$$ f(x) = \frac{n}{d}, \qquad n(x, \bar{y}, \bar{x}, \sigma) = \sum_{r=1}^{M} \bar{y}_r\, z_r(x, \bar{x}, \sigma), \qquad d(x, \bar{x}, \sigma) = \sum_{r=1}^{M} z_r(x, \bar{x}, \sigma) $$

$$ z_r(x, \bar{x}, \sigma) = \prod_{i=1}^{n} e^{-\left(\frac{x_i - \bar{x}_{i,r}}{\sigma_{i,r}}\right)^{2}}, \qquad e = \frac{1}{2}\left[f(x) - y\right]^{2} $$

$$ s_{\text{new}} = s_{\text{old}} - \lambda_s \left.\frac{\partial e}{\partial s}\right|_{s = s_{\text{old}}}, \qquad \lambda_s = \text{step size for } s $$

Apply to: ybar, xbar, sigma.

For ybar:

$$ \bar{y}_{\text{new}} = \bar{y}_{\text{old}} - \lambda_{\bar{y}} \left.\frac{\partial e}{\partial \bar{y}}\right|_{\bar{y} = \bar{y}_{\text{old}}} $$

$$ e = \frac{1}{2}\left[f(x) - y\right]^{2} = \frac{1}{2}\left[\frac{\sum_{r=1}^{M} \bar{y}_r\, z_r(x;\cdot)}{\sum_{r=1}^{M} z_r(x;\cdot)} - y\right]^{2} $$

$$ \frac{\partial e}{\partial \bar{y}_q} = \left[f(x) - y\right] \frac{\partial f(x)}{\partial \bar{y}_q} = \left[f(x) - y\right] \frac{z_q(x;\cdot)}{\sum_{r=1}^{M} z_r(x;\cdot)} $$

so, collecting all M components,

$$ \frac{\partial e}{\partial \bar{y}} = \frac{f(x) - y}{\sum_{r=1}^{M} z_r(x;\cdot)} \begin{bmatrix} z_1(x;\cdot) \\ z_2(x;\cdot) \\ \vdots \\ z_M(x;\cdot) \end{bmatrix} $$

Given x and y

Modify for beta. Modify for xbar. Modify for sigma.

Gradient Descent
• For a generic parameter $p$:

$$ \frac{de}{dp} = \left[f - y\right] \frac{df}{dp} $$

• For ybar, see the previous slide: $\dfrac{df}{d\bar{y}_i} = \;?$
• For xbar: $\dfrac{df}{d\bar{x}_i} = \;?$
• For sigma: $\dfrac{df}{d\sigma_i} = \;?$
• Abstraction saves work (see the sketch below).
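The "abstraction saves work" point can be captured in one routine: $de/dp = (f - y)\,df/dp$ is the same for every parameter, so only $df/dp$ changes. A minimal sketch, with a hypothetical name parameter_update:

```python
import numpy as np

def parameter_update(p_old, f_x, y, df_dp, lam):
    """Generic gradient-descent step: p_new = p_old - lam * de/dp,
    where de/dp = (f(x) - y) * df/dp.  Works elementwise for vector p."""
    return p_old - lam * (f_x - y) * np.asarray(df_dp)
```

The same call then serves ybar, xbar, and sigma once the corresponding df/dp, derived on the following slides, is supplied.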

One LV Example FL System

• LV X: Term set: Negative, Zero, Positive

• 3 rules

• Antecedent matrix, Consequent matrix

• Gaussian membership functions

• Super membership function

• Fuzzy function parameters

• TSK fuzzy function

• Gradient Descent parameter tuning

One LV Example FL System
• LV X: Negative5, Zero, Positive5
• 3 rules
  – If x is Negative5 then y is 25
  – If x is Zero then y is 0
  – If x is Positive5 then y is 25
• Antecedent matrix and consequent matrix

Rules 1–3 correspond to the rows of the antecedent matrix A and consequent matrix C:

$$ A = \begin{bmatrix} \text{Negative5} \\ \text{Zero} \\ \text{Positive5} \end{bmatrix}, \qquad C = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{bmatrix} = \begin{bmatrix} 25 \\ 0 \\ 25 \end{bmatrix} $$

One LV Example FL System

• LV X: Negative5, Zero, Positive5

• Gaussian membership functions

$$ \mu_{\text{Negative5}}(x) = e^{-\left(\frac{x - \bar{x}_1}{\sigma_1}\right)^{2}} = e^{-\left(\frac{x + 5}{2}\right)^{2}} $$

$$ \mu_{\text{Zero}}(x) = e^{-\left(\frac{x - \bar{x}_2}{\sigma_2}\right)^{2}} = e^{-\left(\frac{x - 0}{2}\right)^{2}} $$

$$ \mu_{\text{Positive5}}(x) = e^{-\left(\frac{x - \bar{x}_3}{\sigma_3}\right)^{2}} = e^{-\left(\frac{x - 5}{2}\right)^{2}} $$

$$ \bar{x} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{bmatrix} = \begin{bmatrix} -5 \\ 0 \\ 5 \end{bmatrix}, \qquad \sigma = \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix} $$

One LV Example FL System

• Super membership function

$$ \mu(x, LV, T, \bar{x}, \sigma) = \begin{cases} \mu_{\text{Negative5}}(x, \bar{x}_1, \sigma_1) & T = 1 \\ \mu_{\text{Zero}}(x, \bar{x}_2, \sigma_2) & T = 2 \\ \mu_{\text{Positive5}}(x, \bar{x}_3, \sigma_3) & T = 3 \end{cases} $$

where LV is X and T is one of Negative5, Zero, Positive5; the terms with centers {−5, 0, 5} are indexed {1, 2, 3}.

One LV Example FL System
• TSK fuzzy function
• Gradient Descent parameter tuning

$$ f(x) = \frac{\sum_{r=1}^{3} \bar{y}_r\, z_r(x, \bar{x}_r, \sigma_r)}{\sum_{r=1}^{3} z_r(x, \bar{x}_r, \sigma_r)} $$
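A small sketch of this TSK function for the three-rule example, using the centers, widths, and consequents listed above; the helper names z and f are illustrative.

```python
import numpy as np

# One-LV example: 3 rules, Gaussian memberships z_r(x) = exp(-((x - xbar_r)/sigma_r)^2)
xbar = np.array([-5.0, 0.0, 5.0])   # centers for Negative5, Zero, Positive5
sigma = np.array([2.0, 2.0, 2.0])   # widths
ybar = np.array([25.0, 0.0, 25.0])  # rule consequents

def z(x):
    """Premise certainties z_r(x) for all three rules."""
    return np.exp(-((x - xbar) / sigma) ** 2)

def f(x):
    """TSK output f(x) = sum(ybar_r * z_r) / sum(z_r)."""
    zr = z(x)
    return np.dot(ybar, zr) / np.sum(zr)

print(f(-5.0), f(0.0), f(5.0))  # roughly 25, near 0, roughly 25
```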

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning ybar

$$ f(x) = \frac{\sum_{r=1}^{3} \bar{y}_r\, z_r(x, \bar{x}_r, \sigma_r)}{\sum_{r=1}^{3} z_r(x, \bar{x}_r, \sigma_r)}, \qquad e = \frac{1}{2}\left[f(x, \bar{y}, \bar{x}, \sigma) - y\right]^{2} $$

New data: $(x, y)$

$$ \bar{y}_{\text{new}} = \bar{y}_{\text{old}} - \lambda_{\bar{y}} \left.\frac{de}{d\bar{y}}\right|_{\text{old}}, \qquad \frac{de}{d\bar{y}} = \left[f(x,\cdot) - y\right] \frac{d f(x,\cdot)}{d\bar{y}} $$

$$ \frac{d f(x,\cdot)}{d\bar{y}_i} = \frac{z_i(x, \bar{x}_i, \sigma_i)}{\sum_{r=1}^{3} z_r(x, \bar{x}_r, \sigma_r)}, \qquad \frac{d f(x,\cdot)}{d\bar{y}} = \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} z_1(x) \\ z_2(x) \\ z_3(x) \end{bmatrix} $$

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning ybar

New data: $(x, y)$

$$ \bar{y}_{\text{new}} = \bar{y}_{\text{old}} - \lambda_{\bar{y}} \left.\frac{de}{d\bar{y}}\right|_{\text{old}}, \qquad \frac{de}{d\bar{y}} = \left[f(x,\cdot) - y\right] \frac{d f(x,\cdot)}{d\bar{y}} $$

$$ \bar{y}_{\text{new}} = \bar{y}_{\text{old}} - \lambda_{\bar{y}} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x, \bar{x}, \sigma)} \begin{bmatrix} z_1(x, \bar{x}_1, \sigma_1) \\ z_2(x, \bar{x}_2, \sigma_2) \\ z_3(x, \bar{x}_3, \sigma_3) \end{bmatrix} $$

This is the heart and soul of the gradient descent algorithm for tuning ybar using experimental data.

Engineers derive these expressions. Computers compute with these expressions, often iteratively, to improve designs.

Note the interplay of theory and real-world data.
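As an illustration of that interplay, a minimal per-sample ybar update might look like the sketch below; update_ybar and lam_y are hypothetical names, and the expression follows the de/dybar derived above. The data pair shown is a made-up sample.

```python
import numpy as np

def update_ybar(x, y, ybar, xbar, sigma, lam_y):
    """One gradient-descent update of the consequents ybar for a new data pair (x, y):
    de/dybar = (f(x) - y) * z / sum(z)."""
    z = np.exp(-((x - xbar) / sigma) ** 2)   # premise certainties z_r(x)
    fx = np.dot(ybar, z) / np.sum(z)         # current model output
    return ybar - lam_y * (fx - y) * z / np.sum(z)

# Example: one hypothetical new measurement (x, y) = (4.0, 16.0)
ybar = update_ybar(4.0, 16.0,
                   np.array([25.0, 0.0, 25.0]),
                   np.array([-5.0, 0.0, 5.0]),
                   np.array([2.0, 2.0, 2.0]),
                   lam_y=0.5)
```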

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning xbar

New data: $(x, y)$

$$ \bar{x}_{\text{new}} = \bar{x}_{\text{old}} - \lambda_{\bar{x}} \left.\frac{de}{d\bar{x}}\right|_{\text{old}}, \qquad \frac{de}{d\bar{x}} = \left[f(x,\cdot) - y\right] \frac{d f(x,\cdot)}{d\bar{x}} $$

By the quotient rule on $f = n/d$,

$$ \frac{df}{d\bar{x}_i} = \frac{\left(\sum_{r=1}^{3} \bar{y}_r \frac{d z_r}{d\bar{x}_i}\right)\left(\sum_{r=1}^{3} z_r\right) - \left(\sum_{r=1}^{3} \bar{y}_r z_r\right)\left(\sum_{r=1}^{3} \frac{d z_r}{d\bar{x}_i}\right)}{\left(\sum_{r=1}^{3} z_r\right)^{2}} $$

Since $dz_r/d\bar{x}_i = 0$ for $r \neq i$ and $dz_i/d\bar{x}_i = z_i \cdot 2(x - \bar{x}_i)/\sigma_i^{2}$,

$$ \frac{df}{d\bar{x}_i} = \frac{\left[\bar{y}_i - f(x)\right] z_i(x)\, \dfrac{2(x - \bar{x}_i)}{\sigma_i^{2}}}{\sum_{r=1}^{3} z_r(x)} $$

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning xbar

$$ \frac{df}{d\bar{x}} = \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)}{\sigma_1^{2}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)}{\sigma_2^{2}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)}{\sigma_3^{2}} \end{bmatrix} $$

New data: $(x, y)$

$$ \bar{x}_{\text{new}} = \bar{x}_{\text{old}} - \lambda_{\bar{x}} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)}{\sigma_1^{2}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)}{\sigma_2^{2}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)}{\sigma_3^{2}} \end{bmatrix} $$
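A corresponding sketch for the center update just derived; update_xbar and lam_x are hypothetical names.

```python
import numpy as np

def update_xbar(x, y, ybar, xbar, sigma, lam_x):
    """One gradient-descent update of the centers xbar for a new data pair (x, y):
    df/dxbar_i = (ybar_i - f(x)) * z_i * 2*(x - xbar_i)/sigma_i**2 / sum(z)."""
    z = np.exp(-((x - xbar) / sigma) ** 2)
    fx = np.dot(ybar, z) / np.sum(z)
    df_dxbar = (ybar - fx) * z * 2.0 * (x - xbar) / sigma ** 2 / np.sum(z)
    return xbar - lam_x * (fx - y) * df_dxbar
```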

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning sigma

New data: $(x, y)$

$$ \sigma_{\text{new}} = \sigma_{\text{old}} - \lambda_{\sigma} \left.\frac{de}{d\sigma}\right|_{\text{old}}, \qquad \frac{de}{d\sigma} = \left[f(x,\cdot) - y\right] \frac{d f(x,\cdot)}{d\sigma} $$

By the quotient rule on $f = n/d$, and since $dz_r/d\sigma_i = 0$ for $r \neq i$ with $dz_i/d\sigma_i = z_i \cdot 2(x - \bar{x}_i)^{2}/\sigma_i^{3}$,

$$ \frac{df}{d\sigma_i} = \frac{\left[\bar{y}_i - f(x)\right] z_i(x)\, \dfrac{2(x - \bar{x}_i)^{2}}{\sigma_i^{3}}}{\sum_{r=1}^{3} z_r(x)} $$

One LV Example FL System

• TSK fuzzy function, Gradient Descent parameter tuning sigma

$$ \frac{df}{d\sigma} = \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)^{2}}{\sigma_1^{3}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)^{2}}{\sigma_2^{3}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)^{2}}{\sigma_3^{3}} \end{bmatrix} $$

New data: $(x, y)$

$$ \sigma_{\text{new}} = \sigma_{\text{old}} - \lambda_{\sigma} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)^{2}}{\sigma_1^{3}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)^{2}}{\sigma_2^{3}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)^{2}}{\sigma_3^{3}} \end{bmatrix} $$
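And the matching sketch for the width update; update_sigma and lam_s are hypothetical names.

```python
import numpy as np

def update_sigma(x, y, ybar, xbar, sigma, lam_s):
    """One gradient-descent update of the widths sigma for a new data pair (x, y):
    df/dsigma_i = (ybar_i - f(x)) * z_i * 2*(x - xbar_i)**2/sigma_i**3 / sum(z)."""
    z = np.exp(-((x - xbar) / sigma) ** 2)
    fx = np.dot(ybar, z) / np.sum(z)
    df_dsigma = (ybar - fx) * z * 2.0 * (x - xbar) ** 2 / sigma ** 3 / np.sum(z)
    return sigma - lam_s * (fx - y) * df_dsigma
```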

One LV: Gradient Descent Summary

New data: $(x, y)$; all gradients are evaluated at the old parameter values.

$$ \bar{x}_{\text{new}} = \bar{x}_{\text{old}} - \lambda_{\bar{x}} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)}{\sigma_1^{2}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)}{\sigma_2^{2}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)}{\sigma_3^{2}} \end{bmatrix} $$

$$ \sigma_{\text{new}} = \sigma_{\text{old}} - \lambda_{\sigma} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} (\bar{y}_1 - f)\, z_1(x)\, \dfrac{2(x - \bar{x}_1)^{2}}{\sigma_1^{3}} \\[2mm] (\bar{y}_2 - f)\, z_2(x)\, \dfrac{2(x - \bar{x}_2)^{2}}{\sigma_2^{3}} \\[2mm] (\bar{y}_3 - f)\, z_3(x)\, \dfrac{2(x - \bar{x}_3)^{2}}{\sigma_3^{3}} \end{bmatrix} $$

$$ \bar{y}_{\text{new}} = \bar{y}_{\text{old}} - \lambda_{\bar{y}} \left[f(x, \bar{y}_{\text{old}}, \bar{x}_{\text{old}}, \sigma_{\text{old}}) - y\right] \frac{1}{\sum_{r=1}^{3} z_r(x)} \begin{bmatrix} z_1(x, \bar{x}_1, \sigma_1) \\ z_2(x, \bar{x}_2, \sigma_2) \\ z_3(x, \bar{x}_3, \sigma_3) \end{bmatrix} $$

• We are now ready to do gradient descent
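One possible way to put the three updates from the summary slide together into a per-sample training step, looped over a stream of measurements; the data points shown are made-up samples of y = x², and all names and step sizes are illustrative.

```python
import numpy as np

def train_step(x, y, ybar, xbar, sigma, lam_y, lam_x, lam_s):
    """Apply all three updates from the summary slide to one new data pair (x, y).
    All gradients are evaluated at the current ('old') parameter values."""
    z = np.exp(-((x - xbar) / sigma) ** 2)
    fx = np.dot(ybar, z) / np.sum(z)
    err = fx - y
    sum_z = np.sum(z)
    ybar_new = ybar - lam_y * err * z / sum_z
    xbar_new = xbar - lam_x * err * (ybar - fx) * z * 2.0 * (x - xbar) / sigma ** 2 / sum_z
    sigma_new = sigma - lam_s * err * (ybar - fx) * z * 2.0 * (x - xbar) ** 2 / sigma ** 3 / sum_z
    return ybar_new, xbar_new, sigma_new

# Illustrative training loop on hypothetical data pairs (x_p, y_p)
ybar = np.array([25.0, 0.0, 25.0])
xbar = np.array([-5.0, 0.0, 5.0])
sigma = np.array([2.0, 2.0, 2.0])
data = [(-4.0, 16.0), (0.5, 0.25), (4.0, 16.0)]   # e.g. samples of y = x^2
for x_p, y_p in data:
    ybar, xbar, sigma = train_step(x_p, y_p, ybar, xbar, sigma, 0.1, 0.01, 0.01)
```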

Two LV Example FL System
• Temperature term set: Comfortable, Warm, Hot
• Humidity term set: Wet, Dry
• 6 rules
• Antecedent matrix, consequent matrix
• Gaussian membership functions
• Super membership function
• Fuzzy function parameters
• TSK fuzzy function
• Gradient descent parameter tuning

Two LV Example FL System
• Temperature term set: Comfortable, Warm, Hot
• Humidity term set: Wet, Dry
• 6 rules
  – If T is Comfortable and H is Wet then HI is $\bar{y}_1$
  – If T is Comfortable and H is Dry then HI is $\bar{y}_2$
  – If T is Warm and H is Wet then HI is $\bar{y}_3$
  – If T is Warm and H is Dry then HI is $\bar{y}_4$
  – If T is Hot and H is Wet then HI is $\bar{y}_5$
  – If T is Hot and H is Dry then HI is $\bar{y}_6$

Two LV Example FL System: Matrices
  – If T is Comfortable and H is Wet then HI is $\bar{y}_1$
  – If T is Comfortable and H is Dry then HI is $\bar{y}_2$
  – If T is Warm and H is Wet then HI is $\bar{y}_3$
  – If T is Warm and H is Dry then HI is $\bar{y}_4$
  – If T is Hot and H is Wet then HI is $\bar{y}_5$
  – If T is Hot and H is Dry then HI is $\bar{y}_6$

Rules 1–6 correspond to the rows of the antecedent matrix A (term indices for T and H) and the consequent matrix C:

$$ A = \begin{bmatrix} \text{Comfortable} & \text{Wet} \\ \text{Comfortable} & \text{Dry} \\ \text{Warm} & \text{Wet} \\ \text{Warm} & \text{Dry} \\ \text{Hot} & \text{Wet} \\ \text{Hot} & \text{Dry} \end{bmatrix} \;\leftrightarrow\; \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 2 & 1 \\ 2 & 2 \\ 3 & 1 \\ 3 & 2 \end{bmatrix}, \qquad C = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \\ \bar{y}_4 \\ \bar{y}_5 \\ \bar{y}_6 \end{bmatrix} $$

Two LV Example FL System

• Temperature term set: Comfortable, Warm, Hot

• Humidity term set: Wet, Dry

• Gaussian membership functions

• Super membership function

Two LV Example FL System

• TSK Fuzzy Function

• Gradient descent parameter tuning

Two LV Example FL System

• Gradient descent parameter tuning
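For the two-LV case the only structural change is that each rule's premise certainty $z_r$ is the product of two Gaussian membership values, one for T and one for H. A sketch of the TSK function follows, with placeholder centers, widths, and consequents, since the slides leave those entries unspecified.

```python
import numpy as np

# Two-LV example: inputs (T, H), 6 rules from the antecedent matrix above.
# All numeric values below are placeholders, not values from the slides.
xbar = np.array([[22.0, 80.0],   # Comfortable, Wet
                 [22.0, 30.0],   # Comfortable, Dry
                 [28.0, 80.0],   # Warm, Wet
                 [28.0, 30.0],   # Warm, Dry
                 [34.0, 80.0],   # Hot, Wet
                 [34.0, 30.0]])  # Hot, Dry
sigma = np.array([[4.0, 20.0]] * 6)                      # widths for (T, H)
ybar = np.array([23.0, 22.0, 31.0, 28.0, 40.0, 35.0])    # heat-index consequents

def f(x):
    """TSK output: z_r is the product of the two Gaussian membership values per rule."""
    z = np.prod(np.exp(-((x - xbar) / sigma) ** 2), axis=1)
    return np.dot(ybar, z) / np.sum(z)

print(f(np.array([30.0, 70.0])))
```

The gradient-descent updates for ybar, xbar, and sigma carry over rule by rule exactly as in the one-LV derivation, with the product over both inputs inside each $z_r$.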