ADAPTIVE NEURO-FUZZY INFERENCE SYSTEMS
RBFN and TS systems
Equivalent if the following hold:
- Both the RBFN and the TS system use the same aggregation method for the output (weighted sum or weighted average)
- The number of basis functions in the RBFN equals the number of rules in the TS system
- The TS system uses Gaussian membership functions with the same parameters as the basis functions, and rule firing strength is determined by multiplication
- The RBFN response functions (c_i) and the TS rule consequents are equal
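To make the equivalence concrete, here is a minimal numeric sketch (one input, Gaussian basis/membership functions shared between the two models, weighted-average aggregation; all parameter values are illustrative):

```python
import numpy as np

# Shared Gaussian parameters: RBFN basis functions = TS membership functions
centers = np.array([-1.0, 2.0])
widths = np.array([1.5, 0.8])
consequents = np.array([0.5, -2.0])   # RBFN responses c_i = TS rule consequents

def gaussian(x, c, s):
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def rbfn(x):
    # Normalized RBFN: weighted average of the responses c_i
    phi = gaussian(x, centers, widths)
    return np.sum(phi * consequents) / np.sum(phi)

def ts(x):
    # TS system: with one input, the rule firing strength is the MF value itself
    w = gaussian(x, centers, widths)
    return np.sum(w * consequents) / np.sum(w)

for x in [-2.0, 0.0, 1.3]:
    assert np.isclose(rbfn(x), ts(x))
print("RBFN and TS outputs coincide")
```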
ANFIS
Adaptive Neuro-Fuzzy Inference System (ANFIS): a Takagi-Sugeno fuzzy system mapped onto a neural network structure. Different representations are possible, but one with 5 layers is the most common. Network nodes in different layers have different structures.
ANFIS
Consider a first-order Sugeno fuzzy model, with two inputs, x and y, and one output, z. Rule set:
Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2
[Figure: membership functions A1, A2 on X and B1, B2 on Y; firing strengths w1 and w2; rule outputs f1 = p1x + q1y + r1 and f2 = p2x + q2y + r2]

Weighted fuzzy mean:

f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} = \bar{w}_1 f_1 + \bar{w}_2 f_2
ANFIS architecture
Corresponding equivalent ANFIS architecture:

[Figure: the two-rule Sugeno model drawn as the five-layer ANFIS network]
ANFIS layers
Layer 1: every node is an adaptive node with node function

O_{1,i} = \mu_{A_i}(x)

Parameters in this layer are called premise parameters.

Layer 2: every node is fixed; its output (representing the firing strength of rule i) is the product of its inputs:

O_{2,i} = w_i = \prod_j \mu_{ij}(x_j)

(for the two-input model, w_i = \mu_{A_i}(x)\,\mu_{B_i}(y))

Layer 3: every node is fixed (normalization):

O_{3,i} = \bar{w}_i = \frac{w_i}{\sum_j w_j}
ANFIS layers
Layer 4: every node is adaptive (consequent parameters):

O_{4,i} = O_{3,i} f_i = \bar{w}_i (p_{i0} + p_{i1} x_1 + \dots + p_{in} x_n)

Layer 5: a single node that sums up its inputs:

O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}

The adaptive network is functionally equivalent to a Sugeno fuzzy model!
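A minimal forward-pass sketch of the five layers for the two-input, two-rule model above (Gaussian membership functions are used here instead of the usual generalized bell, and all parameter values are illustrative):

```python
import numpy as np

def gauss(v, c, s):
    return np.exp(-((v - c) ** 2) / (2 * s ** 2))

# Premise parameters (layer 1): one Gaussian MF per rule on each input
cA, sA = np.array([-1.0, 1.0]), np.array([1.0, 1.0])   # A1, A2 on x
cB, sB = np.array([-1.0, 1.0]), np.array([1.0, 1.0])   # B1, B2 on y
# Consequent parameters (layer 4): f_i = p_i x + q_i y + r_i
p, q, r = np.array([1.0, -1.0]), np.array([0.5, 0.5]), np.array([0.0, 2.0])

def anfis_forward(x, y):
    muA, muB = gauss(x, cA, sA), gauss(y, cB, sB)  # layer 1: membership degrees
    w = muA * muB                                  # layer 2: firing strengths
    w_bar = w / w.sum()                            # layer 3: normalization
    f = p * x + q * y + r                          # rule consequents
    return (w_bar * f).sum()                       # layers 4-5: weighted sum

print(anfis_forward(0.3, -0.7))
```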
ANFIS with multiple rules
[Figure: the ANFIS architecture extended to more than two rules]
Hybrid learning for ANFIS
Consider the two-rule ANFIS with two inputs x and y and one output z. Let the premise parameters be fixed. The ANFIS output is then given by a linear combination of the consequent parameters p, q and r:

z = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 = \bar{w}_1 f_1 + \bar{w}_2 f_2
  = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)
  = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2

Collecting these coefficients row-wise over the training data yields the matrix A of the least-squares problem below.
Hybrid learning for ANFIS
Partition the total parameter set S as:
- S1: the set of premise (nonlinear) parameters
- S2: the set of consequent (linear) parameters

q: unknown vector whose elements are the parameters in S2.

z = Aq is a standard linear least-squares problem. The best solution for q, the one that minimizes ||Aq - z||^2, is the least-squares estimator q*:

q* = (A^T A)^{-1} A^T z
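With the premise parameters fixed, identifying the consequents is ordinary linear least squares. A sketch on synthetic data (the rule parameters are illustrative; np.linalg.lstsq is used rather than forming (A^T A)^{-1} explicitly, for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))       # training inputs (x, y)
z = rng.uniform(-1, 1, size=100)            # training targets

def normalized_strengths(x, y):
    # Fixed premise part: two rules with Gaussian MFs (illustrative parameters)
    w1 = np.exp(-(x + 1) ** 2) * np.exp(-(y + 1) ** 2)
    w2 = np.exp(-(x - 1) ** 2) * np.exp(-(y - 1) ** 2)
    return w1 / (w1 + w2), w2 / (w1 + w2)

wb1, wb2 = normalized_strengths(X[:, 0], X[:, 1])
# Each row of A multiplies q = [p1, q1, r1, p2, q2, r2]
A = np.column_stack([wb1 * X[:, 0], wb1 * X[:, 1], wb1,
                     wb2 * X[:, 0], wb2 * X[:, 1], wb2])
q_star, *_ = np.linalg.lstsq(A, z, rcond=None)   # minimizes ||A q - z||^2
print(q_star)
```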
Hybrid learning for ANFIS
What if the premise parameters are not optimal? Combine steepest descent and the least-squares estimator to update the parameters of the adaptive network. Each epoch is composed of:
1. Forward pass: node outputs go forward until Layer 4, and the consequent parameters are identified by the least-squares estimator;
2. Backward pass: error signals propagate backward, and the premise parameters are updated by gradient descent.
Hybrid learning for ANFIS
Error signals: derivatives of the error measure with respect to each node output.

The hybrid approach converges much faster than pure backpropagation because it reduces the search space.

|                       | Forward pass            | Backward pass    |
|-----------------------|-------------------------|------------------|
| Premise parameters    | Fixed                   | Gradient descent |
| Consequent parameters | Least-squares estimator | Fixed            |
| Signals               | Node outputs            | Error signals    |
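A sketch of the hybrid loop on toy data (two rules, two inputs, Gaussian premises; the premise gradient is approximated by finite differences purely for brevity, where a real implementation would backpropagate error signals):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (50, 2))
z = np.sin(X[:, 0]) + X[:, 1]               # toy target

def norm_strengths(prem, X):
    # prem: Gaussian centers (4) and widths (4) for 2 rules x 2 inputs
    c, s = prem[:4].reshape(2, 2), prem[4:].reshape(2, 2)
    w = np.exp(-(((X[:, None, :] - c) ** 2) / (2 * s ** 2)).sum(axis=2))
    return w / w.sum(axis=1, keepdims=True)

def design(prem, X):
    # Columns multiply q = [p1, q1, r1, p2, q2, r2]
    wb = norm_strengths(prem, X)
    return np.column_stack([wb[:, i] * col for i in range(2)
                            for col in (X[:, 0], X[:, 1], np.ones(len(X)))])

prem = np.array([-0.5, -0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0])
lr, h = 0.05, 1e-6
for _ in range(20):
    # Forward pass: consequents identified by least squares, premises fixed
    theta, *_ = np.linalg.lstsq(design(prem, X), z, rcond=None)
    mse = lambda p: np.mean((design(p, X) @ theta - z) ** 2)
    base = mse(prem)
    # Backward pass: premises updated by gradient descent, consequents fixed
    grad = np.array([(mse(prem + h * e) - base) / h for e in np.eye(len(prem))])
    prem = prem - lr * grad
print("final MSE:", base)
```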
Stone-Weierstrass theorem
Let D be a compact space of N dimensions, and let F be a set of continuous real-valued functions on D satisfying:
1. Identity function: the constant function f(x) = 1 is in F.
2. Separability: for any two points x_1 \neq x_2 in D, there is an f in F such that f(x_1) \neq f(x_2).
3. Algebraic closure: if f and g are two functions in F, then fg and af + bg are also in F, for any reals a and b.
Then F is dense in C(D), the space of continuous real-valued functions on D, i.e.:

\forall \varepsilon > 0,\ \forall g \in C(D),\ \exists f \in F :\ |g(x) - f(x)| < \varepsilon,\ \forall x \in D.
Universal approximator ANFIS
According to the Stone-Weierstrass theorem, ANFIS has unlimited approximation power: it can match any continuous nonlinear function arbitrarily well.
- Identity: obtained by having a constant consequent.
- Separability: obtained by selecting different parameters in the network.
Algebraic closure
Consider two systems, each with two rules, and final outputs

z_1 = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} \quad\text{and}\quad z_2 = \frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}

Additive: construct a 4-rule inference system that computes a z_1 + b z_2:

a z_1 + b z_2 = a\,\frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} + b\,\frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}
= \frac{w_1\tilde{w}_1 (a f_1 + b \tilde{f}_1) + w_1\tilde{w}_2 (a f_1 + b \tilde{f}_2) + w_2\tilde{w}_1 (a f_2 + b \tilde{f}_1) + w_2\tilde{w}_2 (a f_2 + b \tilde{f}_2)}{w_1\tilde{w}_1 + w_1\tilde{w}_2 + w_2\tilde{w}_1 + w_2\tilde{w}_2}
Algebraic closure
Multiplicative: construct a 4-rule inference system that computes z_1 z_2:

z_1 z_2 = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} \cdot \frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}
= \frac{w_1\tilde{w}_1 f_1\tilde{f}_1 + w_1\tilde{w}_2 f_1\tilde{f}_2 + w_2\tilde{w}_1 f_2\tilde{f}_1 + w_2\tilde{w}_2 f_2\tilde{f}_2}{w_1\tilde{w}_1 + w_1\tilde{w}_2 + w_2\tilde{w}_1 + w_2\tilde{w}_2}
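A quick numeric check of the additive and multiplicative constructions, with arbitrary firing strengths and (constant) consequents:

```python
import numpy as np

w = np.array([0.7, 0.3]); f = np.array([2.0, -1.0])    # system 1
wt = np.array([0.4, 0.9]); ft = np.array([0.5, 3.0])   # system 2
a, b = 1.5, -0.8

z1 = (w * f).sum() / w.sum()
z2 = (wt * ft).sum() / wt.sum()

# The 4 combined rules have firing strengths w_i * wt_j
W = np.outer(w, wt).ravel()
F_add = np.add.outer(a * f, b * ft).ravel()  # consequents a f_i + b ft_j
F_mul = np.outer(f, ft).ravel()              # consequents f_i * ft_j

assert np.isclose((W * F_add).sum() / W.sum(), a * z1 + b * z2)
assert np.isclose((W * F_mul).sum() / W.sum(), z1 * z2)
print("4-rule systems reproduce a z1 + b z2 and z1 z2")
```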
Model building guidelines
Select the number of fuzzy sets per variable:
- empirically, by examining the data, or by trial and error
- using clustering techniques
- using regression trees (CART)

Initially, distribute bell-shaped membership functions evenly over each input range (see the sketch below). Using an adaptive step size can speed up training.
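A sketch of an even initial distribution of generalized bell membership functions over an input range (the gbell parameterization a, b, c is standard; the particular values are illustrative):

```python
import numpy as np

def gbell(x, a, b, c):
    # Generalized bell MF: 1 / (1 + |(x - c) / a|^(2b))
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

lo, hi, n_mf = -10.0, 10.0, 4
centers = np.linspace(lo, hi, n_mf)      # evenly spaced centers
a = (hi - lo) / (2 * (n_mf - 1))         # half the spacing: neighbors cross at 0.5
x = np.linspace(lo, hi, 201)
mfs = np.array([gbell(x, a, 2.0, c) for c in centers])
print(mfs.shape)                         # (4, 201): 4 evenly distributed MFs
```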
How to design ANFIS?
Initialization:
- Define number and type of inputs
- Define number and type of outputs
- Define number of rules and type of consequents
- Define objective function and stop conditions

Then:
- Collect data
- Normalize inputs
- Determine initial rules
- Initialize network

TRAIN
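The initialization choices above could be collected in a small configuration object; a sketch with illustrative field names and defaults:

```python
from dataclasses import dataclass

@dataclass
class AnfisConfig:
    n_inputs: int = 2                # number (and type) of inputs
    n_mfs_per_input: int = 4         # fuzzy sets per input variable
    mf_type: str = "gbell"           # membership function family
    consequent: str = "first-order"  # type of rule consequents
    max_epochs: int = 250            # stop condition
    target_rmse: float = 1e-3        # objective / stop condition
    normalize_inputs: bool = True

    @property
    def n_rules(self) -> int:
        # grid partition: one rule per combination of input MFs
        return self.n_mfs_per_input ** self.n_inputs

print(AnfisConfig().n_rules)  # 16
```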
Ex. 1: Two-input sinc function
Input range: [-10, 10] × [-10, 10], 121 training data pairs. Multi-Layer Perceptron vs. ANFIS:
- MLP: 18 neurons in the hidden layer, 73 parameters, quick propagation (best learning algorithm for a backpropagation MLP).
- ANFIS: 16 rules, 4 membership functions per variable, 72 fitting parameters (48 linear, 24 nonlinear), hybrid learning rule.

z = \operatorname{sinc}(x, y) = \frac{\sin(x)\,\sin(y)}{x\,y}
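A sketch of how the 121 training pairs could be generated (an 11 × 11 grid over the input square is an assumption; the function itself is as above, with the removable singularity handled at zero):

```python
import numpy as np

def sinc2(x, y):
    # z = sin(x) sin(y) / (x y), with the 0/0 limit handled explicitly
    sx = np.where(x == 0, 1.0, np.sin(x) / np.where(x == 0, 1.0, x))
    sy = np.where(y == 0, 1.0, np.sin(y) / np.where(y == 0, 1.0, y))
    return sx * sy

pts = np.linspace(-10, 10, 11)          # 11 points per axis -> 121 pairs
gx, gy = np.meshgrid(pts, pts)
data = np.column_stack([gx.ravel(), gy.ravel(), sinc2(gx, gy).ravel()])
print(data.shape)                       # (121, 3)
```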
MLP vs. ANFIS results
Average of 10 runs: MLP with different sets of initial random weights; ANFIS with 10 step sizes between 0.01 and 0.10.

MLP's approximation power decreases because the learning process can get trapped in local minima, or because some neurons are pushed into saturation during training.
ANFIS output
[Figure: training data and ANFIS output surfaces over X, Y ∈ [-10, 10]; root mean squared error curve and step size curve over 250 epochs]
ANFIS model
[Figure: initial and final membership functions (degree of membership vs. input) on X and Y, over [-10, 10]]
Ex. 2: 3-input nonlinear function
Two membership functions per variable, 8 rules. Input ranges: [1, 6] × [1, 6] × [1, 6]. 216 training data, 125 validation data.

output = (1 + x^{0.5} + y^{-1} + z^{-1.5})^2
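A sketch of the data generation (a 6 × 6 × 6 grid giving the 216 training triples is an assumption):

```python
import numpy as np

def target(x, y, z):
    # output = (1 + x^0.5 + y^-1 + z^-1.5)^2
    return (1 + x ** 0.5 + y ** -1.0 + z ** -1.5) ** 2

pts = np.linspace(1, 6, 6)              # 6 points per axis -> 6^3 = 216 triples
gx, gy, gz = np.meshgrid(pts, pts, pts)
inputs = np.column_stack([g.ravel() for g in (gx, gy, gz)])
outputs = target(inputs[:, 0], inputs[:, 1], inputs[:, 2])
print(inputs.shape, outputs.shape)      # (216, 3) (216,)
```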
[Figure: training and checking root mean squared error curves and step size curve over 100 epochs]
ANFIS model
[Figure: initial MFs on X, Y and Z, and final MFs on X, Y and Z, over [1, 6]]
Results comparison
APE (Average Percentage Error) = \frac{1}{P} \sum_{i=1}^{P} \frac{|T(i) - O(i)|}{|T(i)|} \times 100\%

where T(i) is the target output, O(i) the model output, and P the number of data points.

| Model             | Training error | Checking error | # Param. | Training data size | Checking data size |
|-------------------|----------------|----------------|----------|--------------------|--------------------|
| ANFIS             | 0.043%         | 1.066%         | 50       | 216                | 125                |
| GMDH model [1]    | 4.7%           | 5.7%           | -        | 20                 | 20                 |
| Fuzzy model 1 [2] | 1.5%           | 2.1%           | 22       | 20                 | 20                 |
| Fuzzy model 2 [2] | 0.59%          | 3.4%           | 32       | 20                 | 20                 |

[1] T. Kondo. Revised GMDH algorithm estimating degree of the complete polynomial. Trans. of the Society of Instrument and Control Engineers, 22(9):928-934, 1986.
[2] M. Sugeno and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and Systems, 28:15-33, 1988.
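The APE measure above, as a short helper (illustrative):

```python
import numpy as np

def ape(T, O):
    # Average Percentage Error: mean of |T(i) - O(i)| / |T(i)|, in percent
    T, O = np.asarray(T, float), np.asarray(O, float)
    return np.mean(np.abs(T - O) / np.abs(T)) * 100.0

print(ape([2.0, 4.0], [2.1, 3.8]))  # 5.0
```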
Ex. 3: Modeling dynamic system
Plant equation:

y(k+1) = 0.3\,y(k) + 0.6\,y(k-1) + f(u(k))

f(·) has the following form:

f(u) = 0.6\sin(\pi u) + 0.3\sin(3\pi u) + 0.1\sin(5\pi u)

Estimate the nonlinear function f with ANFIS:

\hat{y}(k+1) = 0.3\,\hat{y}(k) + 0.6\,\hat{y}(k-1) + F(u(k))

Plant input: u(k) = \sin(2\pi k / 250)
- ANFIS parameters updated at each step (on-line)
- Learning rate η = 0.1; forgetting factor λ = 0.99
- ANFIS can adapt even after the input changes
- Question: was the input signal rich enough?
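A sketch of simulating the true plant to produce the training signal (the ANFIS estimate F would replace f in the model equation; the horizon length is illustrative):

```python
import numpy as np

def f(u):
    # The unknown plant nonlinearity
    return (0.6 * np.sin(np.pi * u) + 0.3 * np.sin(3 * np.pi * u)
            + 0.1 * np.sin(5 * np.pi * u))

K = 1000
u = np.sin(2 * np.pi * np.arange(K) / 250)     # plant input
y = np.zeros(K + 1)                            # zero initial conditions
for k in range(1, K):
    y[k + 1] = 0.3 * y[k] + 0.6 * y[k - 1] + f(u[k])
print(y[-5:])
```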
Plant and model outputs
Effect of number of MFs
[Figure: with 5 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Effect of number of MFs

[Figure: with 4 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Effect of number of MFs

[Figure: with 3 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Ex. 4: Chaotic time series
Consider a chaotic time series generated by the Mackey-Glass delay differential equation:

\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t)

- Task: predict the system output at some future instant t + P using past outputs
- 500 training data, 500 validation data
- ANFIS input: [x(t-18), x(t-12), x(t-6), x(t)]
- ANFIS output: x(t+6)
- Two MFs per variable, 16 rules
- 104 parameters (24 premise, 80 consequent)
- Data generated from t = 118 to t = 1117
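A sketch of generating the series and the input-output pairs (Euler integration with step 0.1, delay τ = 17, and initial condition x(0) = 1.2 are assumptions consistent with the classic benchmark):

```python
import numpy as np

tau, dt, T = 17.0, 0.1, 1200.0
n = int(T / dt)
d = int(tau / dt)
x = np.zeros(n)
x[0] = 1.2                                     # initial condition, zero history
for t in range(n - 1):
    x_tau = x[t - d] if t >= d else 0.0        # x(t - tau)
    x[t + 1] = x[t] + dt * (0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x[t])

# Input-output pairs [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6), t = 118..1117
ts = np.arange(118, 1118)
idx = np.round(ts / dt).astype(int)
step = int(round(6 / dt))
inputs = np.column_stack([x[idx - 3 * step], x[idx - 2 * step],
                          x[idx - step], x[idx]])
targets = x[idx + step]
print(inputs.shape, targets.shape)   # (1000, 4) (1000,): 500 train + 500 check
```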
ANFIS model
[Figure: final MFs on Input 1, x(t-18); Input 2, x(t-12); Input 3, x(t-6); and Input 4, x(t), over [0.6, 1.2]]
Model output
[Figure: training and checking error curves and step sizes over 10 epochs; desired and ANFIS outputs; prediction errors on the order of 10^{-3}]
103rd order AR model
Order selection
Select the optimal order of the AR model in order to prevent overfitting: choose the order that minimizes the error on a test set.

y(t) = a_1 y(t-1) + a_2 y(t-2) + \dots + a_n y(t-n) + u(t)
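A sketch of order selection by validation error: fit AR(n) by least squares for increasing n and keep the order with the smallest test-set RMSE (the data here is synthetic and illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sin(0.1 * np.arange(600)) + 0.05 * rng.standard_normal(600)
y_train, y_test = y[:400], y[400:]

def regressors(y, n):
    # Rows: [y(t-1), ..., y(t-n)]; target: y(t)
    A = np.column_stack([y[n - k: len(y) - k] for k in range(1, n + 1)])
    return A, y[n:]

best = None
for n in range(1, 30):
    A, b = regressors(y_train, n)
    a, *_ = np.linalg.lstsq(A, b, rcond=None)   # fit on training data
    At, bt = regressors(y_test, n)
    rmse = np.sqrt(np.mean((At @ a - bt) ** 2))  # evaluate on test data
    if best is None or rmse < best[1]:
        best = (n, rmse)
print("selected order:", best[0], "test RMSE:", best[1])
```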
44th order AR model
ANFIS output for P = 84
ANFIS extensions
- Different types of membership functions in layer 1
- Parameterized t-norms in layer 2
- Interpretability:
  - constrained gradient descent optimization
  - bounds on fuzziness
  - parameterization to reflect constraints
- Structure identification
E' = E + \sum_{i=1}^{N_P} w_i \ln(w_i)