ADAPTIVE NEURO-FUZZY INFERENCE SYSTEMS
RBFN and TS systems
Equivalent if the following hold:
- Both the RBFN and the TS system use the same aggregation method for the output (weighted sum or weighted average)
- The number of basis functions in the RBFN equals the number of rules in the TS system
- The TS system uses Gaussian membership functions with the same parameters as the basis functions, and rule firing strength is determined by multiplication
- The RBFN response functions (c_i) and the TS rule consequents are equal
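To make the equivalence concrete, here is a minimal numeric sketch (one input, Gaussian basis/membership functions shared between the two models, weighted-average aggregation; all parameter values are illustrative):

```python
import numpy as np

# Shared Gaussian parameters: RBFN basis functions = TS membership functions
centers = np.array([-1.0, 2.0])
widths = np.array([1.5, 0.8])
consequents = np.array([0.5, -2.0])   # RBFN responses c_i = TS rule consequents

def gaussian(x, c, s):
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def rbfn(x):
    # Normalized RBFN: weighted average of the responses c_i
    phi = gaussian(x, centers, widths)
    return np.sum(phi * consequents) / np.sum(phi)

def ts(x):
    # TS system: with one input, the rule firing strength is the MF value itself
    w = gaussian(x, centers, widths)
    return np.sum(w * consequents) / np.sum(w)

for x in [-2.0, 0.0, 1.3]:
    assert np.isclose(rbfn(x), ts(x))
print("RBFN and TS outputs coincide")
```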
ANFIS
Adaptive Neuro-Fuzzy Inference System (ANFIS): a Takagi-Sugeno fuzzy system mapped onto a neural network structure. Different representations are possible, but one with 5 layers is the most common. Network nodes in different layers have different structures.
ANFIS
Consider a first-order Sugeno fuzzy model, with two inputs, x and y, and one output, z. Rule set:
Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2
[Figure: membership functions A1, A2 on X and B1, B2 on Y; firing strengths w1 and w2; rule outputs f1 = p1x + q1y + r1 and f2 = p2x + q2y + r2]

Weighted fuzzy mean:

f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} = \bar{w}_1 f_1 + \bar{w}_2 f_2
ANFIS architecture
Corresponding equivalent ANFIS architecture:

[Figure: the two-rule Sugeno model drawn as the five-layer ANFIS network]
ANFIS layers
Layer 1: every node is an adaptive node with node function

O_{1,i} = \mu_{A_i}(x)

Parameters in this layer are called premise parameters.

Layer 2: every node is fixed; its output (representing the firing strength of rule i) is the product of its inputs:

O_{2,i} = w_i = \prod_j \mu_{ij}(x_j)

(for the two-input model, w_i = \mu_{A_i}(x)\,\mu_{B_i}(y))

Layer 3: every node is fixed (normalization):

O_{3,i} = \bar{w}_i = \frac{w_i}{\sum_j w_j}
ANFIS layers
Layer 4: every node is adaptive (consequent parameters):

O_{4,i} = O_{3,i} f_i = \bar{w}_i (p_{i0} + p_{i1} x_1 + \dots + p_{in} x_n)

Layer 5: a single node that sums up its inputs:

O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}

The adaptive network is functionally equivalent to a Sugeno fuzzy model!
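A minimal forward-pass sketch of the five layers for the two-input, two-rule model above (Gaussian membership functions are used here instead of the usual generalized bell, and all parameter values are illustrative):

```python
import numpy as np

def gauss(v, c, s):
    return np.exp(-((v - c) ** 2) / (2 * s ** 2))

# Premise parameters (layer 1): one Gaussian MF per rule on each input
cA, sA = np.array([-1.0, 1.0]), np.array([1.0, 1.0])   # A1, A2 on x
cB, sB = np.array([-1.0, 1.0]), np.array([1.0, 1.0])   # B1, B2 on y
# Consequent parameters (layer 4): f_i = p_i x + q_i y + r_i
p, q, r = np.array([1.0, -1.0]), np.array([0.5, 0.5]), np.array([0.0, 2.0])

def anfis_forward(x, y):
    muA, muB = gauss(x, cA, sA), gauss(y, cB, sB)  # layer 1: membership degrees
    w = muA * muB                                  # layer 2: firing strengths
    w_bar = w / w.sum()                            # layer 3: normalization
    f = p * x + q * y + r                          # rule consequents
    return (w_bar * f).sum()                       # layers 4-5: weighted sum

print(anfis_forward(0.3, -0.7))
```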
ANFIS with multiple rules
[Figure: the ANFIS architecture extended to more than two rules]
Hybrid learning for ANFIS
Consider the two-rule ANFIS with two inputs x and y and one output z. Let the premise parameters be fixed. The ANFIS output is then given by a linear combination of the consequent parameters p, q and r:

z = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 = \bar{w}_1 f_1 + \bar{w}_2 f_2
  = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)
  = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2

Collecting these coefficients row-wise over the training data yields the matrix A of the least-squares problem below.
Hybrid learning for ANFIS
Partition the total parameter set S as:
- S1: the set of premise (nonlinear) parameters
- S2: the set of consequent (linear) parameters

q: unknown vector whose elements are the parameters in S2.

z = Aq is a standard linear least-squares problem. The best solution for q, the one that minimizes ||Aq - z||^2, is the least-squares estimator q*:

q* = (A^T A)^{-1} A^T z
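With the premise parameters fixed, identifying the consequents is ordinary linear least squares. A sketch on synthetic data (the rule parameters are illustrative; np.linalg.lstsq is used rather than forming (A^T A)^{-1} explicitly, for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))       # training inputs (x, y)
z = rng.uniform(-1, 1, size=100)            # training targets

def normalized_strengths(x, y):
    # Fixed premise part: two rules with Gaussian MFs (illustrative parameters)
    w1 = np.exp(-(x + 1) ** 2) * np.exp(-(y + 1) ** 2)
    w2 = np.exp(-(x - 1) ** 2) * np.exp(-(y - 1) ** 2)
    return w1 / (w1 + w2), w2 / (w1 + w2)

wb1, wb2 = normalized_strengths(X[:, 0], X[:, 1])
# Each row of A multiplies q = [p1, q1, r1, p2, q2, r2]
A = np.column_stack([wb1 * X[:, 0], wb1 * X[:, 1], wb1,
                     wb2 * X[:, 0], wb2 * X[:, 1], wb2])
q_star, *_ = np.linalg.lstsq(A, z, rcond=None)   # minimizes ||A q - z||^2
print(q_star)
```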
Hybrid learning for ANFIS
What if the premise parameters are not optimal? Combine steepest descent and the least-squares estimator to update the parameters of the adaptive network. Each epoch is composed of:
1. Forward pass: node outputs go forward until Layer 4, and the consequent parameters are identified by the least-squares estimator;
2. Backward pass: error signals propagate backward, and the premise parameters are updated by gradient descent.
Hybrid learning for ANFIS
Error signals: derivatives of the error measure with respect to each node output.

The hybrid approach converges much faster than pure backpropagation because it reduces the search space.

|                       | Forward pass            | Backward pass    |
|-----------------------|-------------------------|------------------|
| Premise parameters    | Fixed                   | Gradient descent |
| Consequent parameters | Least-squares estimator | Fixed            |
| Signals               | Node outputs            | Error signals    |
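A sketch of the hybrid loop on toy data (two rules, two inputs, Gaussian premises; the premise gradient is approximated by finite differences purely for brevity, where a real implementation would backpropagate error signals):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (50, 2))
z = np.sin(X[:, 0]) + X[:, 1]               # toy target

def norm_strengths(prem, X):
    # prem: Gaussian centers (4) and widths (4) for 2 rules x 2 inputs
    c, s = prem[:4].reshape(2, 2), prem[4:].reshape(2, 2)
    w = np.exp(-(((X[:, None, :] - c) ** 2) / (2 * s ** 2)).sum(axis=2))
    return w / w.sum(axis=1, keepdims=True)

def design(prem, X):
    # Columns multiply q = [p1, q1, r1, p2, q2, r2]
    wb = norm_strengths(prem, X)
    return np.column_stack([wb[:, i] * col for i in range(2)
                            for col in (X[:, 0], X[:, 1], np.ones(len(X)))])

prem = np.array([-0.5, -0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0])
lr, h = 0.05, 1e-6
for _ in range(20):
    # Forward pass: consequents identified by least squares, premises fixed
    theta, *_ = np.linalg.lstsq(design(prem, X), z, rcond=None)
    mse = lambda p: np.mean((design(p, X) @ theta - z) ** 2)
    base = mse(prem)
    # Backward pass: premises updated by gradient descent, consequents fixed
    grad = np.array([(mse(prem + h * e) - base) / h for e in np.eye(len(prem))])
    prem = prem - lr * grad
print("final MSE:", base)
```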
Stone-Weierstrass theorem
Let D be a compact space of N dimensions, and let F be a set of continuous real-valued functions on D satisfying:
1. Identity function: the constant function f(x) = 1 is in F.
2. Separability: for any two points x_1 \neq x_2 in D, there is an f in F such that f(x_1) \neq f(x_2).
3. Algebraic closure: if f and g are two functions in F, then fg and af + bg are also in F, for any reals a and b.
Then F is dense in C(D), the space of continuous real-valued functions on D, i.e.:

\forall \varepsilon > 0,\ \forall g \in C(D),\ \exists f \in F :\ |g(x) - f(x)| < \varepsilon,\ \forall x \in D.
Universal approximator ANFIS
According to the Stone-Weierstrass theorem, ANFIS has unlimited approximation power: it can match any continuous nonlinear function arbitrarily well.
- Identity: obtained by having a constant consequent.
- Separability: obtained by selecting different parameters in the network.
Algebraic closure
Consider two systems, each with two rules, and final outputs

z_1 = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} \quad\text{and}\quad z_2 = \frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}

Additive: construct a 4-rule inference system that computes a z_1 + b z_2:

a z_1 + b z_2 = a\,\frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} + b\,\frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}
= \frac{w_1\tilde{w}_1 (a f_1 + b \tilde{f}_1) + w_1\tilde{w}_2 (a f_1 + b \tilde{f}_2) + w_2\tilde{w}_1 (a f_2 + b \tilde{f}_1) + w_2\tilde{w}_2 (a f_2 + b \tilde{f}_2)}{w_1\tilde{w}_1 + w_1\tilde{w}_2 + w_2\tilde{w}_1 + w_2\tilde{w}_2}
Algebraic closure
Multiplicative: construct a 4-rule inference system that computes z_1 z_2:

z_1 z_2 = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} \cdot \frac{\tilde{w}_1 \tilde{f}_1 + \tilde{w}_2 \tilde{f}_2}{\tilde{w}_1 + \tilde{w}_2}
= \frac{w_1\tilde{w}_1 f_1\tilde{f}_1 + w_1\tilde{w}_2 f_1\tilde{f}_2 + w_2\tilde{w}_1 f_2\tilde{f}_1 + w_2\tilde{w}_2 f_2\tilde{f}_2}{w_1\tilde{w}_1 + w_1\tilde{w}_2 + w_2\tilde{w}_1 + w_2\tilde{w}_2}
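A quick numeric check of the additive and multiplicative constructions, with arbitrary firing strengths and (constant) consequents:

```python
import numpy as np

w = np.array([0.7, 0.3]); f = np.array([2.0, -1.0])    # system 1
wt = np.array([0.4, 0.9]); ft = np.array([0.5, 3.0])   # system 2
a, b = 1.5, -0.8

z1 = (w * f).sum() / w.sum()
z2 = (wt * ft).sum() / wt.sum()

# The 4 combined rules have firing strengths w_i * wt_j
W = np.outer(w, wt).ravel()
F_add = np.add.outer(a * f, b * ft).ravel()  # consequents a f_i + b ft_j
F_mul = np.outer(f, ft).ravel()              # consequents f_i * ft_j

assert np.isclose((W * F_add).sum() / W.sum(), a * z1 + b * z2)
assert np.isclose((W * F_mul).sum() / W.sum(), z1 * z2)
print("4-rule systems reproduce a z1 + b z2 and z1 z2")
```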
Model building guidelines
Select the number of fuzzy sets per variable:
- empirically, by examining the data, or by trial and error
- using clustering techniques
- using regression trees (CART)

Initially, distribute bell-shaped membership functions evenly over each input range (see the sketch below). Using an adaptive step size can speed up training.
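A sketch of an even initial distribution of generalized bell membership functions over an input range (the gbell parameterization a, b, c is standard; the particular values are illustrative):

```python
import numpy as np

def gbell(x, a, b, c):
    # Generalized bell MF: 1 / (1 + |(x - c) / a|^(2b))
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

lo, hi, n_mf = -10.0, 10.0, 4
centers = np.linspace(lo, hi, n_mf)      # evenly spaced centers
a = (hi - lo) / (2 * (n_mf - 1))         # half the spacing: neighbors cross at 0.5
x = np.linspace(lo, hi, 201)
mfs = np.array([gbell(x, a, 2.0, c) for c in centers])
print(mfs.shape)                         # (4, 201): 4 evenly distributed MFs
```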
How to design ANFIS?
Initialization:
- Define number and type of inputs
- Define number and type of outputs
- Define number of rules and type of consequents
- Define objective function and stop conditions

Then:
- Collect data
- Normalize inputs
- Determine initial rules
- Initialize network

TRAIN
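The initialization choices above could be collected in a small configuration object; a sketch with illustrative field names and defaults:

```python
from dataclasses import dataclass

@dataclass
class AnfisConfig:
    n_inputs: int = 2                # number (and type) of inputs
    n_mfs_per_input: int = 4         # fuzzy sets per input variable
    mf_type: str = "gbell"           # membership function family
    consequent: str = "first-order"  # type of rule consequents
    max_epochs: int = 250            # stop condition
    target_rmse: float = 1e-3        # objective / stop condition
    normalize_inputs: bool = True

    @property
    def n_rules(self) -> int:
        # grid partition: one rule per combination of input MFs
        return self.n_mfs_per_input ** self.n_inputs

print(AnfisConfig().n_rules)  # 16
```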
Ex. 1: Two-input sinc function
Input range: [-10, 10] × [-10, 10], 121 training data pairs. Multi-Layer Perceptron vs. ANFIS:
- MLP: 18 neurons in the hidden layer, 73 parameters, quick propagation (best learning algorithm for a backpropagation MLP).
- ANFIS: 16 rules, 4 membership functions per variable, 72 fitting parameters (48 linear, 24 nonlinear), hybrid learning rule.

z = \operatorname{sinc}(x, y) = \frac{\sin(x)\,\sin(y)}{x\,y}
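A sketch of how the 121 training pairs could be generated (an 11 × 11 grid over the input square is an assumption; the function itself is as above, with the removable singularity handled at zero):

```python
import numpy as np

def sinc2(x, y):
    # z = sin(x) sin(y) / (x y), with the 0/0 limit handled explicitly
    sx = np.where(x == 0, 1.0, np.sin(x) / np.where(x == 0, 1.0, x))
    sy = np.where(y == 0, 1.0, np.sin(y) / np.where(y == 0, 1.0, y))
    return sx * sy

pts = np.linspace(-10, 10, 11)          # 11 points per axis -> 121 pairs
gx, gy = np.meshgrid(pts, pts)
data = np.column_stack([gx.ravel(), gy.ravel(), sinc2(gx, gy).ravel()])
print(data.shape)                       # (121, 3)
```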
MLP vs. ANFIS results
Average of 10 runs: MLP with different sets of initial random weights; ANFIS with 10 step sizes between 0.01 and 0.10.

MLP's approximation power decreases because the learning process can get trapped in local minima, or because some neurons are pushed into saturation during training.
ANFIS output
[Figure: training data and ANFIS output surfaces over X, Y ∈ [-10, 10]; root mean squared error curve and step size curve over 250 epochs]
ANFIS model
[Figure: initial and final membership functions (degree of membership vs. input) on X and Y, over [-10, 10]]
Ex. 2: 3-input nonlinear function
Two membership functions per variable, 8 rules. Input ranges: [1, 6] × [1, 6] × [1, 6]. 216 training data, 125 validation data.

output = (1 + x^{0.5} + y^{-1} + z^{-1.5})^2
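A sketch of the data generation (a 6 × 6 × 6 grid giving the 216 training triples is an assumption):

```python
import numpy as np

def target(x, y, z):
    # output = (1 + x^0.5 + y^-1 + z^-1.5)^2
    return (1 + x ** 0.5 + y ** -1.0 + z ** -1.5) ** 2

pts = np.linspace(1, 6, 6)              # 6 points per axis -> 6^3 = 216 triples
gx, gy, gz = np.meshgrid(pts, pts, pts)
inputs = np.column_stack([g.ravel() for g in (gx, gy, gz)])
outputs = target(inputs[:, 0], inputs[:, 1], inputs[:, 2])
print(inputs.shape, outputs.shape)      # (216, 3) (216,)
```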
[Figure: training and checking root mean squared error curves and step size curve over 100 epochs]
ANFIS model
[Figure: initial MFs on X, Y and Z, and final MFs on X, Y and Z, over [1, 6]]
Results comparison
APE (Average Percentage Error) = \frac{1}{P} \sum_{i=1}^{P} \frac{|T(i) - O(i)|}{|T(i)|} \times 100\%

where T(i) is the target output, O(i) the model output, and P the number of data points.

| Model             | Training error | Checking error | # Param. | Training data size | Checking data size |
|-------------------|----------------|----------------|----------|--------------------|--------------------|
| ANFIS             | 0.043%         | 1.066%         | 50       | 216                | 125                |
| GMDH model [1]    | 4.7%           | 5.7%           | -        | 20                 | 20                 |
| Fuzzy model 1 [2] | 1.5%           | 2.1%           | 22       | 20                 | 20                 |
| Fuzzy model 2 [2] | 0.59%          | 3.4%           | 32       | 20                 | 20                 |

[1] T. Kondo. Revised GMDH algorithm estimating degree of the complete polynomial. Trans. of the Society of Instrument and Control Engineers, 22(9):928-934, 1986.
[2] M. Sugeno and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and Systems, 28:15-33, 1988.
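The APE measure above, as a short helper (illustrative):

```python
import numpy as np

def ape(T, O):
    # Average Percentage Error: mean of |T(i) - O(i)| / |T(i)|, in percent
    T, O = np.asarray(T, float), np.asarray(O, float)
    return np.mean(np.abs(T - O) / np.abs(T)) * 100.0

print(ape([2.0, 4.0], [2.1, 3.8]))  # 5.0
```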
Ex. 3: Modeling dynamic system
Plant equation:

y(k+1) = 0.3\,y(k) + 0.6\,y(k-1) + f(u(k))

f(·) has the following form:

f(u) = 0.6\sin(\pi u) + 0.3\sin(3\pi u) + 0.1\sin(5\pi u)

Estimate the nonlinear function f with ANFIS:

\hat{y}(k+1) = 0.3\,\hat{y}(k) + 0.6\,\hat{y}(k-1) + F(u(k))

Plant input: u(k) = \sin(2\pi k / 250)
- ANFIS parameters updated at each step (on-line)
- Learning rate η = 0.1; forgetting factor λ = 0.99
- ANFIS can adapt even after the input changes
- Question: was the input signal rich enough?
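A sketch of simulating the true plant to produce the training signal (the ANFIS estimate F would replace f in the model equation; the horizon length is illustrative):

```python
import numpy as np

def f(u):
    # The unknown plant nonlinearity
    return (0.6 * np.sin(np.pi * u) + 0.3 * np.sin(3 * np.pi * u)
            + 0.1 * np.sin(5 * np.pi * u))

K = 1000
u = np.sin(2 * np.pi * np.arange(K) / 250)     # plant input
y = np.zeros(K + 1)                            # zero initial conditions
for k in range(1, K):
    y[k + 1] = 0.3 * y[k] + 0.6 * y[k - 1] + f(u[k])
print(y[-5:])
```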
Plant and model outputs
Effect of number of MFs
[Figure: with 5 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Effect of number of MFs

[Figure: with 4 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Effect of number of MFs

[Figure: with 3 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1]]
Ex. 4: Chaotic time series
Consider a chaotic time series generated by the Mackey-Glass delay differential equation:

\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t)

- Task: predict the system output at some future instant t + P using past outputs
- 500 training data, 500 validation data
- ANFIS input: [x(t-18), x(t-12), x(t-6), x(t)]
- ANFIS output: x(t+6)
- Two MFs per variable, 16 rules
- 104 parameters (24 premise, 80 consequent)
- Data generated from t = 118 to t = 1117
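A sketch of generating the series and the input-output pairs (Euler integration with step 0.1, delay τ = 17, and initial condition x(0) = 1.2 are assumptions consistent with the classic benchmark):

```python
import numpy as np

tau, dt, T = 17.0, 0.1, 1200.0
n = int(T / dt)
d = int(tau / dt)
x = np.zeros(n)
x[0] = 1.2                                     # initial condition, zero history
for t in range(n - 1):
    x_tau = x[t - d] if t >= d else 0.0        # x(t - tau)
    x[t + 1] = x[t] + dt * (0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x[t])

# Input-output pairs [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6), t = 118..1117
ts = np.arange(118, 1118)
idx = np.round(ts / dt).astype(int)
step = int(round(6 / dt))
inputs = np.column_stack([x[idx - 3 * step], x[idx - 2 * step],
                          x[idx - step], x[idx]])
targets = x[idx + step]
print(inputs.shape, targets.shape)   # (1000, 4) (1000,): 500 train + 500 check
```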
ANFIS model
[Figure: final MFs on Input 1, x(t-18); Input 2, x(t-12); Input 3, x(t-6); and Input 4, x(t), over [0.6, 1.2]]
Model output
[Figure: training and checking error curves and step sizes over 10 epochs; desired and ANFIS outputs; prediction errors on the order of 10^{-3}]
103rd order AR model
Order selection
Select the optimal order of the AR model in order to prevent overfitting: choose the order that minimizes the error on a test set.

y(t) = a_1 y(t-1) + a_2 y(t-2) + \dots + a_n y(t-n) + u(t)
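A sketch of order selection by validation error: fit AR(n) by least squares for increasing n and keep the order with the smallest test-set RMSE (the data here is synthetic and illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sin(0.1 * np.arange(600)) + 0.05 * rng.standard_normal(600)
y_train, y_test = y[:400], y[400:]

def regressors(y, n):
    # Rows: [y(t-1), ..., y(t-n)]; target: y(t)
    A = np.column_stack([y[n - k: len(y) - k] for k in range(1, n + 1)])
    return A, y[n:]

best = None
for n in range(1, 30):
    A, b = regressors(y_train, n)
    a, *_ = np.linalg.lstsq(A, b, rcond=None)   # fit on training data
    At, bt = regressors(y_test, n)
    rmse = np.sqrt(np.mean((At @ a - bt) ** 2))  # evaluate on test data
    if best is None or rmse < best[1]:
        best = (n, rmse)
print("selected order:", best[0], "test RMSE:", best[1])
```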
44th order AR model
ANFIS output for P = 84
ANFIS extensions
- Different types of membership functions in layer 1
- Parameterized t-norms in layer 2
- Interpretability:
  - constrained gradient descent optimization
  - bounds on fuzziness
  - parameterization to reflect constraints
- Structure identification
E' = E + \sum_{i=1}^{N_P} w_i \ln(w_i)