Supervised learning neural networks Multilayer perceptron Adaptive-Network-based Fuzzy Inference System (ANFIS) First part based on slides by Walter Kosters


Page 1: Supervised learning neural networks (source: liacs.leidenuniv.nl/~nijssensgr/CI/2011/5 neural networks)

Supervised learning neural networks

• Multilayer perceptron

• Adaptive-Network-based Fuzzy Inference System (ANFIS)

First part based on slides by Walter Kosters

Page 2:

Neural networks
• Massively connected computational units inspired by the working of the human brain
• Provide a mathematical model for biological neural networks (brains)
• Characteristics: learning from examples; adaptive and fault tolerant; robust for fulfilling complex tasks

Page 3:

Network variations
• Learning methods (supervised, unsupervised)
• Architectures (feedforward, recurrent)
• Output types (binary, continuous)
• Node types (uniform, hybrid)
• Implementations (software, hardware)
• Connection weights (adjustable, hard-wired)
• Inspirations (biological, psychological)

Page 4:

Artificial Neuron
[Figure: a single neuron/unit with weighted inputs producing one output.]
A network with one unit is called a perceptron.

Page 5:

Activation Functions

Step: step_t(x) = 1 if x >= t, step_t(x) = 0 otherwise

Sign: sign(x) = 1 if x >= 0, sign(x) = -1 if x < 0

Page 6:

Activation Functions

Sigmoid (logistic): sigmoid(x) = 1 / (1 + e^(-bx))

Hyperbolic tangent: tanh(x/2) = (1 - e^(-x)) / (1 + e^(-x))

These resemble fuzzy membership functions!
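The four activation functions can be written out directly (a minimal sketch; the function names are mine):

```python
import math

def step(x, t=0.0):
    # step_t(x) = 1 if x >= t, 0 otherwise
    return 1 if x >= t else 0

def sign(x):
    # sign(x) = 1 if x >= 0, -1 otherwise
    return 1 if x >= 0 else -1

def sigmoid(x, b=1.0):
    # sigmoid(x) = 1 / (1 + e^(-bx)); b controls the steepness
    return 1.0 / (1.0 + math.exp(-b * x))

def tanh_half(x):
    # tanh(x/2) = (1 - e^(-x)) / (1 + e^(-x))
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))
```

The last identity is easy to check against `math.tanh(x / 2)`.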

Page 7:

Perceptron: Encoding Simple Boolean Functions
• Step activation function
• Input in [0,1]
• Output in [0,1]
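For instance, with both weights set to 1, a single step unit encodes AND with threshold 1.5 and OR with threshold 0.5 (a minimal sketch; the helper name is mine):

```python
def perceptron(inputs, weights, threshold):
    # Step-activated unit: fires iff the weighted input sum reaches the threshold
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# AND: both inputs are needed to reach threshold 1.5
# OR: a single active input already reaches threshold 0.5
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron((x1, x2), (1, 1), 1.5) == (x1 and x2)
        assert perceptron((x1, x2), (1, 1), 0.5) == (x1 or x2)
```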

Page 8:

Perceptron: Linearly Separable
Step function: output 1 if the weighted input sum reaches the threshold, 0 otherwise.
A perceptron can only represent linearly separable functions.

Page 9:

Perceptron: Learning
Update rule: w_i <- w_i + α · Error · x_i
where Error = target - predicted, α = learning rate
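The rule can be sketched in a few lines, here training the OR function; a bias input of 1 stands in for the threshold (function name, learning rate and epoch count are illustrative):

```python
# Perceptron learning rule: w_i <- w_i + alpha * (target - predicted) * x_i
def train_perceptron(samples, alpha=0.1, epochs=20):
    w = [0.0, 0.0, 0.0]                    # bias weight + two input weights
    for _ in range(epochs):
        for x1, x2, target in samples:
            x = (1, x1, x2)                # constant 1 is the bias input
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            error = target - predicted
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w

# OR is linearly separable, so the rule converges
w = train_perceptron([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)])
```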

Page 10:

Feed-forward network

[Figure: inputs x1 and x2 feed hidden nodes 3, 4, 5 (Layer 1), then nodes 6, 7 (Layer 2), then output nodes 8, 9 producing outputs x8, x9. The input nodes are fixed; the remaining nodes are adaptive.]

Input layer, Layer 1, Layer 2, Output layer

If there is one layer in the network (no hidden units): perceptron. Otherwise: multi-layer.

Page 11:

Backpropagation

Page 12:

Backpropagation
Update rule, output layer: w_{k,i} <- w_{k,i} + α · a_k · Δ_i with Δ_i = Err_i · g'(in_i)

Update rule, hidden layer: w_{k,j} <- w_{k,j} + α · a_k · Δ_j with Δ_j = g'(in_j) · Σ_i w_{j,i} Δ_i

α = learning rate; a_k = activation of input node k; in_i = weighted input of the ith output node; in_j = weighted input of the jth hidden node
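These rules can be sketched as a small sigmoid network trained on XOR (a minimal numpy sketch, not the lecture's code; the 4 hidden units and learning rate 0.5 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

def g(x):  return 1.0 / (1.0 + np.exp(-x))           # sigmoid activation
def gp(a): return a * (1.0 - a)                      # g'(in), written via the activation

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
alpha, losses = 0.5, []

for _ in range(5000):
    a1 = g(X @ W1 + b1)                              # forward pass
    a2 = g(a1 @ W2 + b2)
    losses.append(float(np.mean((T - a2) ** 2)))
    d2 = (T - a2) * gp(a2)                           # Δ for the output layer
    d1 = gp(a1) * (d2 @ W2.T)                        # Δ for the hidden layer
    W2 += alpha * a1.T @ d2; b2 += alpha * d2.sum(0) # w <- w + α · a · Δ
    W1 += alpha * X.T @ d1;  b1 += alpha * d1.sum(0)
```

Full-batch updates are used here; the online variant would apply the same rules after each example.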

Page 13:

[Figure: the same network shown twice, once for the forward pass (inputs x1, x2, outputs x8, x9) and once for the backward pass, where error terms ε1, …, ε9 are propagated back along weights such as w31, w52, w75, w83 and w97.]

Page 14:

Backpropagation Derivation
Gradient descent on the error E = ½ Σ_i (target_i - output_i)², summed over all output nodes.

Page 15:

Backpropagation Derivation

Page 16:

[Figure: left, a single unit with inputs x1, x2, x3, weights w1, w2 and threshold θ = -w0. Right, a two-layer network with inputs x1, x2, hidden nodes x3, x4 and output node x5, with connection weights +1 and -1 and thresholds 0.5, 1.5 and 0.5.]

Page 17:

• One hidden layer

• Hidden neurons: tanh or logistic activation

• Output neurons: linear activation

Page 18:

Online vs Offline learning
• Online: update after each example p
• Offline: update after seeing all examples

Page 19:

Speeding up backpropagation
• Momentum term (output layer and general case)
• Regularized initialization
• Learning rate rescaling: the learning rate of the layers decreases towards the output layer
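The momentum idea can be sketched as a generic weight update that mixes the current step with the previous weight change (a minimal sketch; function name and the values of α and μ are illustrative):

```python
def momentum_update(w, delta_w_prev, step, alpha=0.1, mu=0.9):
    # New change = alpha * current step + mu * previous change;
    # the running average smooths oscillating gradients.
    delta_w = alpha * step + mu * delta_w_prev
    return w + delta_w, delta_w

# Two consecutive updates with the same step: the second change is larger
w, dw = momentum_update(1.0, 0.0, 2.0)   # dw = 0.2
w, dw = momentum_update(w, dw, 2.0)      # dw = 0.2 + 0.9 * 0.2 = 0.38
```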

Page 20:

Approximation power
General function approximators: a feed-forward neural network with one hidden layer and sigmoidal activation functions can approximate any continuous function arbitrarily well (Cybenko).

Page 21:

ANFIS
• Takagi-Sugeno fuzzy system mapped onto a neural network structure
• Different representations are possible, but one with 5 layers is the most common
• Network nodes in different layers have different structures

[Figure: two-rule example with inputs x, y, membership functions A1, A2 on X and B1, B2 on Y, and firing strengths w1, w2.]

f1 = p1 x + q1 y + r1
f2 = p2 x + q2 y + r2

f = (w1 f1 + w2 f2) / (w1 + w2) = w̄1 f1 + w̄2 f2

Page 22:

[Figure: the corresponding 5-layer ANFIS network; the layer-2 nodes compute a product, the layer-3 nodes a normalization.]

Page 23:

ANFIS layers
• Layer 1: every node is adaptive, with function O1,i = μAi(x), the membership degree of the input
• Layer 2: every node is fixed and computes a T-norm of the inputs, e.g. the product wi = μAi(x) · μBi(y)
• Layer 3: every node is fixed, with function w̄i = wi / (w1 + w2) (normalization)
• Layer 4: every node is adaptive, with function w̄i fi = w̄i (pi x + qi y + ri)
• Layer 5: a single node sums up its inputs: f = Σi w̄i fi
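The five layers translate directly into a forward pass; a minimal sketch for two rules with Gaussian membership functions (all parameter values and function names are illustrative):

```python
import math

def gauss(x, c, s):
    # Gaussian membership function with center c and width s
    return math.exp(-((x - c) / s) ** 2)

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership degrees (adaptive premise parameters c, s)
    mu_A = [gauss(x, c, s) for c, s in premise["A"]]
    mu_B = [gauss(y, c, s) for c, s in premise["B"]]
    # Layer 2: T-norm (product) -> firing strengths w_i
    w = [a * b for a, b in zip(mu_A, mu_B)]
    # Layer 3: normalization
    wbar = [wi / sum(w) for wi in w]
    # Layer 4: w̄_i * f_i with f_i = p_i x + q_i y + r_i (adaptive consequents)
    f = [p * x + q * y + r for p, q, r in consequent]
    # Layer 5: sum
    return sum(wb * fi for wb, fi in zip(wbar, f))

premise = {"A": [(-1.0, 2.0), (1.0, 2.0)], "B": [(-1.0, 2.0), (1.0, 2.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)]
out = anfis_forward(0.5, -0.5, premise, consequent)
```

A sanity check: if both rules share the consequent f = x, the normalization guarantees the output equals x regardless of the firing strengths.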

Page 24:
Page 25:

Hybrid learning for ANFIS
• Consider an ANFIS with two inputs x & y
• Let there be 2 nodes in layers 2-3-4
• Denote the consequent parameters by p, q, r
• ANFIS output is now given by

f = w1/(w1 + w2) f1 + w2/(w1 + w2) f2
  = w̄1 (p1 x + q1 y + r1) + w̄2 (p2 x + q2 y + r2)
  = (w̄1 x) p1 + (w̄1 y) q1 + w̄1 r1 + (w̄2 x) p2 + (w̄2 y) q2 + w̄2 r2

• Partition S as follows: S1 = premise parameters (nonlinear), S2 = consequent parameters (linear)

Page 26:

Hybrid learning for ANFIS
How to learn the parameters θ of a linear model f(x) = θᵀx such that Σp (yp - θᵀxp)² is minimal?

(where xp and yp represent input and output for the training examples)

→ least squares linear regression

Page 27:

Linear Regression
Error function: E(θ) = Σp (yp - θᵀxp)²

Compute the global minimum by means of the derivative: setting ∂E/∂θ = 0 yields the normal equations XᵀX θ = Xᵀy.
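A minimal sketch of solving the normal equations on a toy dataset (the data values are illustrative; the first column of ones carries the intercept):

```python
import numpy as np

# Three training examples of y = 1 + 2x, one per row of X
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Normal equations: (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

For rank-deficient design matrices, `np.linalg.lstsq` is the more robust choice.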

Page 28:

Stone-Weierstrass theorem
Let D be a compact space of N dimensions and let F be a set of continuous real-valued functions on D satisfying the following:
• Identity function: f(x) = 1 is in F
• Separability: for any two points x1 ≠ x2 in D, there is an f in F s.t. f(x1) ≠ f(x2)
• Algebraic closure: if f and g are two functions in F, then fg and af + bg are also in F for real a and b
Then F is dense in the set of continuous functions C(D) on D:

(∀ε>0)(∀g∈C(D))(∃f∈F)(∀x∈D: |g(x) - f(x)| < ε)

Page 29:

Universal approximation in ANFIS
According to the Stone-Weierstrass theorem, an ANFIS can approximate any continuous nonlinear function arbitrarily well.

Restriction to rules with a constant in the consequent and Gaussian membership functions:
• Identity: f(x) = 1 is in F, obtained by having a constant consequent
• Separability: for any two points x1 ≠ x2 in D, there is an f in F s.t. f(x1) ≠ f(x2), obtained by selecting different parameters in the network

Page 30:

Algebraic closure
Consider two systems with two rules each:

z = (w1 f1 + w2 f2) / (w1 + w2)  and  ẑ = (ŵ1 f̂1 + ŵ2 f̂2) / (ŵ1 + ŵ2)

Additive:

a z + b ẑ = a (w1 f1 + w2 f2)/(w1 + w2) + b (ŵ1 f̂1 + ŵ2 f̂2)/(ŵ1 + ŵ2)
= [w1 ŵ1 (a f1 + b f̂1) + w1 ŵ2 (a f1 + b f̂2) + w2 ŵ1 (a f2 + b f̂1) + w2 ŵ2 (a f2 + b f̂2)] / (w1 ŵ1 + w1 ŵ2 + w2 ŵ1 + w2 ŵ2)

Multiplicative:

z ẑ = [w1 ŵ1 f1 f̂1 + w1 ŵ2 f1 f̂2 + w2 ŵ1 f2 f̂1 + w2 ŵ2 f2 f̂2] / (w1 ŵ1 + w1 ŵ2 + w2 ŵ1 + w2 ŵ2)

The firing strengths of the combined system are the products wi ŵj: use Gaussian membership functions (the product of two Gaussians is again Gaussian)!

Page 31:

Model building guidelines
• Select the number of fuzzy sets per variable: empirically, by examining the data; by clustering techniques; by regression trees (CART)
• Initially, distribute bell-shaped membership functions evenly
• Using an adaptive step size can speed up training

Page 32:

What to do?
• Number of inputs, and which inputs
• Number of outputs, and which outputs
• Number of rules
• Type of consequent functions
• Normalize inputs
• Collect data
• Define objective function
• Determine initial rules
• Initialize network
• Define stop conditions
• TRAIN

Page 33:

Two-input sinc function

sinc(x, y) = z = sin(x)/x · sin(y)/y

• Input range: [-10,10] x [-10,10]
• 121 data points
• ANFIS: 16 rules, 4 membership functions per variable, 72 parameters (48 consequent, 24 premise)
• MLP: 18 neurons in hidden layer, 73 parameters, quick propagation
• ANFIS takes longer for the same number of epochs
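The training set can be generated as a grid (a sketch; the 11×11 spacing is inferred from the 121-point count, and note that `np.sinc` is the normalized sin(πx)/(πx), so the argument is divided by π):

```python
import numpy as np

g = np.linspace(-10, 10, 11)          # 11 points per axis -> 121 grid points
X, Y = np.meshgrid(g, g)
Z = np.sinc(X / np.pi) * np.sinc(Y / np.pi)   # sin(x)/x * sin(y)/y, with sinc(0) = 1
```

Using `np.sinc` avoids the 0/0 singularity at the origin, where the function value is 1.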

Page 34:

Average of 10 runs

Page 35:

[Figures: training data and ANFIS output surfaces over X, Y ∈ [-10, 10]; root mean squared error curve and step size (learning rate) curve over 100 epochs.]

Page 36:

[Figures: initial and final membership functions (degree of membership) on X and Y over [-10, 10].]

Page 37:

Modeling dynamic systems
f(·) has the following form:

f(u) = 0.6 sin(πu) + 0.3 sin(3πu) + 0.1 sin(5πu)

where u(k) = sin(2πk/250)

• ANFIS parameters updated at each step
• Learning rate: 0.1; forgetting factor: 0.99
• ANFIS can adapt even after the input changes
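The two formulas transcribe directly (function names are mine):

```python
import math

def f(u):
    # Plant nonlinearity: sum of three sinusoids
    return (0.6 * math.sin(math.pi * u)
            + 0.3 * math.sin(3 * math.pi * u)
            + 0.1 * math.sin(5 * math.pi * u))

def u(k):
    # Slowly varying input signal with period 250 steps
    return math.sin(2 * math.pi * k / 250)
```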

Page 38:
Page 39:

[Figures, ANFIS with 5 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1].]

Page 40:

[Figures, ANFIS with 4 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1].]

Page 41:

[Figures, ANFIS with 3 membership functions: initial MFs, final MFs, f(u) and ANFIS outputs, and each rule's outputs over u ∈ [-1, 1].]

Page 42:

Three-input nonlinear function
System mapping is given by

o = (1 + x^0.5 + y^-1 + z^-1.5)²

• Two membership functions per variable, 8 rules
• Input ranges: [1,6] x [1,6] x [1,6]
• 215 training data, 125 validation data

[Figures: root mean squared error curves (training error and checking error) and step size curve over 100 epochs.]

Page 43:

[Figures: initial MFs on X, Y and Z and final MFs (degree of membership) over the input range [1, 6].]

Page 44:

Predicting chaotic time series
Consider a chaotic time series generated by the Mackey-Glass delay differential equation:

dx(t)/dt = 0.2 x(t-τ) / (1 + x¹⁰(t-τ)) - 0.1 x(t)

Task: predict the output of the system at some future instance t+P by using past output.
• 500 training data, 500 validation data
• ANFIS input: [x(t-18), x(t-12), x(t-6), x(t)]
• ANFIS output: x(t+6)
• Two MFs per variable, 16 rules
• 104 parameters (24 premise, 80 consequent)
• Data generated from t=118 to t=1117
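The series can be generated numerically, e.g. by Euler integration (a sketch; τ = 17, the step size dt = 0.1 and the constant initial history 1.2 are assumptions, since the slides do not state them):

```python
tau, dt = 17, 0.1
n_delay = int(tau / dt)            # number of stored steps covering the delay
x = [1.2] * (n_delay + 1)          # constant history for t <= 0 (assumed)

for _ in range(20000):
    x_tau = x[-n_delay - 1]        # x(t - tau)
    dx = 0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x[-1]
    x.append(x[-1] + dt * dx)      # Euler step: x(t + dt) = x(t) + dt * dx/dt
```

The ANFIS input/output pairs would then be sampled from this trajectory at the lags listed above.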

Page 45:

[Figures: final MFs on the four inputs x(t-18), x(t-12), x(t-6) and x(t), over roughly [0.6, 1.2].]

Page 46:

[Figures: error curves (training and checking error, around 2·10⁻³), step sizes, desired and ANFIS outputs over t = 200 to 1000, and prediction errors on the order of 10⁻³.]

Page 47:

Auto-Regression (AR) Model

Linear prediction: x(t) = a0 + Σᵢ aᵢ x(t-i)

Order = number of terms
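Fitting the AR coefficients is itself a least-squares problem; a sketch on synthetic data (the `fit_ar` helper and the test signal are illustrative, not from the lecture):

```python
import numpy as np

def fit_ar(series, order):
    # One row per prediction: [1, x(t-1), ..., x(t-order)] -> x(t)
    X = np.array([[1.0] + list(series[t - order:t][::-1])
                  for t in range(order, len(series))])
    y = series[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs          # [a0, a1, ..., a_order]

# A sinusoid obeys x(t) = 2cos(w) x(t-1) - x(t-2) exactly,
# so an AR(2) fit recovers these coefficients
t = np.arange(100, dtype=float)
series = np.sin(0.3 * t)
a = fit_ar(series, 2)
```

Choosing the order is then the model-selection problem discussed on the next slide.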

Page 48:
Page 49:

Order selection
• Select the optimal order of the AR model in order to prevent overfitting
• Select the order that minimizes the error on a test set
• Error measure: root mean squared error divided by the standard deviation of the target

Page 50:
Page 51:
Page 52:

ANFIS extensions
• Different types of membership functions in layer 1
• Parameterized t-norms in layer 2
• Interpretability: constrained gradient descent optimization; bounds on fuzziness via a penalty term on the error:

E' = E + β Σᵢ₌₁^NP wᵢ ln(wᵢ)

• Parameterize to reflect constraints
• Structure identification