
Page 1: NEURAL NETWORKS FOR SYSTEM MODELING

Neural Networks for System Modeling • Gábor Horváth, 2005 Budapest University of Technology and Economics

Gábor Horváth

Budapest University of Technology and Economics

Dept. Measurement and Information Systems

Budapest, Hungary

Copyright © Gábor Horváth. The slides are based on the NATO ASI presentation (NIMIA) in Crema, Italy, 2002

NEURAL NETWORKS FOR SYSTEM MODELING

Page 2: NEURAL NETWORKS FOR SYSTEM MODELING

Outline
• Introduction
• System identification: a short overview
  – Classical results
  – Black box modeling
• Neural network architectures
  – An overview
  – Neural networks for system modeling
• Applications

Page 3: NEURAL NETWORKS FOR SYSTEM MODELING

Introduction

• The goal of this course: to show why and how neural networks can be applied to system identification
  – Basic concepts and definitions of system identification
    • classical identification methods
    • different approaches in system identification
  – Neural networks
    • classical neural network architectures
    • support vector machines
    • modular neural architectures
  – The questions of practical applications, with answers based on a real industrial modeling task (case study)

Page 4: NEURAL NETWORKS FOR SYSTEM MODELING

System identification

Page 5: NEURAL NETWORKS FOR SYSTEM MODELING

System identification: a short overview
• Modeling
• Identification
  – Model structure selection
  – Model parameter estimation
• Non-parametric identification
  – Using a general model structure
• Black-box modeling
  – Input-output modeling: the description of the behaviour of a system

Page 6: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• What is a model?

• Why do we need models?

• What models can be built?

• How to build models?

Page 7: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• What is a model?
  – Some (formal) description of a system, a separable part of the world.
    It represents essential aspects of the system.
  – Main features:
    • All models are imperfect. Only some aspects are taken into consideration, while many other aspects are neglected.
    • It is easier to work with models than with the real systems.
  – Key concepts: separation, selection, parsimony

Page 8: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• Separation:
  – the boundaries of the system have to be defined
  – the system is separated from all other parts of the world
• Selection: only certain aspects are taken into consideration, e.g.
  – information relations, interactions
  – energy interactions
• Parsimony: it is desirable to use as simple a model as possible
  – Occam's razor (William of Ockham, 14th-century English philosopher):
    The most likely hypothesis is the simplest one that is consistent with all observations.
    The simpler of two theories or models is to be preferred.

Page 9: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• Why do we need models?
  – To understand the world around us (or a defined part of it)
  – To simulate a system
    • to predict the behaviour of the system (prediction, forecasting),
    • to determine faults and the causes of misoperation: fault diagnosis, error detection,
    • to control the system to obtain prescribed behaviour,
    • to increase observability: to estimate parameters which are not directly observable (indirect measurement),
    • system optimization.
  – Using a model
    • we can avoid making real experiments,
    • we do not disturb the operation of the real system,
    • it is safer than working with the real system,
    • etc.

Page 10: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• What models can be built?
  – Approaches
    • functional models
      – parts and their connections, based on their functional role in the system
    • physical models
      – based on physical laws and analogies (e.g. an electrical analog circuit model of a mechanical system)
    • mathematical models
      – mathematical expressions (algebraic or differential equations, logic functions, finite-state machines, etc.)

Page 11: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling

• What models can be built?
  – A priori information
    • physical models, "first principle" models: use laws of nature
    • models based on observations (experiments): the real physical system is required for obtaining observations
  – Aspects
    • structural models
    • input-output (behavioral) models

Page 12: NEURAL NETWORKS FOR SYSTEM MODELING

Identification

• What is identification?
  – Identification is the process of deriving a (mathematical) model of a system using observed data.

Page 13: NEURAL NETWORKS FOR SYSTEM MODELING

Measurements

• Empirical process
  – to obtain experimental data (observations):
    • primary information collection, or
    • to obtain information additional to the a priori information
  – to use the experimental data for obtaining (determining) the free parameters (features) of a model
  – to validate the model

Page 14: NEURAL NETWORKS FOR SYSTEM MODELING

Identification (measurement)

[Flow chart: The goal of modeling → Collecting a priori knowledge → A priori model → Experiment design → Observations, determining features, parameters → Model validation → Final model; a correction path leads back from validation to the earlier steps. The observation stage constitutes the measurement part; the whole loop is the identification process.]

Page 15: NEURAL NETWORKS FOR SYSTEM MODELING

Model classes
• Based on the system characteristics
• Based on the modeling approach
• Based on the a priori information

Page 16: NEURAL NETWORKS FOR SYSTEM MODELING

Model classes
• Based on the system characteristics
  – Static – dynamic
  – Deterministic – stochastic
  – Continuous-time – discrete-time
  – Lumped parameter – distributed parameter
  – Linear – non-linear
  – Time invariant – time variant
  – …

Page 17: NEURAL NETWORKS FOR SYSTEM MODELING

Model classes

• Based on the modeling approach
  – parametric
    • known model structure
    • limited number of unknown parameters
  – nonparametric
    • no definite model structure
    • described in many points (frequency characteristics, impulse response)
  – semi-parametric
    • a general class of functional forms is allowed
    • the number of parameters can be increased independently of the size of the data

Page 18: NEURAL NETWORKS FOR SYSTEM MODELING

Model classes

• Based on the a priori information (physical insight)
  – white-box: structure known, parameters known
  – gray-box: structure known, parameters missing (unknown)
  – black-box: both structure and parameters missing (unknown)

Page 19: NEURAL NETWORKS FOR SYSTEM MODELING

Identification

• Main steps
  – collect information
  – model set selection
  – experiment design and data collection
  – determine model parameters (estimation)
  – model validation

Page 20: NEURAL NETWORKS FOR SYSTEM MODELING

Identification
• Collect information
  – physical insight (a priori information): understanding the physical behaviour
  – only observations, or experiments that can be designed
  – application
    • what operating conditions
      – one operating point
      – a large range of different conditions
    • what purpose
      – scientific: basic research
      – engineering: to study the behaviour of a system, to detect faults, to design control systems, etc.

Page 21: NEURAL NETWORKS FOR SYSTEM MODELING

Identification

• Model set selection
  – static – dynamic
  – linear – non-linear
    • linear-in-the-parameters
    • non-linear-in-the-parameters
  – white-box – black-box
  – parametric – non-parametric

Page 22: NEURAL NETWORKS FOR SYSTEM MODELING

Identification

• Model structure selection
  – known model structure (available a priori information)
  – no physical insight: general model structure
    • general rule: always use as simple a model as possible (Occam's razor)
      – linear
      – feed-forward
      – …

Page 23: NEURAL NETWORKS FOR SYSTEM MODELING

Experiment design and data collection
• Excitation
  – input signal selection
  – design of excitation
    • time domain or frequency domain identification (random signal, multi-sine excitation, impulse response, frequency characteristics)
    • persistent excitation
• Measurement of input-output data
  – no possibility to design the excitation signal
    • noisy data, missing data, distorted data
    • non-representative data

Page 24: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Step function

• Random signal (autoregressive moving average (ARMA) process)

• Pseudorandom binary sequence

• Multisine

Page 25: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Step function

Page 26: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Random signal (autoregressive moving average (ARMA) process)
  – obtained by filtering white noise
  – the filter is selected according to the desired frequency characteristic
  – an ARMA(p,q) process can be characterized
    • in the time domain
    • in the lag (correlation) domain
    • in the frequency domain
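The filtered-white-noise idea above can be sketched in a few lines; the ARMA(2,1) coefficients below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def arma_excitation(n, a, b, rng=None):
    """Generate an ARMA excitation signal by filtering white noise.

    a: AR coefficients [a1, ..., ap] (feedback part)
    b: MA coefficients [b0, ..., bq] applied to the white-noise input e(k)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    e = rng.standard_normal(n)          # white-noise source
    u = np.zeros(n)
    p, q = len(a), len(b)
    for k in range(n):
        ar = sum(a[i] * u[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        ma = sum(b[j] * e[k - j] for j in range(q) if k - j >= 0)
        u[k] = ar + ma                  # u(k) = AR feedback + filtered noise
    return u

# Stable AR part (poles inside the unit circle) shapes the spectrum
u = arma_excitation(2000, a=[1.5, -0.7], b=[1.0, 0.5])
```

The AR coefficients must correspond to a stable filter; otherwise the excitation diverges.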

Page 27: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Pseudorandom binary sequence
  – The signal switches between two levels with a given probability
  – The frequency characteristics depend on the probability p
  – Example:

    u(k+1) = u(k)    with probability p
    u(k+1) = -u(k)   with probability 1 - p

[Figure: time function and autocorrelation function; the periodic autocorrelation is 1 at zero lag and -1/N in between, with period N·Tc]
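A minimal sketch of such a two-level sequence, keeping the current level with probability p and switching otherwise (the ±1 levels are an assumed choice):

```python
import numpy as np

def prbs(n, p, rng=None):
    """Pseudorandom binary-type sequence on levels +1/-1:
    u(k+1) = u(k) with probability p, -u(k) with probability 1 - p."""
    if rng is None:
        rng = np.random.default_rng(1)
    u = np.empty(n)
    u[0] = 1.0
    for k in range(1, n):
        u[k] = u[k - 1] if rng.random() < p else -u[k - 1]
    return u

u = prbs(10000, p=0.5)
```

Varying p shifts the switching rate and hence the spectral content of the excitation.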

Page 28: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Multisine

    u(k) = Σ_{i=1}^{K} U_i cos( 2π i (f_max/K) k/N + φ_i )

  – where f_max is the maximum frequency of the excitation signal and K is the number of frequency components
• Crest factor

    CF = max|u(t)| / u_rms(t)

  – minimizing CF with the selection of the φ phases

[Figure: multisine with minimal crest factor]
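A sketch of a flat-amplitude multisine and its crest factor; the Schroeder phase formula used here is a well-known heuristic for obtaining a low crest factor, not a method given on the slide:

```python
import numpy as np

def multisine(n, K, phases=None):
    """Flat-amplitude multisine: u(k) = sum_i cos(2*pi*i*k/n + phi_i)."""
    k = np.arange(n)
    phi = np.zeros(K) if phases is None else np.asarray(phases)
    return sum(np.cos(2 * np.pi * (i + 1) * k / n + phi[i]) for i in range(K))

def crest_factor(u):
    """CF = max|u| / u_rms."""
    return np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2))

n, K = 1024, 10
flat = multisine(n, K)                       # zero phases: components align, large CF
schroeder = multisine(n, K, phases=[-np.pi * i * (i + 1) / K for i in range(K)])
```

With zero phases all components peak simultaneously, so the crest factor is large; the Schroeder phases spread the peaks out and reduce it markedly.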

Page 29: NEURAL NETWORKS FOR SYSTEM MODELING

Excitation

• Persistent excitation
  – The excitation signal must be "rich" enough to excite all modes of the system
  – Mathematical formulation of persistent excitation
    • For linear systems
      – the input signal should excite all frequencies; the amplitude is not so important
    • For nonlinear systems
      – the input signal should excite all frequencies and amplitudes
      – the input signal should sample the full regressor space

Page 30: NEURAL NETWORKS FOR SYSTEM MODELING

The role of excitation: small excitation signal (nonlinear system identification)

[Figure: model output, plant output, and error plotted over 2000 samples]

Page 31: NEURAL NETWORKS FOR SYSTEM MODELING

The role of excitation: large excitation signal (nonlinear system identification)

[Figure: model output, plant output, and error plotted over 2000 samples]

Page 32: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (some examples)
• Resistor modeling
• Model of a duct (an anti-noise problem)
• Model of a steel converter (model of a complex industrial process)
• Model of a signal (time series modeling)

Page 33: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)
• Resistor modeling
  – the goal of modeling: to get a description of a physical system (electrical component)
  – parametric models
    • linear model with constant parameter (DC): U = R I
    • nonlinear model: U = R(I) I
    • frequency-dependent model (AC): U(f) = Z(f) I(f),   Z(f) = U(f)/I(f) = R / (j 2π f C R + 1)

Page 34: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)
• Resistor modeling
  – nonparametric models
    • linear (DC): the measured U-I characteristic (a straight line)
    • nonlinear: the measured U-I characteristic (a curve)
    • frequency dependent (AC): the impedance Z given as a function of frequency f

Page 35: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)
• Resistor modeling
  – parameter estimation based on noisy measurements

[Diagram: a linear system with input I and output U; noise can enter as input noise n_I, as system noise, and as measurement noise n_u on the output]

Page 36: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Model of a duct
  – the goal of modeling: to design a controller for noise compensation
  – an active noise control problem

Page 37: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

Page 38: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Model of a duct
  – physical modeling: general knowledge about acoustical effects, propagation of sound, etc.
  – no physical insight: input: sound pressure; output: sound pressure
  – what signals: stochastic or deterministic (periodic, non-periodic, combined, etc.)
  – what frequency range
  – time invariant or not
  – fixed or adaptive solution: the model structure is fixed, while the model parameters are estimated and adjusted (adaptive solution)

Page 39: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Model of a duct
  – nonparametric model of the duct (H1)
  – FIR filter with 10-100 coefficients

[Figure: magnitude (dB) versus frequency (Hz) of the duct model, 0-1000 Hz]
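Estimating such a nonparametric FIR model by least squares can be sketched on synthetic data; the exponentially decaying "true" impulse response below is an assumed toy system, not the measured duct:

```python
import numpy as np

rng = np.random.default_rng(2)
h_true = 0.8 ** np.arange(20)             # assumed toy impulse response
u = rng.standard_normal(2000)             # persistently exciting input
y = np.convolve(u, h_true)[:len(u)] + 0.01 * rng.standard_normal(len(u))

M = 20                                    # number of FIR coefficients
# Regressor matrix: row k holds [u(k), u(k-1), ..., u(k-M+1)], zero-padded
U = np.column_stack([np.r_[np.zeros(i), u[:len(u) - i]] for i in range(M)])
h_hat = np.linalg.lstsq(U, y, rcond=None)[0]
```

With a persistently exciting input the estimated coefficients approach the true impulse response.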

Page 40: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Nonparametric models: impulse responses

Page 41: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• The effect of active noise compensation

Page 42: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Model of a steel converter (LD converter)

Page 43: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Model of a steel converter (LD converter)
  – the goal of modeling: to control the steel-making process to get steel of predetermined quality
  – physical insight:
    • complex physical-chemical process with many inputs
    • heat balance, mass balance
    • many unmeasurable (input) variables (parameters)
  – no physical insight:
    • there are input-output measurement data
  – no possibility to design the input signal, no possibility to cover the whole range of operation

Page 44: NEURAL NETWORKS FOR SYSTEM MODELING

Modeling (example)

• Time series modeling
  – the goal of modeling: to predict the future behaviour of a signal (forecasting)
    • financial time series
    • physical phenomena, e.g. sunspot activity
    • electrical load prediction
    • an interesting project: the Santa Fe competition
    • etc.
  – signal modeling = system modeling

Page 45: NEURAL NETWORKS FOR SYSTEM MODELING

Time series modeling

[Figure: two sample time series, approximately 1000 and 1200 samples long]

Page 46: NEURAL NETWORKS FOR SYSTEM MODELING

Time series modeling

[Figure: a 200-sample segment of the time series]

Page 47: NEURAL NETWORKS FOR SYSTEM MODELING

Time series modeling

• Output of a neural model

Page 48: NEURAL NETWORKS FOR SYSTEM MODELING

References and further readings
Box, G.E.P. and Jenkins, G.M. "Time Series Analysis: Forecasting and Control", Revised Edition, Holden-Day, 1976.
Eykhoff, P. "System Identification, Parameter and State Estimation", Wiley, New York, 1974.
Goodwin, G.C. and Payne, R.L. "Dynamic System Identification", Academic Press, New York, 1977.
Horváth, G. "Neural Networks in Systems Identification" (Chapter 4 in: S. Ablameyko, L. Goras, M. Gori and V. Piuri (Eds.), Neural Networks in Measurement Systems), NATO ASI, IOS Press, pp. 43-78, 2002.
Horváth, G. and Dunay, R. "Application of Neural Networks to Adaptive Filtering for Systems with External Feedback Paths", Proc. of the International Conference on Signal Processing Applications and Technology, Vol. II, pp. 1222-1227, Dallas, TX, 1994.
Ljung, L. "System Identification - Theory for the User", Prentice-Hall, N.J., 2nd edition, 1999.
Pintelon, R. and Schoukens, J. "System Identification. A Frequency Domain Approach", IEEE Press, New York, 2001.
Pataki, B., Horváth, G., Strausz, Gy. and Talata, Zs. "Inverse Neural Modeling of a Linz-Donawitz Steel Converter", e &amp; i Elektrotechnik und Informationstechnik, Vol. 117, No. 1, pp. 13-17, 2000.
Rissanen, J. "Modelling by Shortest Data Description", Automatica, Vol. 14, pp. 465-471, 1978.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H. and Juditsky, A. "Non-linear Black-box Modeling in System Identification: a Unified Overview", Automatica, Vol. 31, pp. 1691-1724, 1995.
Söderström, T. and Stoica, P. "System Identification", Prentice Hall, Englewood Cliffs, NJ, 1989.
Weigend, A.S. and Gershenfeld, N.A. "Forecasting the Future and Understanding the Past", Vol. 15, Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley, Reading, MA, 1994.

Page 49: NEURAL NETWORKS FOR SYSTEM MODELING

Identification (linear systems)

• Parametric identification (parameter estimation)

– LS estimation

– ML estimation

– Bayes estimation

• Nonparametric identification

– Transient analysis

– Correlation analysis

– Frequency analysis

Page 50: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

[Block diagram: the system y = f(u,n) and the model y_M = f_M(u,Θ) are driven by the same input u; the criterion function C(y,y_M) compares the system output y with the model output y_M, and a parameter adjustment algorithm tunes the model parameters Θ; n denotes the noise]

Page 51: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification
• Parameter estimation
  – linear system:
    y(i) = u(i)^T Θ + n(i) = Σ_j u_j(i) Θ_j + n(i),   i = 1, 2, ..., N
    y_N^T = [y(1) ... y(N)],   y = UΘ + n,   U^T = [u(1) ... u(N)]
  – linear-in-the-parameters model:
    y_M(i) = u(i)^T Θ̂ = Σ_j u_j(i) Θ̂_j,   y_M = UΘ̂
  – criterion (loss) function:
    ε(Θ̂) = y - y_M(Θ̂),   V = V(ε(Θ̂)) = V(y - y_M(Θ̂))

Page 52: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• LS estimation
  – quadratic loss function:
    V(Θ̂) = (1/2) Σ_{i=1}^{N} ε²(i) = (1/2) Σ_{i=1}^{N} (y(i) - u(i)^T Θ̂)² = (1/2) (y_N - U_N Θ̂)^T (y_N - U_N Θ̂)
  – LS estimate:
    Θ̂_LS = arg min_Θ̂ V(Θ̂),   ∂V(Θ̂)/∂Θ̂ = 0
    Θ̂_LS = (U_N^T U_N)^{-1} U_N^T y_N
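A numerical sketch of the closed-form LS estimate Θ̂_LS = (UᵀU)⁻¹Uᵀy on simulated data; the true parameter vector and noise level are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([2.0, -1.0, 0.5])
U = rng.standard_normal((200, 3))                       # regressor matrix U_N
y = U @ theta_true + 0.05 * rng.standard_normal(200)    # noisy observations

# LS estimate: solve the normal equations instead of inverting explicitly
theta_ls = np.linalg.solve(U.T @ U, U.T @ y)
```

Solving the normal equations (or using `np.linalg.lstsq`) is numerically preferable to forming the matrix inverse.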

Page 53: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Weighted LS estimation
  – weighted quadratic loss function:
    V(Θ̂) = (1/2) Σ_{i=1}^{N} Σ_{k=1}^{N} q_{ik} ε(i) ε(k) = (1/2) (y_N - U_N Θ̂)^T Q (y_N - U_N Θ̂)
  – weighted LS estimate:
    Θ̂_WLS = (U_N^T Q U_N)^{-1} U_N^T Q y_N
  – Gauss-Markov estimate (BLUE = best linear unbiased estimate): with E{n} = 0, cov[n] = Σ and Q = Σ^{-1}
    Θ̂_WLS = (U_N^T Σ^{-1} U_N)^{-1} U_N^T Σ^{-1} y_N
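A sketch of the Gauss-Markov (BLUE) estimate for noise with a known diagonal covariance; the heteroscedastic noise profile is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = np.array([1.0, 3.0])
U = rng.standard_normal((300, 2))
sigma = np.where(np.arange(300) < 150, 0.1, 1.0)   # first half: low noise
y = U @ theta_true + sigma * rng.standard_normal(300)

# Gauss-Markov / BLUE: weight Q = Sigma^{-1} (diagonal covariance assumed)
Q = np.diag(1.0 / sigma ** 2)
theta_wls = np.linalg.solve(U.T @ Q @ U, U.T @ Q @ y)
```

The low-noise measurements receive large weights, so they dominate the estimate.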

Page 54: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Maximum likelihood estimation
  – we select the estimate which makes the given observations most probable
  – likelihood function f(y_N | Θ̂) and log-likelihood function log f(y_N | Θ̂)
  – maximum likelihood estimate:
    Θ̂_ML = arg max_Θ̂ f(y_N | Θ̂),   ∂ log f(y_N | Θ̂) / ∂Θ̂ = 0

Page 55: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification
• Properties of ML estimates
  – consistency:
    lim_{N→∞} P( |Θ̂_ML(N) - Θ| > ε ) = 0 for any ε > 0
  – asymptotic normality: Θ̂_ML(N) converges to a normal random variable as N → ∞
  – asymptotic efficiency: the variance reaches the Cramér-Rao lower bound
    lim_{N→∞} var(Θ̂_ML - Θ) = ( -E{ ∂² ln f(y|Θ) / ∂Θ² } )^{-1}
  – coincides with the Gauss-Markov estimate if the noise is Gaussian

Page 56: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Bayes estimation
  – the parameter Θ is a random variable with known pdf: a priori density f(Θ), a posteriori density f(Θ|y)
  – the loss function:
    V_B(Θ̂) = ∫ C(Θ̂, Θ) f(Θ|y) dΘ
  – Bayes estimate:
    Θ̂_B = arg min_Θ̂ ∫ C(Θ̂, Θ) f(Θ|y) dΘ

Page 57: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Bayes estimation with different cost functions
  – absolute-value cost C(Θ̂, Θ) = |Θ̂ - Θ| → the median of f(Θ|y)
  – quadratic cost C(Θ̂, Θ) = |Θ̂ - Θ|² → the mean of f(Θ|y)
  – uniform (hit-or-miss) cost, C(Θ̂, Θ) = 0 if |Θ̂ - Θ| ≤ Δ and constant otherwise → the MAP (maximum a posteriori) estimate

[Figure: a posteriori density f(Θ|y) with the MAP, mean and median estimates and the corresponding cost functions]

Page 58: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Recursive estimations
  – Θ̂(k) is estimated from {y(i)}, i = 1, ..., k-1
  – y(k) is predicted as y_M(k) = u(k)^T Θ̂
  – the error e(k) = y(k) - y_M(k) is determined
  – the estimate Θ̂(k+1) is updated from Θ̂(k) and e(k)

Page 59: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Recursive estimations
  – least mean square (LMS)
    Θ̂(k+1) = Θ̂(k) + μ ε(k) u(k)
  – the simplest gradient-based iterative algorithm
  – it has an important role in neural network training
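The LMS update above can be sketched directly; the step size and the simulated data are illustrative assumptions:

```python
import numpy as np

def lms(U, y, mu=0.01):
    """LMS recursion: theta(k+1) = theta(k) + mu * eps(k) * u(k)."""
    theta = np.zeros(U.shape[1])
    for u_k, y_k in zip(U, y):
        eps = y_k - u_k @ theta          # instantaneous prediction error
        theta = theta + mu * eps * u_k   # gradient step along u(k)
    return theta

rng = np.random.default_rng(5)
theta_true = np.array([0.5, -1.5])
U = rng.standard_normal((5000, 2))
y = U @ theta_true + 0.01 * rng.standard_normal(5000)
theta_lms = lms(U, y, mu=0.05)
```

The step size μ trades convergence speed against steady-state parameter fluctuation; too large a μ makes the recursion unstable.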

Page 60: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification
• Recursive estimations
  – recursive least squares (RLS)
    Θ̂(k+1) = Θ̂(k) + K(k+1) ε(k+1)
    where the gain K(k+1) is defined as
    K(k+1) = P(k) u(k+1) [ I + u^T(k+1) P(k) u(k+1) ]^{-1}
    P(k+1) = P(k) - P(k) u(k+1) [ I + u^T(k+1) P(k) u(k+1) ]^{-1} u^T(k+1) P(k)
    with P(k) = [ U^T(k) U(k) ]^{-1}
  – K(k) changes the search direction away from the instantaneous gradient direction
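A sketch of the RLS recursion; initializing P with a large multiple of the identity is a common practical choice, not something specified on the slide:

```python
import numpy as np

def rls(U, y, delta=100.0):
    """RLS: theta(k+1) = theta(k) + K(k+1) eps(k+1), where P tracks the
    inverse input correlation matrix via a rank-one (Sherman-Morrison) update."""
    n = U.shape[1]
    theta = np.zeros(n)
    P = delta * np.eye(n)                     # large initial P (vague prior)
    for u_k, y_k in zip(U, y):
        eps = y_k - u_k @ theta               # prediction error
        K = P @ u_k / (1.0 + u_k @ P @ u_k)   # gain vector
        theta = theta + K * eps
        P = P - np.outer(K, u_k @ P)          # update inverse correlation
    return theta

rng = np.random.default_rng(6)
theta_true = np.array([1.2, -0.4, 2.0])
U = rng.standard_normal((500, 3))
y = U @ theta_true + 0.02 * rng.standard_normal(500)
theta_rls = rls(U, y)
```

Unlike LMS, RLS converges in a number of steps comparable to the parameter dimension, at the cost of maintaining the matrix P.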

Page 61: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Recursive estimations
  – recursive Bayes: the a posteriori density is built up observation by observation
    f(Θ | y1) = f(y1 | Θ) f(Θ) / ∫ f(y1 | Θ) f(Θ) dΘ
    f(Θ | y1, y2) = f(y2 | Θ, y1) f(Θ | y1) / ∫ f(y2 | Θ, y1) f(Θ | y1) dΘ
    ...
    f(Θ | y1, ..., yk) = f(yk | Θ, y1, ..., yk-1) f(Θ | y1, ..., yk-1) / ∫ f(yk | Θ, y1, ..., yk-1) f(Θ | y1, ..., yk-1) dΘ
  – at every step the previous a posteriori density serves as the new a priori density
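For a scalar parameter with a Gaussian prior and Gaussian observation noise the recursion has a closed form, since the posterior stays Gaussian and only its mean and variance must be stored; the numerical values below are illustrative assumptions:

```python
import numpy as np

def bayes_update(m, s2, y, sigma2):
    """One recursive Bayes step: prior N(m, s2), observation y = theta + v
    with v ~ N(0, sigma2); the posterior is again Gaussian."""
    s2_post = 1.0 / (1.0 / s2 + 1.0 / sigma2)      # precisions add
    m_post = s2_post * (m / s2 + y / sigma2)       # precision-weighted mean
    return m_post, s2_post

rng = np.random.default_rng(7)
theta_true, sigma2 = 2.5, 0.25
m, s2 = 0.0, 10.0                                  # vague a priori density
for y_k in theta_true + np.sqrt(sigma2) * rng.standard_normal(400):
    m, s2 = bayes_update(m, s2, y_k, sigma2)       # posterior becomes new prior
```

With every observation the posterior mean moves toward the true parameter and the posterior variance shrinks.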

Page 62: NEURAL NETWORKS FOR SYSTEM MODELING

Parametric identification

• Parameter estimation
  – Least squares: least a priori information
  – Maximum likelihood: requires the conditional probability density f(y_N | Θ̂)
  – Bayes: most a priori information
    • a priori probability density f(Θ)
    • conditional probability density f(y_N | Θ̂)
    • cost function C(Θ̂, Θ)

Page 63: NEURAL NETWORKS FOR SYSTEM MODELING

Non-parametric identification

• Frequency-domain analysis

– frequency characteristic, frequency response

– spectral analysis

• Time-domain analysis

– impulse response

– step response

– correlation analysis

• These approaches are for linear dynamical systems

Page 64: NEURAL NETWORKS FOR SYSTEM MODELING

Non-parametric identification (frequency domain)

• Special input signals
  – sinusoid
  – multisine
    u(t) = Σ_{k=1}^{K} U_k exp( j (2π k (f_max/K) t + φ_k) )
    where f_max is the maximum frequency of the excitation signal and K is the number of frequency components
  – crest factor
    CF = max|u(t)| / u_rms(t)
    minimizing CF with the selection of the φ phases

Page 65: NEURAL NETWORKS FOR SYSTEM MODELING

Non-parametric identification (frequency domain)

• Frequency response

– Power density spectrum, periodogram

– Calculation of periodogram

– Effect of finite registration length

– Windowing (smoothing)

Page 66: NEURAL NETWORKS FOR SYSTEM MODELING

References and further readings
Eykhoff, P. "System Identification, Parameter and State Estimation", Wiley, New York, 1974.
Ljung, L. "System Identification - Theory for the User", Prentice-Hall, N.J., 2nd edition, 1999.
Goodwin, G.C. and Payne, R.L. "Dynamic System Identification", Academic Press, New York, 1977.
Rissanen, J. "Stochastic Complexity in Statistical Inquiry", Series in Computer Science, Vol. 15, World Scientific, 1989.
Sage, A.P. and Melsa, J.L. "Estimation Theory with Applications to Communications and Control", McGraw-Hill, New York, 1971.
Pintelon, R. and Schoukens, J. "System Identification. A Frequency Domain Approach", IEEE Press, New York, 2001.
Söderström, T. and Stoica, P. "System Identification", Prentice Hall, Englewood Cliffs, NJ, 1989.
Van Trees, H.L. "Detection, Estimation and Modulation Theory, Part I", Wiley, New York, 1968.

Page 67: NEURAL NETWORKS FOR SYSTEM MODELING

Black box modeling

Page 68: NEURAL NETWORKS FOR SYSTEM MODELING

Black-box modeling

• Why do we use black-box models?
  – lack of physical insight: physical modeling is not possible
  – the physical knowledge is too complex or there are mathematical difficulties: physical modeling is possible in principle but not in practice
  – there is no need for physical modeling (only the behaviour of the system should be modeled)
  – black-box modeling may be much simpler

Page 69: NEURAL NETWORKS FOR SYSTEM MODELING

Black-box modeling

• Steps of black-box modeling
  – select a model structure
  – determine the size of the model (the number of parameters)
  – use observed (measured) data to adjust the model (estimate the model order - the number of parameters - and the numerical values of the parameters)
  – validate the resulting model

Page 70: NEURAL NETWORKS FOR SYSTEM MODELING

Black-box modeling

• Model structure selection

Dynamic models: how to choose the regressor vector φ(k)?

y_M(k) = f(Θ, φ(k))

– past inputs:
φ(k) = [u(k−1), u(k−2), …, u(k−N)]

– past inputs and system outputs:
φ(k) = [u(k−1), …, u(k−N), y(k−1), …, y(k−P)]

– past inputs and model outputs:
φ(k) = [u(k−1), …, u(k−N), y_M(k−1), …, y_M(k−P)]

– past inputs, system outputs and errors:
φ(k) = [u(k−1), …, u(k−N), y(k−1), …, y(k−P), ε(k−1), …, ε(k−L)]

– past inputs, model outputs and errors:
φ(k) = [u(k−1), …, u(k−N), y_M(k−1), …, y_M(k−P), ε(k−1), …, ε(k−L), ε_u(k−1), …, ε_u(k−K)]
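The regressor choices above can be sketched in code. A minimal illustration (not from the slides; the function name is hypothetical) of building the ARX-type regressor vector φ(k) from recorded input and output sequences:

```python
# Sketch (illustrative): build the ARX-type regressor
# phi(k) = [u(k-1), ..., u(k-N), y(k-1), ..., y(k-P)].
def arx_regressor(u, y, k, N, P):
    """Return phi(k) from past inputs u and past system outputs y."""
    past_inputs = [u[k - i] for i in range(1, N + 1)]
    past_outputs = [y[k - i] for i in range(1, P + 1)]
    return past_inputs + past_outputs

u = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.5, 1.5, 3.0, 5.0]
phi = arx_regressor(u, y, k=4, N=2, P=2)  # [u(3), u(2), y(3), y(2)]
```

The other structures differ only in which signals fill the list: the model output y_M for OE-type regressors, and past errors ε for ARMAX/BJ-type regressors.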

Page 71: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Linear dynamic model structures

FIR: y_M(k) = a_1 u(k−1) + a_2 u(k−2) + … + a_N u(k−N)

ARX: y_M(k) = a_1 u(k−1) + … + a_N u(k−N) + b_1 y(k−1) + … + b_P y(k−P)

OE: y_M(k) = a_1 u(k−1) + … + a_N u(k−N) + b_1 y_M(k−1) + … + b_P y_M(k−P)

ARMAX: y_M(k) = a_1 u(k−1) + … + a_N u(k−N) + b_1 y(k−1) + … + b_P y(k−P) + c_1 ε(k−1) + … + c_L ε(k−L)

BJ: y_M(k) = a_1 u(k−1) + … + a_N u(k−N) + b_1 y_M(k−1) + … + b_P y_M(k−P) + c_1 ε(k−1) + … + c_L ε(k−L) + d_1 ε_u(k−1) + … + d_K ε_u(k−K)

parameter vector: Θ = [a_1 a_2 … a_N]^T (FIR); in general Θ = [a_1 … a_N, b_1 … b_P, c_1 … c_L, d_1 … d_K]^T
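Because these structures are linear in the parameters, Θ can be estimated by least squares. A minimal sketch (illustrative code, not from the slides; `fit_fir2` is a hypothetical helper) for a two-tap FIR model, solving the 2×2 normal equations directly:

```python
# Sketch: least-squares estimation of a 2-tap FIR model
# y_M(k) = a1*u(k-1) + a2*u(k-2), via the normal equations.
def fit_fir2(u, y):
    rows = [(u[k - 1], u[k - 2], y[k]) for k in range(2, len(u))]
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * r[2] for r in rows)
    b2 = sum(r[1] * r[2] for r in rows)
    det = s11 * s22 - s12 * s12
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det

# Data generated by the true system y(k) = 2*u(k-1) - 0.5*u(k-2):
u = [1.0, 0.0, 2.0, 1.0, 3.0, 0.5]
y = [0.0, 0.0, -0.5, 4.0, 1.0, 5.5]
a1, a2 = fit_fir2(u, y)  # -> (2.0, -0.5)
```

With noise-free data the true coefficients are recovered exactly; with noisy data the same equations give the least-squares estimate.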

Page 72: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Non-linear dynamic model structures

NFIR: y_M(k) = f(u(k−1), u(k−2), …, u(k−N))

NARX: y_M(k) = f(u(k−1), …, u(k−N), y(k−1), …, y(k−P))

NOE: y_M(k) = f(u(k−1), …, u(k−N), y_M(k−1), …, y_M(k−P))

NARMAX: y_M(k) = f(u(k−1), …, u(k−N), y(k−1), …, y(k−P), ε(k−1), …, ε(k−L))

NBJ: y_M(k) = f(u(k−1), …, u(k−N), y_M(k−1), …, y_M(k−P), ε(k−1), …, ε(k−L), ε_u(k−1), …, ε_u(k−K))

Page 73: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• How to choose the nonlinear mapping?

y_M(k) = f(Θ, φ(k))

– linear-in-the-parameter models:
y_M(k) = Σ_{j=1..n} α_j f_j(φ(k)),  Θ = [α_1 α_2 … α_n]^T

– nonlinear-in-the-parameter models:
y_M(k) = Σ_{j=1..n} α_j f_j(β_j, φ(k)),  Θ = [α_1 … α_n, β_1 … β_n]^T

Page 74: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection

– residual test

– Information Criterion:

• AIC Akaike Information Criterion

• BIC Bayesian Information Criterion

• NIC Network Information Criterion

• etc.

– Rissanen MDL (Minimum Description Length)

– cross validation

Page 75: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation: residual test

residual: the difference between the model output and the measured (system) output

– autocorrelation test:

• are the residuals white (white noise process with mean 0)?

• are residuals normally distributed?

• are residuals symmetrically distributed?

– cross correlation test:

• are residuals uncorrelated with the previous inputs?

ε(k) = y(k) − y_M(k)

Page 76: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation: residual test – autocorrelation test

Ĉ_εε(τ) = (1/N) Σ_{k=τ+1..N} ε(k) ε(k−τ)

r_εε = [Ĉ_εε(1) Ĉ_εε(2) … Ĉ_εε(m)]^T / Ĉ_εε(0)

For white residuals, √N · r_εε → N(0, I) in distribution.
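The autocorrelation test can be sketched as follows; the ±1.96/√N band is the usual 95% confidence band implied by the asymptotic N(0, I) distribution (illustrative code, not from the slides):

```python
import math

# Sketch: estimate C(tau) and check that the normalized autocorrelations
# of the residuals stay inside the ~95% band +/- 1.96/sqrt(N)
# expected for a white residual sequence.
def autocorr(eps, tau):
    N = len(eps)
    return sum(eps[k] * eps[k - tau] for k in range(tau, N)) / N

def whiteness_ok(eps, max_lag):
    N, c0 = len(eps), autocorr(eps, 0)
    bound = 1.96 / math.sqrt(N)
    return all(abs(autocorr(eps, t) / c0) < bound
               for t in range(1, max_lag + 1))

# A strictly alternating residual sequence is far from white:
print(whiteness_ok([1.0, -1.0] * 20, max_lag=3))  # False
```

A model whose residuals fail this test has left structure in the data unexplained, so a larger or different model class should be tried.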

Page 77: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation: residual test – cross-correlation test

Ĉ_εu(τ) = (1/N) Σ_{k=τ+1..N} ε(k) u(k−τ)

r_εu(τ) = [Ĉ_εu(τ+1) Ĉ_εu(τ+2) … Ĉ_εu(τ+m)]^T

For residuals uncorrelated with past inputs, √N · r_εu → N(0, R_u) in distribution, where

R_u = (1/(N−m)) Σ_{k=m+1..N} [u(k−1) … u(k−m)]^T [u(k−1) … u(k−m)]

Page 78: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• residual test

[Figure: autocorrelation function of the prediction error, and cross-correlation function of past input and prediction error, plotted against the lag]

Page 79: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection

– the importance of a priori knowledge

(physical insight)

– under- or over-parametrization

– Occam’s razor

– variance-bias trade-off

Page 80: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection

– criteria: noise (fit) term + penalty term

• AIC (Akaike Information Criterion):

AIC(Θ̂) = (−2) log(maximum likelihood) + 2p = (−2) log L_N(Θ̂) + 2p

• NIC (Network Information Criterion): extension of AIC to neural networks

• MDL (Rissanen):

MDL(p) = (−2) log L_N(Θ̂) + p log N + log |M_Θ̂|

p = number of parameters, M = Fisher information matrix

Page 81: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection

– cross validation

• testing the model on new data (from

the same problem)

• leave-one-out cross validation

• leave-k-out cross validation
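Leave-one-out cross validation can be sketched generically (illustrative code with a hypothetical fit/predict interface, not from the slides):

```python
# Sketch: leave-one-out cross validation for any fit/predict pair.
def loo_cv(xs, ys, fit, predict):
    """Mean squared prediction error, each point predicted by a model
    trained on all the other points."""
    err = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        err += (ys[i] - predict(model, xs[i])) ** 2
    return err / len(xs)

# Toy model family: always predict the mean of the training targets.
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda m, x: m
score = loo_cv([1, 2, 3], [1.0, 2.0, 3.0], fit_mean, predict_mean)
```

Comparing this score across candidate model orders m gives a validation-based alternative to AIC/MDL-style criteria.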

Page 82: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection

– variance-bias trade-off

difference between the model and the real

system

• model class is not properly selected: bias

• actual parameters of the model are not

correct: variance

Page 83: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection – variance-bias trade-off

The order of the model (m) is the dimension of φ(k). The larger m, the smaller the bias and the larger the variance.

y(k) = f_0(Θ, φ(k)) + n(k),  n(k): white noise with variance σ²

V = E{ (y − f(Θ̂, φ(k)))² } = σ² + E{ (f_0(Θ, φ(k)) − f(Θ̂, φ(k)))² }

  ≈ σ² + (f_0(Θ, φ(k)) − E{f(Θ*(m), φ(k))})² + E{ (f(Θ*(m), φ(k)) − f(Θ̂, φ(k)))² }

    noise   bias²                                 variance

Page 84: NEURAL NETWORKS FOR SYSTEM MODELING


Black-box identification

• Model validation, model order selection – approaches

• A sequence of models is used with increasing m.

Validation using cross validation or some criterion, e.g. AIC, MDL, etc.

• A complex model structure is used with a lot of parameters (over-parametrized model).

Select the important parameters:

– regularization

– early stopping

– pruning

Page 85: NEURAL NETWORKS FOR SYSTEM MODELING


Neural modeling

• Neural networks are (general) nonlinear black-box structures with "interesting" properties:

– general architecture

– universal approximator

– not sensitive to over-parametrization

– inherent regularization

Page 86: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks

• Why neural networks?

– There are many other black-box modeling approaches, e.g. polynomial regression.

– Difficulty: the curse of dimensionality.

– For an N-dimensional problem and an M-th order polynomial, the number of independently adjustable parameters grows as N^M.

– To obtain a trained neural network with good generalization capability, the dimension of the input space has a significant effect on the size of the required training data set.

Page 87: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks

• The advantages of neural approach

– Neural nets (MLP) use basis functions to

approximate nonlinear mappings, which

depend on the function to be approximated.

– This adaptive basis function set gives the

possibility to decrease the number of free

parameters in our general model structure.

Page 88: NEURAL NETWORKS FOR SYSTEM MODELING


Other black-box structures

• Wavelets

– mother function (wavelet), dilation, translation

• Volterra series

Volterra series can be applied successfully to weakly nonlinear systems but are impractical for strongly nonlinear systems.

y_M(k) = Σ_l g(l) u(k−l) + Σ_l Σ_s g(l,s) u(k−l) u(k−s) + Σ_l Σ_s Σ_r g(l,s,r) u(k−l) u(k−s) u(k−r) + …

Page 89: NEURAL NETWORKS FOR SYSTEM MODELING


• Fuzzy models, fuzzy neural models

– general nonlinear modeling approach

• Wiener, Hammerstein, Wiener-Hammerstein models

– dynamic linear system + static nonlinearity

– static nonlinearity + dynamic linear system

– dynamic linear system + static nonlinearity + dynamic linear system

• Narendra structures

– other combinations of linear dynamic and nonlinear static systems

Other black-box structures

Page 90: NEURAL NETWORKS FOR SYSTEM MODELING


Combined models

• Narendra structures

Page 91: NEURAL NETWORKS FOR SYSTEM MODELING


References and further readings

Akaike, H. "Information Theory and an Extension of the Maximum Likelihood Principle" Second Intnl. Symposium on Information Theory, Akadémiai Kiadó, Budapest, pp. 267-281, 1972.

Akaike, H. "A New Look at the Statistical Model Identification" IEEE Trans. on Automatic Control, Vol. 19, No. 9, pp. 716-723, 1974.

Haykin, S. "Neural Networks. A Comprehensive Foundation" Prentice Hall, N.J., 1999.

Ljung, L. "System Identification - Theory for the User" Prentice-Hall, N.J., 2nd edition, 1999.

Narendra, K. S. and Parthasarathy, K. "Identification and Control of Dynamical Systems Using Neural Networks" IEEE Trans. Neural Networks, Vol. 1, 1990.

Murata, N., Yoshizawa, S. and Amari, S. "Network Information Criterion - Determining the Number of Hidden Units for an Artificial Neural Network Model" IEEE Trans. on Neural Networks, Vol. 5, No. 6, pp. 865-871.

Pataki, B., Horváth, G., Strausz, Gy. and Talata, Zs. "Inverse Neural Modeling of a Linz-Donawitz Steel Converter" e & i Elektrotechnik und Informationstechnik, Vol. 117, No. 1, pp. 13-17, 2000.

Priestley, M.B. "Non-linear and Non-stationary Time Series Analysis" Academic Press, London, 1988.

Rissanen, J. "Stochastic Complexity in Statistical Inquiry" Series in Computer Science, Vol. 15, World Scientific, 1989.

Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H. and Juditsky, A. "Non-linear Black-box Modeling in System Identification: a Unified Overview" Automatica, Vol. 31, pp. 1691-1724, 1995.

Weigend, A.S. and Gershenfeld, N.A. "Forecasting the Future and Understanding the Past" Vol. 15, Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Reading, MA, 1994.

Page 92: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks

Page 93: NEURAL NETWORKS FOR SYSTEM MODELING


Outline

• Introduction

• Neural networks
– elementary neurons
– classical neural structures
– general approach
– computational capabilities of NNs

• Learning (parameter estimation)
– supervised learning
– unsupervised learning
– analytic learning

• Support vector machines
– SVM architectures
– statistical learning theory

• General questions of network design
– generalization
– model selection
– model validation

Page 94: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks

• Elementary neurons
– linear combiner
– basis-function neuron

• Classical neural architectures
– feed-forward
– feedback

• General approach
– nonlinear function of regressors
– linear combination of basis functions

• Computational capabilities of NNs
– approximation of functions
– classification

Page 95: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks (a definition)

Neural networks are massively parallel, distributed information processing systems, implemented in hardware or software, which

– are made up of a great number of highly interconnected, identical or similar simple processing units (processing elements, neurons) that perform local processing and are arranged in an ordered topology,

– have a learning algorithm to acquire knowledge from their environment using examples,

– have a recall algorithm to use the learned knowledge.

Page 96: NEURAL NETWORKS FOR SYSTEM MODELING


Neural networks (main features)

• Main features– complex nonlinear input-output mapping

– adaptivity, learning capability

– distributed architecture

– fault tolerance

– VLSI implementation

– neurobiological analogy

Page 97: NEURAL NETWORKS FOR SYSTEM MODELING


The elementary neuron (1)

• Linear combiner with nonlinear activation function

[Figure: linear combiner neuron — inputs x_1, x_2, …, x_N and a bias input x_0 = 1 are weighted by w_0, w_1, …, w_N and summed: s = w^T x; the output is y = f(s), where f is the activation function]

Typical activation functions:

a) hard limiter (sign function): y = +1 if s ≥ 0, y = −1 otherwise

b) saturating linear: y = +1 for s > 1, y = s for −1 ≤ s ≤ 1, y = −1 for s < −1

c) bipolar sigmoid: y = (1 − e^{−Ks}) / (1 + e^{−Ks}), K > 0

d) logistic sigmoid: y = 1 / (1 + e^{−Ks}), K > 0 (y(0) = 0.5)
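A minimal sketch of the linear-combiner neuron with the logistic activation (case d above); the function name is illustrative, not from the slides:

```python
import math

# Sketch: elementary neuron, s = w0 + w^T x, y = f(s)
# with the logistic activation 1 / (1 + exp(-K*s)).
def neuron(x, w, w0, K=1.0):
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-K * s))

y = neuron([1.0, -1.0], w=[2.0, 2.0], w0=0.0)  # s = 0 -> y = 0.5
```

The slope parameter K controls how sharply the sigmoid switches between its saturation levels; as K grows, it approaches the hard limiter of case a).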

Page 98: NEURAL NETWORKS FOR SYSTEM MODELING


Elementary neuron (2)

• Neuron with basis function

[Figure: inputs x_1, x_2, …, x_N feed the basis functions g_i(x); the output is their weighted sum]

y = Σ_i w_i g_i(x)

Basis functions, e.g. Gaussian: g_i(x) = g(‖x − c_i‖) = e^{−u²/2σ²}, where

u² = (x_1 − c_1)² + (x_2 − c_2)² + … + (x_N − c_N)²

Page 99: NEURAL NETWORKS FOR SYSTEM MODELING


Classical neural networks

• static (no memory, feed-forward)
– single-layer networks
– multi-layer networks
• MLP
• RBF
• CMAC

• dynamic (memory or feedback)
– feed-forward (storage elements)
– feedback
• local feedback
• global feedback

Page 100: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Single layer network: Rosenblatt’s perceptron

[Figure: Rosenblatt's perceptron — inputs x_1, …, x_N with bias input x_0 = 1 and weights w_0, w_1, …, w_N; s = w^T x, output y = sgn(s)]

Page 101: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Single layer network

[Figure: single-layer network — N inputs fully connected to M outputs through the trainable weight matrix W]

Page 102: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Multi-layer network (static MLP network)

[Figure: two-layer MLP — input vector x = x^(1) with bias input x_0^(1) = 1, first-layer weights W^(1) and nonlinear activations f(·) producing y^(1) = x^(2), second-layer weights W^(2) and activations producing the output y = y^(2)]
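The forward pass of such a two-layer MLP can be sketched as follows (illustrative code, not from the slides; single output, sigmoid hidden units):

```python
import math

# Sketch: forward pass of a two-layer MLP with one nonlinear hidden layer,
# y = w2^T f(W1 x + b1) + b2.
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def mlp(x, W1, b1, w2, b2):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

y = mlp([1.0, 2.0],
        W1=[[0.0, 0.0], [0.0, 0.0]],  # zero first layer: hidden = [0.5, 0.5]
        b1=[0.0, 0.0],
        w2=[1.0, 1.0], b2=0.0)        # y = 0.5 + 0.5 = 1.0
```

Training (e.g. backpropagation) adjusts W1, b1, w2, b2; here only the mapping itself is shown.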

Page 103: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Network with one trainable layer (basis function networks)

[Figure: the input x(k) passes through a nonlinear mapping — fixed, or trained in a supervised or unsupervised way — producing the basis-function outputs φ_1(k), …, φ_M(k); a linear trainable layer with weights w_0, w_1, …, w_M (plus a constant +1 input) forms the output y(k) = w^T φ(k)]

Page 104: NEURAL NETWORKS FOR SYSTEM MODELING


Radial basis function (RBF) network

• Network with one trainable layer

[Figure: RBF network — input layer x_1, …, x_N; hidden layer of radial basis functions g_i(x) = φ(‖x − c_i‖) with centers c_i and widths σ_i; linear output layer with trainable weights w_i plus bias]

y = Σ_i w_i g_i(x), with radial (e.g. Gaussian) basis functions
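The RBF output equation can be sketched directly (illustrative code, not from the slides; Gaussian basis functions with a shared width σ):

```python
import math

# Sketch: RBF network output y = sum_i w_i * exp(-||x - c_i||^2 / (2 sigma^2)).
def rbf(x, centers, weights, sigma):
    y = 0.0
    for c, w in zip(centers, weights):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        y += w * math.exp(-d2 / (2.0 * sigma ** 2))
    return y

# At a center, the corresponding basis function equals 1:
y = rbf([0.0, 0.0], centers=[[0.0, 0.0]], weights=[3.0], sigma=1.0)  # 3.0
```

Because the output is linear in the weights w_i, fixing the centers and widths turns training into a linear least-squares problem.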

Page 105: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC network

• Network with one trainable layer

[Figure: CMAC network — the input vector x from the space of possible input vectors is mapped to a binary association vector a in which exactly C elements are active (here C = 4); the trainable weight vector w is summed over the active elements]

y = w^T a(x)
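A minimal one-dimensional sketch of the binary CMAC and the Albus training update, which spreads the output error equally over the C active weights (illustrative code, not from the slides):

```python
# Sketch of a one-dimensional binary CMAC: a quantized input cell x
# activates C consecutive association cells, and the output is the sum
# of the corresponding trainable weights: y = w^T a(x).
def cmac_active(x, C=4):
    return list(range(x, x + C))

def cmac_output(x, w, C=4):
    return sum(w[i] for i in cmac_active(x, C))

w = [0.0] * 12
# One Albus training step: spread the fixed output error equally
# over the C active weights.
e = 4.0 - cmac_output(3, w)
for i in cmac_active(3):
    w[i] += e / 4
# cmac_output(3, w) now reproduces the stored value exactly, and the
# neighbouring cell 4 shares 3 of the 4 active weights (generalization).
```

The shared weights between neighbouring inputs are exactly what produces the local generalization discussed on the later CMAC slides.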

Page 106: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Dynamic multi-layer network

[Figure: multi-layer network in which the static weights are replaced by FIR filters — layer l computes its activations s^(l) through FIR filters w^(l) and nonlinearities f(·), with y^(l−1) = x^(l)]

Page 107: NEURAL NETWORKS FOR SYSTEM MODELING


Feed-forward architecture

• Dynamic multi-layer network (single trainable layer)

[Figure: the first layer of the network (a nonlinear mapping) produces φ_1(k), …, φ_M(k) from the input x(k); each φ_i(k) passes through an FIR filter giving z_i(k), and the filter outputs are summed to form y(k)]

Page 108: NEURAL NETWORKS FOR SYSTEM MODELING


Feedback architecture

• Lateral feedback (single layer)

[Figure: single-layer network with feed-forward parameters from inputs x_1, …, x_N to outputs y_1, …, y_M and lateral connections between the output units]

y_i = Σ_j w_ij x_j

Page 109: NEURAL NETWORKS FOR SYSTEM MODELING


Feedback architecture

• Local feedback (MLP)

[Figure: multi-layer network (input layer, first and second hidden layers, output layer) with a) self feedback, b) lateral feedback, c) feedback between layers]

Page 110: NEURAL NETWORKS FOR SYSTEM MODELING


Feedback architecture

• Global feedback (sequential network)

[Figure: a static multi-input single-output network fed through tapped delay lines (TDL) of the input, x(k), x(k−1), …, x(k−N), and of its own output, y(k−1), y(k−2), …, y(k−M)]

Page 111: NEURAL NETWORKS FOR SYSTEM MODELING


Feedback architecture

• Hopfield network (global feedback)

[Figure: fully connected single-layer network — every unit's output is fed back to every unit's input through the weights w_11 … w_NN; external inputs x_1, …, x_N]

Page 112: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• General approach

– Regressors

• current inputs (static networks)

• current inputs and past outputs (dynamic networks)

• past inputs and past outputs (dynamic networks)

– Basis functions

• nonlinear-in-the-parameter networks

• linear-in-the-parameter networks

Page 113: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Non-linear dynamic model structures based on the regressor

– NFIR

[Figure: static multi-input single-output network fed by a tapped delay line of the input]

y(k) = f(x(k), x(k−1), …, x(k−N))

Page 114: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Non-linear dynamic model structures based on the regressor

– NARX

[Figure: static network fed by tapped delay lines of the input x(k) and of the system's output d(k)]

y(k) = f(x(k), …, x(k−N), d(k−1), …, d(k−M))

Page 115: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Non-linear dynamic model structures based on the regressor

– NOE

[Figure: static network fed by tapped delay lines of the input x(k) and of its own (model) output y(k)]

y(k) = f(x(k), …, x(k−N), y(k−1), …, y(k−M))

Page 116: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Non-linear dynamic model structures based on the regressor

– NARMAX: y(k) = f(x(k), …, x(k−N), d(k−1), …, d(k−M), ε(k−1), …, ε(k−L))

– NBJ: y(k) = f(x(k), …, x(k−N), y(k−1), …, y(k−M), ε(k−1), …, ε(k−L), ε_x(k−1), …, ε_x(k−K))

– NSS: nonlinear state space representation

Page 117: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Nonlinear function of the regressor

y(k) = f(w, φ(k))

– linear-in-the-parameter models (basis function models):

y(k) = Σ_{j=1..n} w_j f_j(φ(k)),  w = [w_1 w_2 … w_n]^T

– nonlinear-in-the-parameter models:

y(k) = Σ_{j=1..n} w_j^(2) f_j(w^(1), φ(k)),  w = [w_1^(2) w_2^(2) … w_n^(2), w^(1)]^T

Page 118: NEURAL NETWORKS FOR SYSTEM MODELING


Basic neural network architectures

• Basis functions f_j(φ(k))

– MLP (with a single nonlinear hidden layer): sigmoidal basis functions

f_j(w^(1), φ(k)) = sgm(w_j^(1)T φ(k) + w_j0^(1)),  sgm(s) = 1 / (1 + e^{−Ks})

y(k) = Σ_{j=1..n} w_j^(2) f_j(w^(1), φ(k))

– RBF (radial basis functions, e.g. Gaussian):

y(k) = Σ_j w_j f_j(φ(k)) = Σ_j w_j f(‖φ(k) − c_j‖),  f(‖φ − c_i‖) = exp(−‖φ − c_i‖² / 2σ_i²)

– CMAC (rectangular basis functions, splines)

Page 119: NEURAL NETWORKS FOR SYSTEM MODELING


• CMAC (rectangular basis functions)

Basic neural network architectures

[Figure: two-dimensional CMAC basis function arrangement — quantization intervals along u_1 and u_2, overlapping regions, regions of one overlay, points of the main diagonal and of the subdiagonal]

Page 120: NEURAL NETWORKS FOR SYSTEM MODELING


• General basis functions of compact support (higher-order CMAC)

• B-splines: advantages

Basic neural network architectures

A two-dimensional basis function with compact support: tensor product of a second-order B-spline

Page 121: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

Page 122: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

• Function approximation

• Classification

• Association

• Clustering

• Data compression

• Significant component selection

• Optimization

Supervised learning network

Unsupervised learning network

Page 123: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

• Approximation of functions

– Main statement: some feed-forward neural nets (MLP, RBF) are universal approximators (in some sense).

– Kolmogorov's theorem (representation theorem): any continuous real-valued N-variable function defined on [0,1]^N can be represented using properly chosen functions of one variable (non-constructive):

f(x_1, x_2, …, x_N) = Σ_{q=0..2N} Φ_q( Σ_{p=1..N} ψ_pq(x_p) )

Page 124: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

• Approximation of functions (MLP)

– An arbitrary continuous function f: R^N → R on a compact subset K of R^N can be approximated to any desired degree of accuracy (maximal error) if and only if the activation function g(x) is non-constant, bounded and monotonically increasing (Hornik, Cybenko, Funahashi, Leshno, Kurková, etc.):

f̂(x_1, …, x_N) = Σ_{i=1..M} c_i g( Σ_{j=0..N} w_ij x_j ),  x_0 = 1

max_{x∈K} |f(x_1, …, x_N) − f̂(x_1, …, x_N)| < ε for any ε > 0

Page 125: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

• Approximation of functions (MLP)

– An arbitrary continuous function f: R^N → R on a compact subset of R^N can be approximated to any desired degree of accuracy (in the L2 sense) if and only if the activation function is non-polynomial (Hornik, Cybenko, Funahashi, Leshno, Kurková, etc.):

f̂(x_1, …, x_N) = Σ_{i=1..M} c_i g( Σ_{j=0..N} w_ij x_j ),  x_0 = 1

Page 126: NEURAL NETWORKS FOR SYSTEM MODELING


• Classification

– Perceptron: linear separation

– MLP: universal classifier

f: K → {1, 2, …, k},  f(x) = j iff x ∈ X_j

where K is a compact subset of R^N, and X_1, …, X_k are disjoint subsets of K with K = ∪_{j=1..k} X_j and X_i ∩ X_j empty for i ≠ j.

Page 127: NEURAL NETWORKS FOR SYSTEM MODELING


Capability of networks

• Universal approximator (RBF)

An arbitrary continuous function f: R^N → R on a compact subset K of R^N can be approximated to any desired degree of accuracy in the form

f̂(x) = Σ_{i=1..M} w_i g( (x − c_i) / σ_i )

if g: R^N → R is a non-zero, continuous, integrable function.

Page 128: NEURAL NETWORKS FOR SYSTEM MODELING


Computational capability of the CMAC

• The approximation capability of the Albus binary CMAC

• Single-dimensional (univariate) case

• Multi-dimensional (multivariate) case

Page 129: NEURAL NETWORKS FOR SYSTEM MODELING


Computational capability of the CMAC

[Figure: CMAC network (repeated) — the input vector x selects C = 4 active elements of the binary association vector a; the trainable weight vector w is summed over the active elements to give the output y = w^T a(x)]

Page 130: NEURAL NETWORKS FOR SYSTEM MODELING


• Arrangement of basis functions: univariate case

[Figure: C = 4 overlays of rectangular basis functions over the quantized input axis — quantization intervals, regions of one overlay (supports of the basis functions of one overlay)]

Number of basis functions: M = R + C − 1 (with R quantization intervals)

Page 131: NEURAL NETWORKS FOR SYSTEM MODELING


Computational capability of the CMAC

• Arrangement of basis functions: multivariate case

[Figure: two-dimensional arrangement with C = 4 overlays — quantization intervals along u_1 and u_2, overlapping regions, regions of one overlay, points of the main diagonal and of the subdiagonal]

Number of basis functions: M = C ∏_{i=1..N} ( ⌈(R_i − 1)/C⌉ + 1 )

Page 132: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC approximation capability

[Figure: C overlays of basis functions over the two-dimensional input space, with four input points a, b, c, d at the corners of a rectangle]

Consistency equations: f(a) − f(b) = f(c) − f(d)

The binary CMAC can exactly model only additive functions:

f(x) = f(x_1, x_2, …, x_N) = Σ_{i=1..N} f_i(x_i)

Page 133: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC modeling capability

One-dimensional case: can learn any training data set exactly

Multi-dimensional case: can learn any training data set from the additive function set (consistency equations)

Page 134: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization capability

Important parameters:

C – generalization parameter
d_train – distance between adjacent training data

Interesting behavior:

C = l·d_train (l integer): linear interpolation between the training points
C ≠ l·d_train: significant generalization error, non-smooth output

Page 135: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization error


Page 136: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization error

Page 137: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization error

Page 138: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization error – multidimensional case

[Figure: generalization error without and with regularization]

Page 139: NEURAL NETWORKS FOR SYSTEM MODELING


CMAC generalization error univariate case (max)

[Figure: absolute value of the maximum relative error plotted against C/d_train]

Page 140: NEURAL NETWORKS FOR SYSTEM MODELING


Application of networks (based on the capability)

• Regression: function approximation

– modeling of static and dynamic systems, signal modeling, system identification

– filtering, control, etc.

• Pattern association

– autoassociation (similar input and output): dimension reduction, data compression

– heteroassociation (different input and output)

• Pattern recognition, clustering

– classification

Page 141: NEURAL NETWORKS FOR SYSTEM MODELING


Application of networks (based on the capability)

• Optimization

• Data compression, dimension reduction

– principal component analysis (PCA), linear networks

– nonlinear PCA, nonlinear networks

– signal separation, blind source separation (BSS), independent component analysis (ICA)

Page 142: NEURAL NETWORKS FOR SYSTEM MODELING


Data compression, PCA networks

• Karhunen-Loève tranformationxy Φ= [ ]TNϕϕϕ=Φ ..., ,, 21

1 ,further , −=→== ΦΦΦΦϕϕ TTijj

Ti Iδ

∑=

=N

iiiy

1ϕx NMy

M

iii ≤= ∑

=

,ˆ1

ϕx

( ) ∑∑ ∑+== =

=⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

−=−=N

Mii

N

i

M

iiiii yyyε

1

22

1 1

22 EEˆE ϕϕxx

( ) ( )[ ]∑∑+=+=

−−=−−=N

Mii

Tiii

Ti

N

Mii

Tiiεε

11

2 11ˆ ϕϕϕϕϕϕ λλ xxC

[ ]∑+=

=−=∂

N

Miiii

i

λε1

22ˆ

0Cxx ϕϕϕ∂

iii λ ϕϕ =xxC ∑ ∑ ∑1+=ι += +=

=ϕϕ=ϕϕ=N

Μ

N

Mi

N

Miiii

Tii

Ti

1 1

2 λλε xxC

TE xxCxx =
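The truncation-error property above can be checked numerically. A minimal NumPy sketch (not from the slides; the 3-D data and the choice M = 2 are invented for the example): the mean squared reconstruction error from M principal components equals the sum of the discarded eigenvalues of the sample correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 3-D data: x = A z with white z, so C_xx = A A^T.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.5, 0.2]])
X = rng.standard_normal((5000, 3)) @ A.T

# Eigen-decomposition of the sample correlation matrix C_xx = E{x x^T}.
Cxx = X.T @ X / len(X)
lam, Phi = np.linalg.eigh(Cxx)        # ascending eigenvalues
lam, Phi = lam[::-1], Phi[:, ::-1]    # reorder: largest eigenvalue first

M = 2                                  # keep M < N components
Y = X @ Phi[:, :M]                     # y_i = phi_i^T x
X_hat = Y @ Phi[:, :M].T               # reconstruction from M components

# eps^2 = E{||x - x_hat||^2} equals the sum of the discarded eigenvalues.
err = np.mean(np.sum((X - X_hat) ** 2, axis=1))
```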

Page 143: NEURAL NETWORKS FOR SYSTEM MODELING

Data compression, PCA networks

• Principal component analysis (Karhunen-Loève transformation):  y = Φx

(figure: data cloud in the (x1, x2) plane with the principal axes y1, y2)

Page 144: NEURAL NETWORKS FOR SYSTEM MODELING

Nonlinear data compression

• Nonlinear problem (curvilinear component analysis)

(figure: a curved one-dimensional structure in the (x1, x2) plane described by a single coordinate y1)

Page 145: NEURAL NETWORKS FOR SYSTEM MODELING

ICA networks

• A linear transformation is sought that restores the original components from the mixed observations

• Many different approaches have been developed, depending on the definition of independence (entropy, mutual information, Kullback-Leibler information, non-Gaussianity)

• The weights can be obtained using a nonlinear network (during training)

• Nonlinear version of the Oja rule

Page 146: NEURAL NETWORKS FOR SYSTEM MODELING

The task of independent component analysis

Pictures taken from: Aapo Hyvärinen: Survey of Independent Component Analysis

Page 147: NEURAL NETWORKS FOR SYSTEM MODELING

References and further readings

Brown, M. - Harris, C.J. and Parks, P. "The Interpolation Capability of the Binary CMAC", Neural Networks, Vol. 6, pp. 429-440, 1993.

Brown, M. and Harris, C.J. "Neurofuzzy Adaptive Modeling and Control" Prentice Hall, New York, 1994.

Hassoun, M. H.: "Fundamentals of Artificial Neural Networks", MIT Press, Cambridge, MA. 1995.

Haykin, S.: "Neural Networks. A Comprehensive Foundation" Prentice Hall, N. J.1999.

Hertz, J. - Krogh, A. - Palmer, R. G. "Introduction to the Theory of Neural Computation", Addison-Wesley Publishing Co. 1991.

Horváth, G. "CMAC: Reconsidering an Old Neural Network" Proc. of the Intelligent Control Systems and Signal Processing, ICONS 2003, Faro, Portugal. pp. 173-178, 2003.

Horváth, G. "Kernel CMAC with Improved Capability" Proc. of the International Joint Conference on Neural Networks, IJCNN’2004, Budapest, Hungary. 2004.

Lane, S.H. - Handelman, D.A. and Gelfand, J.J "Theory and Development of Higher-Order CMAC Neural Networks", IEEE Control Systems, Vol. Apr. pp. 23-30, 1992.

Miller, T.W. III. Glanz, F.H. and Kraft, L.G. "CMAC: An Associative Neural Network Alternative to Backpropagation" Proceedings of the IEEE, Vol. 78, pp. 1561-1567, 1990

Szabó, T. and Horváth, G. "Improving the Generalization Capability of the Binary CMAC" Proc. of the International Joint Conference on Neural Networks, IJCNN'2000. Como, Italy, Vol. 3, pp. 85-90, 2000.

Page 148: NEURAL NETWORKS FOR SYSTEM MODELING


Learning

Page 149: NEURAL NETWORKS FOR SYSTEM MODELING

Learning in neural networks

• Learning: parameter estimation

  – supervised learning, learning with a teacher
    training set: {x_i, d_i}, i = 1, …, P

  – unsupervised learning, learning without a teacher
    only x, y available

  – analytical learning

Page 150: NEURAL NETWORKS FOR SYSTEM MODELING

Supervised learning

• Model parameter estimation from x, y, d and noise n

(block diagram: the System d = f(x, n) and the Neural model y = f_M(x, w) receive the same input x; the criterion function C = C(ε) = C(d, y) compares d and y, and a parameter adjustment algorithm updates w)

Page 151: NEURAL NETWORKS FOR SYSTEM MODELING

Supervised learning

• Criterion function
  – quadratic criterion function:
    C(d, y) = E{(d − y)^T (d − y)} = E{ Σ_j (d_j − y_j)² }
  – other criterion functions, e.g. the ε-insensitive criterion C(ε)
  – regularized criterion functions: adding a penalty (regularization) term
    C_r(ε) = C(d, y) + λ C_R

Page 152: NEURAL NETWORKS FOR SYSTEM MODELING

Supervised learning

• Criterion minimization:  ŵ = arg min_w C(d, y(w))

• Analytical solution
  only in linear-in-the-parameters cases,
  e.g. linear networks: Wiener-Hopf equation

• Iterative solution
  – gradient methods
  – search methods
    • exhaustive search
    • random search
    • genetic search

Page 153: NEURAL NETWORKS FOR SYSTEM MODELING

Supervised learning

• Error correction rules
  – perceptron rule:  w(k+1) = w(k) + μ ε(k) x(k)
  – gradient methods:  w(k+1) = w(k) + μ Q (−∇(k))
    • steepest descent:  Q = I
    • Newton:  Q = R^(-1)
    • Levenberg-Marquardt:  w(k+1) = w(k) − H^(-1)(k) ∇C(w(k)),  H ≅ E{∇y ∇y^T} + λΩ
    • conjugate gradient:  w(k+1) = w(k) + α_k g_k,  g_j^T R g_k = 0 if j ≠ k

Page 154: NEURAL NETWORKS FOR SYSTEM MODELING

Perceptron training

(figure: perceptron with inputs x0 = 1, x1, …, xN, weights w0, w1, …, wN, sum s = w^T x and output y = sgn(s))

w(k+1) = w(k) + μ ε(k) x(k)

Converges in a finite number of training steps if we have a linearly separable two-class problem with a finite number of samples, a finite upper bound ‖x‖ ≤ M, and μ > 0.
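The perceptron rule above can be sketched on an invented, linearly separable toy set (the data, class geometry and learning rate are assumptions for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two linearly separable classes; x0 = 1 carries the bias weight w0.
X = np.vstack([rng.normal(+2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
d = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(3)
mu = 0.1
for _ in range(100):                      # passes over the training set
    errors = 0
    for xk, dk in zip(X, d):
        y = np.sign(w @ xk) or 1.0        # y = sgn(s), s = w^T x (ties -> +1)
        eps = dk - y                      # error eps(k)
        if eps != 0.0:
            w = w + mu * eps * xk         # w(k+1) = w(k) + mu eps(k) x(k)
            errors += 1
    if errors == 0:                       # finite-step convergence reached
        break

acc = np.mean(np.sign(X @ w) == d)
```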

Page 155: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient method

• Analytical solution
  – linear-in-the-parameters model:  y(k) = w^T(k) x(k)
  – quadratic criterion function:
    C(k) = E{ (d(k) − w^T(k) x(k))² }
         = E{d²(k)} − 2 E{d(k) x^T(k)} w(k) + w^T(k) E{x(k) x^T(k)} w(k)
         = E{d²(k)} − 2 p^T w(k) + w^T(k) R w(k)
    with  R = E{x x^T},  p = E{x d}
  – Wiener-Hopf equation:  w* = R^(-1) p
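A minimal numerical check of the Wiener-Hopf solution (the system, noise level and sample count are invented for the example): estimate R and p from samples and solve w* = R^(-1) p.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy linear system d(k) = w_true^T x(k) + n(k).
w_true = np.array([0.5, -1.0, 2.0])
X = rng.standard_normal((20000, 3))
d = X @ w_true + 0.1 * rng.standard_normal(20000)

# Sample estimates of R = E{x x^T} and p = E{x d}.
R = X.T @ X / len(X)
p = X.T @ d / len(X)

w_star = np.linalg.solve(R, p)   # Wiener-Hopf: w* = R^{-1} p
```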

Page 156: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient method

• Iterative solution:  w(k+1) = w(k) + μ (−∇(k))

  – gradient:  ∇(k) = ∂C(k)/∂w(k) = 2 R (w(k) − w*)

  – condition of convergence:  0 < μ < 1/λ_max,  λ_max: maximal eigenvalue of R

Page 157: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient method

• LMS: iterative solution based on the instantaneous error
  Ĉ(k) = ε²(k),  ε(k) = d(k) − x^T(k) w(k)

  – instantaneous gradient:  ∇̂(k) = ∂Ĉ(k)/∂w(k) = −2 ε(k) x(k)

  – weight updating:  w(k+1) = w(k) + μ (−∇̂(k)) = w(k) + 2μ ε(k) x(k)

  – condition of convergence:  0 < μ < 1/λ_max
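The LMS update above, run sample by sample, converges to the Wiener solution; a minimal sketch (the system, step size and noise level are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

w_true = np.array([1.0, -0.5])
w = np.zeros(2)
mu = 0.01                                # step size, 0 < mu < 1/lambda_max

for k in range(20000):
    x = rng.standard_normal(2)
    d = w_true @ x + 0.05 * rng.standard_normal()
    eps = d - w @ x                      # instantaneous error eps(k)
    w = w + 2 * mu * eps * x             # w(k+1) = w(k) + 2 mu eps(k) x(k)
```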

Page 158: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient methods

• Example of convergence

(figure: weight-space trajectories from w(0) to w* for (a) steepest descent with small μ, (b) steepest descent with large μ, (c) conjugate gradient)

Page 159: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient methods

• Single neuron with nonlinear activation function

  ε(k) = d(k) − y(k) = d(k) − sgm(s(k)) = d(k) − sgm(w^T(k) x(k))

• Parameter adjustment:

  w(k+1) = w(k) + 2μ ε(k) sgm′(s(k)) x(k) = w(k) + 2μ δ(k) x(k)

Page 160: NEURAL NETWORKS FOR SYSTEM MODELING

Gradient methods

• Multi-layer network: error backpropagation (BP)

  w_i^(l)(k+1) = w_i^(l)(k) + 2μ δ_i^(l)(k) x^(l)(k)

  δ_i^(l)(k) = sgm′(s_i^(l)(k)) Σ_{r=1..N_{l+1}} δ_r^(l+1)(k) w_ri^(l+1)(k)

  l = layer index, i = processing element index, k = iteration index

Page 161: NEURAL NETWORKS FOR SYSTEM MODELING

MLP training: BP

(figure: two-layer MLP; the input x^(1) = [1, x1, …, xN] passes through weight matrix W^(1), the nonlinearities f(·), then W^(2) to the outputs y1, y2, …, yn; the errors ε1, …, εn = d − y propagate backwards through f′(·) and W^(2), producing δ^(2) and δ^(1) used by the μ-scaled updates of W^(2) and W^(1))
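The BP updates can be sketched for a small two-layer network; a minimal batch-mode NumPy illustration with a tanh hidden layer and a linear output layer (the architecture, data and step size are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(11)

def sgm(s):  return np.tanh(s)            # activation function
def dsgm(s): return 1.0 - np.tanh(s) ** 2 # its derivative

X = rng.uniform(-1, 1, (200, 1))
D = np.sin(np.pi * X)

W1 = 0.5 * rng.standard_normal((1, 8)); b1 = np.zeros(8)
W2 = 0.5 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
mu = 0.05

def forward(X):
    S1 = X @ W1 + b1
    Z = sgm(S1)
    Y = Z @ W2 + b2                       # linear output layer
    return S1, Z, Y

_, _, Y0 = forward(X)
mse0 = np.mean((D - Y0) ** 2)             # error before training

for _ in range(2000):
    S1, Z, Y = forward(X)
    eps = D - Y                           # output error
    d2 = eps                              # delta of the (linear) output layer
    d1 = dsgm(S1) * (d2 @ W2.T)           # backpropagated hidden-layer delta
    W2 += mu * Z.T @ d2 / len(X); b2 += mu * d2.mean(0)
    W1 += mu * X.T @ d1 / len(X); b1 += mu * d1.mean(0)

_, _, Y = forward(X)
mse = np.mean((D - Y) ** 2)
```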

Page 162: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Important questions
  – the size of the network (model order: number of layers, number of hidden units)
  – the value of the learning rate, μ
  – initial values of the parameters (weights)
  – validation, cross validation; learning and testing set selection
  – the way of learning: batch or sequential
  – stopping criteria

Page 163: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• The size of the network: the number of hidden units (model order)
  – theoretical results: upper limits

• Practical approaches: two different strategies
  – from simple to complex
    • adding new neurons
  – from complex to simple
    • pruning
      – regularization
      – (OBD, OBS, etc.)

Page 164: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Cross validation for model selection

(figure: criterion value C vs. model complexity (size of the network); the training error decreases monotonically while the test error has a minimum at the best model, with bias (underfitting) on the left and variance (overfitting) on the right)

Page 165: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Structure selection

(figure: training error C vs. number of training cycles for networks (a)–(f); the error curves decrease as the number of hidden units increases)

Page 166: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Generalization, overfitting

(figure: network output vs. input; a properly fitted curve through the training points generalizes well, while an overfitted curve passes through the training points but oscillates between them)

Page 167: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Early stopping for avoiding overfitting

(figure: error C vs. number of training cycles; the training error keeps decreasing, while the test error reaches a minimum at the optimal stopping point and grows again if training is not stopped there)
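The early-stopping idea above can be sketched with any iteratively trained model; a minimal illustration using gradient descent on an over-parameterized polynomial model (the model, data and patience-free "keep the best weights" variant are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy samples of a smooth target; a degree-9 polynomial model.
def features(x, deg=9):
    return np.vander(x, deg + 1, increasing=True)

x_tr, x_va = rng.uniform(-1, 1, 40), rng.uniform(-1, 1, 40)
d_tr = np.sin(np.pi * x_tr) + 0.2 * rng.standard_normal(40)
d_va = np.sin(np.pi * x_va) + 0.2 * rng.standard_normal(40)
F_tr, F_va = features(x_tr), features(x_va)

w = np.zeros(10)
best_w, best_val, best_step = w.copy(), np.inf, 0
mu = 0.05
for step in range(5000):
    eps = d_tr - F_tr @ w
    w = w + mu * F_tr.T @ eps / len(x_tr)        # batch gradient step
    val = np.mean((d_va - F_va @ w) ** 2)        # monitor validation error
    if val < best_val:                           # remember the optimum point
        best_val, best_w, best_step = val, w.copy(), step

final_val = np.mean((d_va - F_va @ w) ** 2)
```

Keeping the weights from the validation-error minimum (`best_w`) is exactly "stopping at the optimum point" from the figure.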

Page 168: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an MLP

• Regularization
  – parametric penalty:
    C_r(w) = C(w) + λ Σ_{i,j} |w_ij|
    Δw_ij = −μ (∂C/∂w_ij) − μ λ sgn(w_ij)
    or, summing only the small weights:
    C_r(w) = C(w) + λ Σ_{|w_ij| ≤ Θ} |w_ij|
  – nonparametric penalty:
    C_r(w) = C(w) + λ Φ(f̂(x)),  where Φ(f̂(x)) is some measure of smoothness

Page 169: NEURAL NETWORKS FOR SYSTEM MODELING

MLP as linear data compressor (autoassociative network)

• Subspace transformation

• Desired output = input; input: x; output of the hidden layer: z

(figure: N-input network x1, …, xN; hidden layer of M neurons (M < N) with weights W producing the compressed output z1, …, zM; output layer of N neurons with weights W′ reproducing y1, …, yN ≈ x in the learning phase)

Page 170: NEURAL NETWORKS FOR SYSTEM MODELING

Nonlinear data compression (autoassociative network)

(figure: autoassociative MLP with alternating nonlinear and linear layers; the input x1, x2, x3, … passes through a nonlinear layer f and a linear bottleneck layer giving the compressed output z1, z2, then through further nonlinear and linear layers reconstructing y ≈ x)

Page 171: NEURAL NETWORKS FOR SYSTEM MODELING

RBF (Radial Basis Function)

y(x) = Σ_{i=1..M} w_i g_i(x) = w^T g(x),      g_i(x) = exp[ −‖x − c_i‖² / 2σ_i² ]

(figure: input layer x1, …, xN; hidden layer of M radial basis functions φ0 = 1, g1(x), …, gM(x) with centres c1, …, cM; output layer: weighted sum Σ with weights w)
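Since the output is linear in the weights, an RBF network with fixed centres can be trained by least squares; a minimal sketch (the grid of centres, the common σ and the target function are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf_design(X, centres, sigma):
    # g_i(x) = exp(-||x - c_i||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

X = rng.uniform(0, 1, (200, 1))
d = np.sin(2 * np.pi * X[:, 0])

centres = np.linspace(0, 1, 10)[:, None]     # fixed grid of centres
G = rbf_design(X, centres, sigma=0.1)
w, *_ = np.linalg.lstsq(G, d, rcond=None)    # linear-in-the-parameters LS

mse = np.mean((G @ w - d) ** 2)
```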

Page 172: NEURAL NETWORKS FOR SYSTEM MODELING

RBF training

• Linear-in-the-parameters structure (output weights)
  – analytical solution
  – LMS

• Centres (nonlinear-in-the-parameters)
  – K-means
  – clustering
  – unsupervised learning

Page 173: NEURAL NETWORKS FOR SYSTEM MODELING

Designing of an RBF

• Important questions
  – the size of the network (number of hidden units) (model order)
  – the value of the learning rate, μ
  – initial values of the parameters (centres, weights)
  – validation; learning and testing set selection
  – the way of learning: batch or sequential
  – stopping criteria

Page 174: NEURAL NETWORKS FOR SYSTEM MODELING

CMAC network

• Network with one trainable layer

(figure: the input x from the space of possible input vectors is mapped to a binary association vector a in which C = 4 consecutive elements are active; the output is formed from the trainable weight vector as y = a^T w, so each input activates the weights w_i, …, w_(i+3) selected by overlapping regions of the input space)

Page 175: NEURAL NETWORKS FOR SYSTEM MODELING

CMAC network

• Network with hash-coding

(figure: the input x is first mapped to the long binary association vector a, which is compressed by hashing into a much shorter association vector z; the output is formed from the trainable weight vector as y = z^T w)

Page 176: NEURAL NETWORKS FOR SYSTEM MODELING

CMAC modeling capability

• Analytical solution

  y(u_i) = a(u_i)^T w,  i = 1, 2, …, P        y = A w

  w* = A† d,   A† = A^T (A A^T)^(-1)

  for univariate cases: M ≥ P; for multivariate cases: M < P

• Iterative algorithm (LMS)
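A minimal univariate CMAC sketch (the quantization into Q bins, the association-vector construction and the training data are invented for the example): each input activates C = 4 consecutive binary features, and the weights are obtained by least squares, standing in for the pseudoinverse solution w* = A†d.

```python
import numpy as np

rng = np.random.default_rng(6)

C, Q = 4, 32    # generalization parameter C and number of quantization bins

def assoc(u):
    # Binary association vector a(u): C consecutive active elements.
    i = int(u * (Q - 1))                 # quantized input position
    a = np.zeros(Q + C - 1)
    a[i:i + C] = 1.0
    return a

U = rng.uniform(0, 1, 200)
d = np.sin(2 * np.pi * U)
A = np.vstack([assoc(u) for u in U])     # P x M matrix of association vectors

# Least-squares weights (equivalent to w* = A^dagger d on the training set).
w, *_ = np.linalg.lstsq(A, d, rcond=None)
mse = np.mean((A @ w - d) ** 2)
```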

Page 177: NEURAL NETWORKS FOR SYSTEM MODELING

Networks with unsupervised learning

• Self-organizing networks
  – Hebbian rule
  – competitive learning
  – main tasks
    • clustering, detection of similarities (normalized Hebbian + competitive)
    • data compression (PCA, KLT) (normalized Hebbian)

Page 178: NEURAL NETWORKS FOR SYSTEM MODELING

Unsupervised learning

• Hebbian learning:  Δw = η y x

• Competitive learning: the winner i* satisfies  w_{i*}^T x ≥ w_i^T x  ∀i,
  and only the winner is updated:  Δw_{i*} = μ (x − w_{i*})

• Normalized Hebbian rule

(figure: single-layer network with inputs x1, …, xN, weight matrix W and outputs y1, …, yM)

Page 179: NEURAL NETWORKS FOR SYSTEM MODELING

PCA network

• Oja rule:  Δw = μ y (x − y w) = μ (y x − y² w)

• Derivation from the normalized Hebbian rule:

  w̃(k+1) = w(k) + μ y(k) x(k),       w(k+1) = w̃(k+1) / ‖w̃(k+1)‖

  ‖w̃(k+1)‖² = w^T(k) w(k) + 2μ y²(k) + O(μ²) = 1 + 2μ y²(k) + O(μ²)

  ‖w̃(k+1)‖^(-1) = (1 + 2μ y²(k))^(-1/2) = 1 − μ y²(k) + O(μ²)

  w(k+1) = [w(k) + μ y(k) x(k)] [1 − μ y²(k)] + O(μ²)
         ≅ w(k) + μ y(k) [x(k) − y(k) w(k)]

(figure: single linear neuron y = w^T x with inputs x1, …, xN and weights w)

It can be proved that w converges to the eigenvector belonging to the largest eigenvalue.
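A minimal sketch of the Oja rule converging to the principal eigenvector (the two-dimensional data, step size and sample count are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(7)

# Data whose principal eigenvector is e1 = [1, 0] (variances 9 and 0.25).
X = rng.standard_normal((20000, 2)) * np.array([3.0, 0.5])

w = rng.standard_normal(2)
mu = 0.001
for x in X:
    y = w @ x
    w = w + mu * y * (x - y * w)    # Oja rule; ||w|| is driven towards 1

cos = abs(w[0]) / np.linalg.norm(w)  # alignment with the true eigenvector
```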

Page 180: NEURAL NETWORKS FOR SYSTEM MODELING

PCA network

• Oja rule as a maximum problem (gradient search)

  f(w) = E{y²} = w^T R w / (w^T w)

  ∇f(w) = 2 R w − 2 (w^T R w) w      (for ‖w‖ = 1)

  with R = E{x x^T} and y = w^T x:

  ∇f(w) = 2 E{x x^T} w − 2 (w^T E{x x^T} w) w = 2 E{y x} − 2 E{y²} w

  instantaneous gradient:  ∇f(w) = 2 y x − 2 y² w = 2 y (x − y w)

• Solution: gradient method with the instantaneous gradient

  w(k+1) = w(k) + μ y(k) [x(k) − y(k) w(k)]

(figure: single linear neuron y = w^T x with inputs x1, …, xN and weights w)

Page 181: NEURAL NETWORKS FOR SYSTEM MODELING

PCA networks

• GHA network (Sanger network): Oja rule + Gram-Schmidt orthogonalization

  Δw_1 = μ_1 y_1 (x − y_1 w_1)

  x^(2) = x − y_1 w_1      (deflation: remove the first component)

  Δw_2 = μ_2 y_2 (x^(2) − y_2 w_2) = μ_2 y_2 (x − y_1 w_1 − y_2 w_2)

  …

  Δw_i = μ_i y_i ( x − Σ_{j<i} y_j w_j − y_i w_i )

  in matrix form:  ΔW = η [ y x^T − LT(y y^T) W ],  LT(·) = lower triangular part

(figure: single-layer linear network with inputs x1, …, xN, weight matrix W and outputs y1, …, yM)

Page 182: NEURAL NETWORKS FOR SYSTEM MODELING

PCA networks

• Oja rule for multiple outputs (subspace problem)

  ΔW = η [ y x^T − y y^T W ]

• Output variance maximization rule

(figure: single-layer linear network with inputs x1, …, xN, weight matrix W and outputs y1, …, yM)

Page 183: NEURAL NETWORKS FOR SYSTEM MODELING

PCA networks

(figure: GHA (Sanger) network: feed-forward weight matrix W from the inputs x1, …, xN to the outputs y1, …, yM; APEX network: feed-forward weights W plus lateral weights Q from the earlier outputs y1, …, y_(j−1) to output y_j)

Page 184: NEURAL NETWORKS FOR SYSTEM MODELING

ICA networks

• A linear transformation is sought that restores the original components from the mixed observations

• Many different approaches have been developed, depending on the definition of independence (entropy, mutual information, Kullback-Leibler information, non-Gaussianity)

• The weights can be obtained using a nonlinear network (during training)

• Nonlinear version of the Oja rule

Page 185: NEURAL NETWORKS FOR SYSTEM MODELING

ICA training rule (one of the possible methods)

• Mixing model:  x(k) = A s(k) + n(k) = Σ_{i=1..M} a_i s_i(k) + n(k)

• Separation:  y(k) = B x(k) = ŝ(k),   B(k) = W(k) V(k)

• Independence measured e.g. by the kurtosis (fourth-order cumulant):

  C(y) = Σ_{i=1..M} cum4(y_i) = Σ_{i=1..M} [ E{y_i⁴} − 3 E²{y_i²} ]

• First step: whitening:  v(k) = V(k) x(k),  E{v v^T} = I

  V(k+1) = V(k) − μ [ v(k) v^T(k) − I ] V(k)

• Second step: separation (nonlinear Oja-type rule, g(t) = tanh(t)):

  W(k+1) = W(k) + η [ g(y(k)) v^T(k) − g(y(k)) g(y^T(k)) W(k) ]
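The whitening step can also be done in one batch via an eigen-decomposition of C_xx instead of the iterative rule above; a minimal sketch (the sources and the mixing matrix are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(8)

# Mixed observations x = A s (sub-Gaussian sources, invented mixing matrix).
S = rng.uniform(-1, 1, (10000, 2))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = S @ A.T

# Whitening v = V x with E{v v^T} = I, via eigen-decomposition of C_xx.
Cxx = X.T @ X / len(X)
lam, E = np.linalg.eigh(Cxx)
V = E @ np.diag(1.0 / np.sqrt(lam)) @ E.T
Vx = X @ V.T

Cvv = Vx.T @ Vx / len(Vx)     # should equal the identity matrix
```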

Page 186: NEURAL NETWORKS FOR SYSTEM MODELING

Networks with unsupervised learning

• clustering

• detection of similarities

• data compression (PCA, KLT)

• independent component analysis (ICA)

Page 187: NEURAL NETWORKS FOR SYSTEM MODELING

Networks with unsupervised learning

• Kohonen network: clustering

(figure: inputs x1, …, xN connected through feed-forward (FF) weights w to the output units y1, y2, y3, …, which are also coupled by lateral connections)

Page 188: NEURAL NETWORKS FOR SYSTEM MODELING

Independent component analysis

x(k) = A s(k) + n(k) = Σ_{i=1..M} a_i s_i(k) + n(k)

y(k) = B x(k) = ŝ(k)

• ICA network architecture

(figure: the input x1, …, xL (complex signal) is whitened by V into v1, …, vL (complex signal), then separated by W into the restored signals y1, …, yM)

Page 189: NEURAL NETWORKS FOR SYSTEM MODELING

References and further readings

Diamantaras, K. I. - Kung, S. Y. "Principal Component Neural Networks: Theory and Applications", John Wiley and Sons, New York. 1996.

Girolami, M - Fyfe, C. "Higher Order Cumulant Maximisation Using Nonlinear Hebbian and Anti-Hebbian Learning for Adaptive Blind Separation of Source Signals", Proc. of the IEEE/IEE International Workshop on Signal and Image Processing, IWSIP-96, Advances in Computational Intelligence, Elsevier publishing, pp. 141 - 144, 1996.

Hassoun, M. H.: "Fundamentals of Artificial Neural Networks", MIT Press, Cambridge, MA. 1995.

Haykin, S.: "Neural Networks. A Comprehensive Foundation" Prentice Hall, N. J.1999. Hertz, J. - Krogh, A. - Palmer, R. G. "Introduction to the Theory of Neural Computation", Addison-Wesley Publishing Co. 1991.

Karhunen, J. - Joutsensalo, J. "Representation and Separation of Signals using Nonlinear PCA Type Learning", Neural Networks, Vol. 7. No. 1. 1994. pp. 113-127.

Karhunen, J. - Hyvärinen, A. - Vigario, R. - Hurri, J. - and Oja, E. "Applications of Neural Blind Source Separation to Signal and Image Processing", Proceedings of the IEEE 1997 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), April 21 - 24. 1997. Munich, Germany, pp. 131-134.

Kung, S. Y. - Diamantaras, C. I. "A Neural Network Learning Algorithm for Adaptive Principal Component Extraction (APEX)", International Conference on Acoustics, Speech and Signal Processing, Vol. 2. 1990. pp. 861-864.

Narendra, K. S. - Parthasarathy, K. "Gradient Methods for the Optimization of Dynamical Systems Containing Neural Networks" IEEE Trans. on Neural Networks, Vol. 2. No. 2. pp. 252-262. 1991.

Sanger, T. "Optimal Unsupervised Learning in a Single-layer Linear Feedforward Neural Network", Neural Networks, Vol. 2. No. 6. 1989. pp. 459-473.

Widrow, B. - Stearns, S. D. "Adaptive Signal Processing", Prentice-Hall, Englewood Cliffs, N. J. 1985.

Page 190: NEURAL NETWORKS FOR SYSTEM MODELING


Dynamic neural architectures

Page 191: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic neural structures

• Feed-forward networks
  – NFIR: FIR-MLP, FIR-RBF, etc.
  – NARX

• Feedback networks
  – RTRL
  – BPTT

• Main differences from static networks
  – time dependence (for all dynamic networks)
  – feedback (for feedback networks: NOE, NARMAX, etc.)
  – training: not for single sample pairs, but for sample sequences (sequential networks)

Page 192: NEURAL NETWORKS FOR SYSTEM MODELING

Feed-forward architecture

• NFIR: FIR-MLP (winner of the Santa Fe competition for laser signal prediction)

(figure: MLP in which every synapse of layer l is an FIR filter; the layer outputs y^(l) = x^(l+1) are formed by the nonlinearities f(·) applied to the filtered sums s^(l))

Page 193: NEURAL NETWORKS FOR SYSTEM MODELING

Feed-forward architecture

FIR-MLP training: temporal backpropagation

  ε² = Σ_{k=1..K} ε²(k),        ∂ε²/∂w_ij^(l) = Σ_k ∂ε²(k)/∂w_ij^(l) = Σ_k (∂ε²/∂s_i^(l)(k)) (∂s_i^(l)(k)/∂w_ij^(l))

  – output layer:

    w_ij^(L)(k+1) = w_ij^(L)(k) + 2μ ε_i(k) f′(s_i^(L)(k)) x_j^(L)(k)

  – hidden layer:

    w_ij(k+1) = w_ij(k) + 2μ δ_i(k − M) x_j(k − M)

    δ_i(k − M) = f′(s_i(k − M)) Σ_m Δ_m^T(k) w_mi

    where the vector Δ_m^(l)(k) = [δ_m^(l)(k), δ_m^(l)(k+1), …, δ_m^(l)(k+M)] collects the delayed deltas of the next layer

Page 194: NEURAL NETWORKS FOR SYSTEM MODELING

Recurrent network

• Architecture

(figure: a static feed-forward network W receives the inputs x1(k), …, xN(k) and, through z^(-1) delay elements, its own fed-back outputs y1(k), …, yM(k); the network outputs are y1(k+1), …, yM(k+1))

Page 195: NEURAL NETWORKS FOR SYSTEM MODELING

Recurrent network

• Training: real-time recurrent learning (RTRL)

  u_i(k) = x_i(k) if i ∈ A (inputs),  y_i(k) if i ∈ B (fed-back units)

  ε_i(k) = d_i(k) − y_i(k) if i ∈ C (output units), 0 otherwise

  ε²(k) = Σ_{l∈C} ε_l²(k)

  Δw_ij(k) = −μ ∂ε²(k)/∂w_ij(k),     ∂ε²(k)/∂w_ij(k) = −2 Σ_{l∈C} ε_l(k) ∂y_l(k)/∂w_ij(k)

  the sensitivities are propagated recursively:

  ∂y_l(k+1)/∂w_ij = f′(s_l(k)) [ Σ_{r∈B} w_lr(k) ∂y_r(k)/∂w_ij + δ_li u_j(k) ]

  giving the update

  w_ij(k+1) = w_ij(k) + 2μ Σ_{l∈C} ε_l(k) f′(s_l(k)) [ Σ_{r∈B} w_lr(k) ∂y_r(k)/∂w_ij + δ_li u_j(k) ]

Page 196: NEURAL NETWORKS FOR SYSTEM MODELING

Recurrent network

• Training: backpropagation through time (BPTT), unfolding in time

(figure: a.) a two-unit recurrent network PE1, PE2 with weights w11, w12, w21, w22, input x(k) and output y(k); b.) the same network unfolded for k = 1…4 into a feed-forward network in which the weights w11, w12, w21, w22 are copied between the stages, with inputs x(1), x(2), x(3) and outputs y(1), y(2), y(3), y(4))

Page 197: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic neural structures

• Combined linear dynamic and nonlinear static architectures

• Dynamic backpropagation, feed-forward architectures:

  ∂ε(k)/∂w_ij = ∂y(k)/∂w_ij = H(z) ∂v/∂w_ij

  ∂ε(k)/∂w_ij^(2) = ∂y(k)/∂w_ij^(2) = Σ_l (∂y(k)/∂v_l^(2)) (∂v_l^(2)/∂w_ij^(2))

Page 198: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic neural structures

• Dynamic backpropagation, feedback architectures: the gradient itself satisfies a recursive (dynamic) equation

  a.)  ∂ε(k)/∂w_ij = ∂y(k)/∂w_ij = (∂N/∂v)(∂v/∂w_ij) + (∂N/∂u) H(z) (∂y(k)/∂w_ij)

  b.)  ∂ε(k)/∂w_ij = ∂y(k)/∂w_ij = (∂N/∂v) H(z) (∂y(k)/∂w_ij) + ∂N/∂w_ij

Page 199: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic system modeling

• Example: modeling of a discrete-time system

  y(k+1) = 0.3 y(k) + 0.6 y(k−1) + f[u(k)]

  – where  f(u) = u³ + 0.3 u² − 0.4 u

  – Training signal: uniform, random, with two different amplitudes
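The plant above is easy to simulate for generating training data; a minimal sketch (the ±0.5 and ±2.0 excitation amplitudes and the random binary signal are invented stand-ins for the "two different amplitudes"): a small excitation exercises only a narrow operating range of the nonlinearity, a large one covers much more of it.

```python
import numpy as np

rng = np.random.default_rng(9)

def f(u):
    return u ** 3 + 0.3 * u ** 2 - 0.4 * u

def simulate(u):
    # y(k+1) = 0.3 y(k) + 0.6 y(k-1) + f[u(k)], zero initial conditions
    y = np.zeros(len(u) + 1)
    for k in range(1, len(u)):
        y[k + 1] = 0.3 * y[k] + 0.6 * y[k - 1] + f(u[k])
    return y

u_small = 0.5 * rng.choice([-1.0, 1.0], 2000)   # small-amplitude excitation
u_large = 2.0 * rng.choice([-1.0, 1.0], 2000)   # large-amplitude excitation
y_small, y_large = simulate(u_small), simulate(u_large)
```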

Page 200: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic system modeling

• The role of excitation: small excitation signal

(figure: model output, plant output and error over 2000 samples)

Page 201: NEURAL NETWORKS FOR SYSTEM MODELING

Dynamic modeling

• The role of excitation: large excitation signal

(figure: model output, plant output and error over 2000 samples)

Page 202: NEURAL NETWORKS FOR SYSTEM MODELING


References and further readings

Hassoun, M. H.: "Fundamentals of Artificial Neural Networks", MIT Press, Cambridge, MA. 1995.

Haykin, S.: "Neural Networks. A Comprehensive Foundation" Prentice Hall, N. J.1999.

Hertz, J. - Krogh, A. - Palmer, R. G. "Introduction to the Theory of Neural Computation", Addison-Wesley Publishing Co. 1991.

Widrow, B. - Stearns, S. D. "Adaptive Signal Processing", Prentice-Hall, Englewood Cliffs, N. J. 1985.

Narendra, K. S. - Parthasarathy, K. "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. Neural Networks, Vol. 1. 1990. pp. 4-27.

Narendra, K. S. - Parthasarathy, K. "Gradient Methods for the Optimization of Dynamical Systems Containing Neural Networks", IEEE Trans. on Neural Networks, Vol. 2. 1991. pp. 252-262.

Wan, E. A. "Temporal Backpropagation for FIR Neural Networks", Proc. of the 1990 IJCNN, Vol. I. pp. 575-580.

Weigend, A. S. - Gershenfeld, N. A. "Forecasting the Future and Understanding the Past" Vol. 15. Santa Fe Institute Studies in the Science of Complexity, Reading, MA. Addison-Wesley, 1994.

Williams, R. J. - Zipser, D. "A Learning Algorithm for Continually Running Fully Recurrent Neural Networks", Neural Computation, Vol. 1. 1989. pp. 270-280.

Page 203: NEURAL NETWORKS FOR SYSTEM MODELING


Support vector machines

Page 204: NEURAL NETWORKS FOR SYSTEM MODELING

Outline

• Why we need a new approach

• Support vector machines
  – SVM for classification
  – SVM for regression
  – other kernel machines

• Statistical learning theory
  – validation (measure of quality: risk)
  – Vapnik-Chervonenkis dimension
  – generalization

Page 205: NEURAL NETWORKS FOR SYSTEM MODELING

Support vector machines

• A new approach: it gives answers to questions not solved by the classical approach
  – the size of the network
  – the generalization capability

Page 206: NEURAL NETWORKS FOR SYSTEM MODELING

Support vector machines

• Classification

(figure: classical neural learning (perceptron) finds some separating hyperplane, while the Support Vector Machine finds the optimal hyperplane with maximal margin)

Page 207: NEURAL NETWORKS FOR SYSTEM MODELING

Support vector machines

• Linearly separable two-class problem

  training set: (x_i, y_i), i = 1…P;  y_i = +1 if x_i ∈ X1,  y_i = −1 if x_i ∈ X2

  separating hyperplane:  w^T x + b = 0

  w^T x_i + b ≥ +1, if x_i ∈ X1
  w^T x_i + b ≤ −1, if x_i ∈ X2

  combined:  y_i (w^T x_i + b) ≥ 1,  ∀i    (optimal hyperplane)

Page 208: NEURAL NETWORKS FOR SYSTEM MODELING

Support vector machines

• Geometric interpretation, the formulation of optimality

  distance of a point from the hyperplane:  d(x, w, b) = |w^T x + b| / ‖w‖

  margin:

  ρ(w, b) = min_{x_i; y_i=1} d(x_i, w, b) + min_{x_i; y_i=−1} d(x_i, w, b)
          = min_{x_i; y_i=1} |w^T x_i + b| / ‖w‖ + min_{x_i; y_i=−1} |w^T x_i + b| / ‖w‖
          = 1/‖w‖ + 1/‖w‖ = 2/‖w‖

(figure: separating hyperplane d(x) = 0 with the two classes x1, x2 on either side)

Page 209: NEURAL NETWORKS FOR SYSTEM MODELING

Support vector machines

• Criterion function (primal problem):  Φ(w) = ½ ‖w‖²

  min ‖w‖²  ↔  max margin

  with the conditions  y_i (w^T x_i + b) ≥ 1,  ∀i

  a constrained optimization problem (KKT conditions, saddle point):

  J(w, b, α) = ½ ‖w‖² − Σ_{i=1..P} α_i [ y_i (w^T x_i + b) − 1 ]

  max_α min_{w,b} J(w, b, α)

  ∂J/∂w = w − Σ_{i=1..P} α_i y_i x_i = 0   ⇒   w = Σ_{i=1..P} α_i y_i x_i

  ∂J/∂b = Σ_{i=1..P} α_i y_i = 0

Page 210: NEURAL NETWORKS FOR SYSTEM MODELING

Neural Networks for System Modeling • Gábor Horváth, 2005 Budapest University of Technology and Economics

Support vector machines

• Lagrange function (dual problem)

$$\max_{\boldsymbol{\alpha}} W(\boldsymbol{\alpha}) = \sum_{i=1}^{P}\alpha_i - \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P}\alpha_i\alpha_j y_i y_j\,\mathbf{x}_i^T\mathbf{x}_j, \qquad \sum_{i=1}^{P}\alpha_i y_i = 0, \quad \alpha_i \ge 0 \text{ for all } i$$

support vectors: $\mathbf{x}_i : \alpha_i > 0$; optimal hyperplane: $\mathbf{w}^* = \sum_{i=1}^{P}\alpha_i y_i\mathbf{x}_i$

output: $y(\mathbf{x}) = \mathbf{w}^{*T}\mathbf{x} + b = \sum_{i=1}^{P}\alpha_i y_i\,\mathbf{x}_i^T\mathbf{x} + b$


Support vector machines

• Linearly nonseparable case (slightly nonlinear case)

separating hyperplane: $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \quad i = 1, \dots, P$

criterion function (slack variable $\xi$): $\Phi(\mathbf{w}, \boldsymbol{\xi}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{P}\xi_i$

Lagrange function:

$$J(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{P}\xi_i - \sum_{i=1}^{P}\alpha_i\big[y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 + \xi_i\big] - \sum_{i=1}^{P}\beta_i\xi_i, \qquad 0 \le \alpha_i \le C$$

support vectors: $\mathbf{x}_i : \alpha_i > 0$; optimal hyperplane: $\mathbf{w}^* = \sum_{i=1}^{P}\alpha_i y_i\mathbf{x}_i$

[Figure: optimal hyperplane with margin; the slack $\xi$ measures the margin violation]


Support vector machines

• Nonlinear separation, feature space

– separating hypersurface (hyperplane in the φ space): $\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) + b = 0$, i.e. $\sum_{j=0}^{M} w_j\varphi_j(\mathbf{x}) = 0$

– decision surface:

$$\sum_{i=1}^{P}\alpha_i y_i\Big(\sum_{j=0}^{M}\varphi_j(\mathbf{x}_i)\varphi_j(\mathbf{x})\Big) = \sum_{i=1}^{P}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) = 0$$

– kernel function (Mercer conditions): $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}_j)$

– criterion function:

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{P}\alpha_i - \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$


$$\mathbf{w} = [w_0, w_1, \dots, w_M]^T, \qquad \boldsymbol{\varphi}(\mathbf{x}) = [\varphi_0(\mathbf{x}), \varphi_1(\mathbf{x}), \dots, \varphi_M(\mathbf{x})]^T, \qquad y = \sum_{j=0}^{M} w_j\varphi_j(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x})$$

[Figure: feature space: a separating curve in the input (x) space becomes a separating plane in the (higher-dimensional) feature (φ) space]


Kernel space

$$y(\mathbf{x}) = \mathbf{w}^{*T}\boldsymbol{\varphi}(\mathbf{x}) + b = \sum_{i=1}^{P}\alpha_i y_i\,\boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}) + b = \sum_{i=1}^{P}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b, \qquad K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}_j)$$

• Kernel trick: the feature representation (nonlinear transformation) is not used explicitly; only kernel function values (scalars) are needed

[Figure: a separating curve in the input space corresponds to a separating plane in the feature space and a separating line in the kernel space]


Support vector machines

• Examples of kernel functions

– Polynomial: $K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}^T\mathbf{x}_i + 1)^d, \quad d = 1, 2, \dots$

– RBF: $K(\mathbf{x}, \mathbf{x}_i) = \exp\Big(-\frac{1}{2\sigma^2}\lVert\mathbf{x} - \mathbf{x}_i\rVert^2\Big)$

– MLP (only for certain $\beta_0$ and $\beta_1$): $K(\mathbf{x}, \mathbf{x}_i) = \tanh(\beta_0\,\mathbf{x}^T\mathbf{x}_i + \beta_1)$

– CMAC B-spline
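The three kernels with explicit formulas above are easy to write down directly. A minimal sketch (the hyperparameter values `sigma`, `beta0`, `beta1` are arbitrary illustrative choices):

```python
import math

# Illustrative implementations of the kernels listed above.
def poly_kernel(x, z, d=2):
    """Polynomial kernel (x.z + 1)^d."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** d

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))

def mlp_kernel(x, z, beta0=1.0, beta1=0.0):
    """tanh kernel; positive semi-definite only for certain beta0, beta1."""
    return math.tanh(beta0 * sum(a * b for a, b in zip(x, z)) + beta1)

print(poly_kernel((1, 2), (3, 4)))           # (11 + 1)^2 = 144
print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))    # 1.0 at zero distance
```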


Support vector machines

• Example: polynomial basis and kernel function

– basis functions: $\boldsymbol{\varphi}(\mathbf{x}_i) = \big[1,\; x_{i1}^2,\; \sqrt{2}\,x_{i1}x_{i2},\; x_{i2}^2,\; \sqrt{2}\,x_{i1},\; \sqrt{2}\,x_{i2}\big]^T$

– kernel function: $K(\mathbf{x}, \mathbf{x}_i) = 1 + x_1^2x_{i1}^2 + 2x_1x_2x_{i1}x_{i2} + x_2^2x_{i2}^2 + 2x_1x_{i1} + 2x_2x_{i2} = (\mathbf{x}^T\mathbf{x}_i + 1)^2$
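The identity on this slide, $\boldsymbol{\varphi}^T(\mathbf{x})\boldsymbol{\varphi}(\mathbf{x}_i) = (\mathbf{x}^T\mathbf{x}_i + 1)^2$, can be checked numerically for arbitrary points:

```python
import math

# Numerical check: the explicit 6-dimensional feature map's inner product
# equals the degree-2 polynomial kernel, so the feature space never has to
# be built explicitly (the kernel trick).
def phi(x):
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (0.5, -1.0), (2.0, 3.0)
lhs = dot(phi(x), phi(z))        # inner product in feature space
rhs = (dot(x, z) + 1.0) ** 2     # kernel value in input space
print(abs(lhs - rhs) < 1e-12)    # True
```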


SVR (regression)

• ε-insensitive loss (criterion) function:

$$C_\varepsilon\big(y, f(\mathbf{x}, \boldsymbol{\alpha})\big) = \begin{cases} 0 & \text{if } |y - f(\mathbf{x}, \boldsymbol{\alpha})| \le \varepsilon \\ |y - f(\mathbf{x}, \boldsymbol{\alpha})| - \varepsilon & \text{otherwise} \end{cases}$$

[Figure: the loss $C(\varepsilon)$ is zero inside the $\pm\varepsilon$ tube and grows linearly with the slack $\xi$ outside it]
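The ε-insensitive loss is a one-liner; the example values below are illustrative:

```python
# Sketch of the epsilon-insensitive loss used by SVR: zero inside the
# epsilon tube, linear outside it.
def eps_insensitive(y, f_x, eps):
    err = abs(y - f_x)
    return 0.0 if err <= eps else err - eps

print(eps_insensitive(1.0, 1.05, 0.1))   # 0.0  (inside the tube)
print(eps_insensitive(1.0, 1.30, 0.1))   # ~0.2 (|error| - epsilon)
```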


SVR (regression)

Minimize: $\Phi(\mathbf{w}, \boldsymbol{\xi}, \boldsymbol{\xi}') = \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}(\xi_i + \xi_i')$

Constraints:

$$y_i - \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) \le \varepsilon + \xi_i, \qquad \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) - y_i \le \varepsilon + \xi_i', \qquad \xi_i, \xi_i' \ge 0, \quad i = 1, 2, \dots, P$$

where $f(\mathbf{x}) = \sum_{j=0}^{M} w_j\varphi_j(\mathbf{x})$


SVR (regression)

Lagrange function:

$$J(\mathbf{w}, \boldsymbol{\xi}, \boldsymbol{\xi}', \boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\gamma}, \boldsymbol{\gamma}') = \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}(\xi_i + \xi_i') - \sum_{i=1}^{P}\alpha_i\big[\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) - y_i + \varepsilon + \xi_i\big] - \sum_{i=1}^{P}\alpha_i'\big[y_i - \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + \varepsilon + \xi_i'\big] - \sum_{i=1}^{P}(\gamma_i\xi_i + \gamma_i'\xi_i')$$

with $\gamma_i = C - \alpha_i$ and $\gamma_i' = C - \alpha_i'$


SVR (regression)

• Dual problem

$$W(\boldsymbol{\alpha}, \boldsymbol{\alpha}') = \sum_{i=1}^{P} y_i(\alpha_i - \alpha_i') - \varepsilon\sum_{i=1}^{P}(\alpha_i + \alpha_i') - \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P}(\alpha_i - \alpha_i')(\alpha_j - \alpha_j')K(\mathbf{x}_i, \mathbf{x}_j)$$

constraints: $\sum_{i=1}^{P}(\alpha_i - \alpha_i') = 0$, $\;0 \le \alpha_i \le C$, $\;0 \le \alpha_i' \le C$; support vectors: $\mathbf{x}_i : \alpha_i \ne \alpha_i'$

$$\mathbf{w}^* = \sum_{i=1}^{P}(\alpha_i - \alpha_i')\boldsymbol{\varphi}(\mathbf{x}_i)$$

solution: $y(\mathbf{x}) = \mathbf{w}^{*T}\boldsymbol{\varphi}(\mathbf{x}) = \sum_{i=1}^{P}(\alpha_i - \alpha_i')\boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}) = \sum_{i=1}^{P}(\alpha_i - \alpha_i')K(\mathbf{x}_i, \mathbf{x})$
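Once a QP solver has produced the dual variables, the SVR prediction is just the kernel expansion above. A sketch with made-up coefficients (a real $\boldsymbol{\alpha} - \boldsymbol{\alpha}'$ would come from solving the dual problem; the values here only illustrate the formula):

```python
import math

# Sketch: SVR prediction f(x) = sum_i (alpha_i - alpha_i') K(x_i, x) + b
# with an RBF kernel. coef[i] stands for (alpha_i - alpha_i'); these are
# illustrative numbers, not the output of a real QP solver.
def rbf(x, z, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def svr_predict(x, train_x, coef, b, sigma=1.0):
    return b + sum(c * rbf(xi, x, sigma) for xi, c in zip(train_x, coef))

train_x = [(0.0,), (1.0,), (2.0,)]
coef = [0.3, -0.1, 0.3]
print(svr_predict((1.0,), train_x, coef, b=0.5))
```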


SVR (regression example)

ε=0


SVR (regression example)

ε = 0.1


SVR (regression)

ε = 0.1


Support vector machines

• Main advantages
– automatic model complexity (network size)
– relevant training data point selection
– allows tolerance (ε)
– the high-dimensional feature space representation is not used directly (kernel trick)
– upper limit of the generalization error (see later)

• Main difficulties
– quadratic programming to solve the dual problem
– hyperparameter (C, ε, σ) selection
– batch processing (there are on-line versions too)


SVM versions

• Classical Vapnik's SVM (drawbacks)

• LS-SVM: equality constraints; no quadratic optimization to be solved, only a linear set of equations

classification: $\Phi(\mathbf{w}, \mathbf{e}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P} e_i^2$ with $y_i[\mathbf{w}^T\mathbf{x}_i + b] = 1 - e_i, \quad i = 1, \dots, P$

regression: $\Phi(\mathbf{w}, \mathbf{e}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P} e_i^2$ with $y_i = \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b + e_i, \quad i = 1, \dots, P$

• Ridge regression: similar to LS-SVM


LS-SVM

Lagrange equation:

$$L(\mathbf{w}, b, \mathbf{e}; \boldsymbol{\alpha}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2}\sum_{k=1}^{P} e_k^2 - \sum_{k=1}^{P}\alpha_k\big[\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_k) + b + e_k - y_k\big]$$

The results:

$$\frac{\partial L}{\partial\mathbf{w}} = \mathbf{0} \;\to\; \mathbf{w} = \sum_{k=1}^{P}\alpha_k\boldsymbol{\varphi}(\mathbf{x}_k), \qquad \frac{\partial L}{\partial b} = 0 \;\to\; \sum_{k=1}^{P}\alpha_k = 0$$

$$\frac{\partial L}{\partial e_k} = 0 \;\to\; \alpha_k = \gamma e_k, \qquad \frac{\partial L}{\partial\alpha_k} = 0 \;\to\; \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_k) + b + e_k - y_k = 0, \qquad k = 1, \dots, P$$


LS-SVM: linear equation

Regression:

$$\begin{bmatrix} 0 & \vec{1}^T \\ \vec{1} & \boldsymbol{\Omega} + \gamma^{-1}\mathbf{I} \end{bmatrix}\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix}, \qquad \text{where } \boldsymbol{\Omega}_{i,j} = K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}_j)$$

the response of the network: $y(\mathbf{x}) = \sum_{k=1}^{N}\alpha_k K(\mathbf{x}, \mathbf{x}_k) + b$

Classification:

$$\begin{bmatrix} 0 & \mathbf{y}^T \\ \mathbf{y} & \boldsymbol{\Omega} + \gamma^{-1}\mathbf{I} \end{bmatrix}\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix}, \qquad \text{where } \boldsymbol{\Omega}_{i,j} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$

the response of the network: $y(\mathbf{x}) = \sum_{k=1}^{N}\alpha_k y_k K(\mathbf{x}, \mathbf{x}_k) + b$
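The regression system above is an ordinary linear solve. A minimal sketch with a toy 1-D data set (the RBF width and γ are arbitrary illustrative choices):

```python
import numpy as np

# Sketch of LS-SVM regression training: solve
#   [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
# and predict with y(x) = sum_k alpha_k K(x, x_k) + b.
def rbf(a, b, sigma=0.5):
    return np.exp(-((a - b) ** 2) / (2 * sigma ** 2))

X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(X)
gamma = 100.0
Omega = rbf(X[:, None], X[None, :])          # kernel matrix

n = len(X)
M = np.zeros((n + 1, n + 1))
M[0, 1:] = 1.0
M[1:, 0] = 1.0
M[1:, 1:] = Omega + np.eye(n) / gamma
sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
b, alpha = sol[0], sol[1:]

def predict(x):
    return b + np.sum(alpha * rbf(X, x))

print(predict(1.0))   # close to sin(1.0) since gamma is large
```

Note that the first row of the system enforces the constraint $\sum_k\alpha_k = 0$ from the previous slide.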


Main features of LS-SVM and ridge regression

• Benefits
– Easy to solve (no quadratic programming, only a linear equation set)
– On-line, adaptive version (important in system identification)

• Drawbacks
– Not a sparse solution, all training points are used (there are no "support vectors")
– No "tolerance parameter" (ε)
– No proved upper limit of the generalization error
– Large kernel matrix if many training points are available


Improved LS kernel machines

• There are sparse versions of the LS-SVM
– The training points are ranked and only the most important ones are used (iterative solution)
– The kernel matrix can be reduced (a tolerance parameter is introduced again)
– Details: see the references

• Additional constraints can be used for special applications (see e.g. the regularized kernel CMAC)


Kernel CMAC (an example)

• Goal
– General goal: to show that additional constraints can be used in the framework of LS-SVM; here the additional constraint is a weight-smoothing term
– Special goal: to show that the kernel approach can be used for improving the modelling capability of the CMAC


General goal: introducing new constraints

• General LS-SVM problem
– Criterion function: two terms, a weight minimization term + an error term
– Lagrange function: criterion function + Lagrange term

• Extension: adding a new constraint to the criterion function
– Extended Lagrange function: new criterion function (with the new constraint) + Lagrange term


Special goal: improving the capability of the CMAC

• Difficulties with the multivariate CMAC:
– too many basis functions (too large weight memory)
– poor modelling and generalization capability

• Improved generalization: regularization

• Improved modelling capability: more basis functions (difficulties with the implementation) → kernel trick, kernel CMAC

• Improved modelling and generalization capability: regularized kernel CMAC


Regularized CMAC

• Regularized criterion function (weight smoothing):

$$J(k) = \frac{1}{2}\big(y_d(k) - y(k)\big)^2 + \frac{\lambda}{2}\sum_i\Big(\frac{y_d(k)}{C} - w_i(k)\Big)^2$$

weight update:

$$w_i(k+1) = w_i(k) + \mu\Big((1-\lambda)\,e(k) + \lambda\Big(\frac{y_d(k)}{C} - w_i(k)\Big)\Big)$$

[Figure: approximation without and with regularization]


Kernel CMAC

• Classical Albus CMAC: analytical solution

$$y(k) = \mathbf{w}^T\mathbf{a}_k, \qquad \mathbf{A}\mathbf{w} = \mathbf{y}_d, \qquad \mathbf{w}^* = \mathbf{A}^\dagger\mathbf{y}_d, \quad \mathbf{A}^\dagger = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}$$

$$y(\mathbf{x}) = \mathbf{a}^T(\mathbf{x})\mathbf{w}^* = \mathbf{a}^T(\mathbf{x})\mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}\mathbf{y}_d$$

• Kernel version

criterion function (LS): $\min_{\mathbf{w}} J(\mathbf{w}, \mathbf{e}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + \frac{\gamma}{2}\sum_{k=1}^{P} e_k^2$

constraint: $y_d(k) = \mathbf{w}^T\mathbf{a}_k + e_k, \quad k = 1, \dots, P$

Lagrangian: $L(\mathbf{w}, \mathbf{e}, \boldsymbol{\alpha}) = J(\mathbf{w}, \mathbf{e}) - \sum_{k=1}^{P}\alpha_k\big(\mathbf{w}^T\mathbf{a}_k + e_k - y_d(k)\big)$


Kernel CMAC (ridge regression)

• Using the derivatives, the resulting equations are:

$$\frac{\partial L}{\partial\mathbf{w}} = \mathbf{0} \;\to\; \mathbf{w} = \sum_{k=1}^{P}\alpha_k\mathbf{a}_k, \qquad \frac{\partial L}{\partial e_k} = 0 \;\to\; \alpha_k = \gamma e_k, \qquad \frac{\partial L}{\partial\alpha_k} = 0 \;\to\; \mathbf{w}^T\mathbf{a}(\mathbf{x}_k) + e_k - y_d(k) = 0, \quad k = 1, \dots, P$$

$$\Big[\mathbf{K} + \frac{1}{\gamma}\mathbf{I}\Big]\boldsymbol{\alpha} = \mathbf{y}_d, \qquad \mathbf{K} = \mathbf{A}\mathbf{A}^T$$

$$y(\mathbf{x}) = \mathbf{a}^T(\mathbf{x})\mathbf{w} = \sum_{k=1}^{P}\alpha_k\,\mathbf{a}^T(\mathbf{x})\mathbf{a}_k = \sum_{k=1}^{P}\alpha_k K(\mathbf{x}, \mathbf{x}_k) = \mathbf{K}(\mathbf{x})^T\boldsymbol{\alpha}$$
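With binary association vectors the kernel CMAC training step reduces to the same kind of ridge solve. A sketch with a tiny made-up association table (the matrix `A` below is illustrative, not a real CMAC layout):

```python
import numpy as np

# Sketch of the kernel CMAC equations above: rows of A are binary
# association vectors a_k, the kernel matrix is K = A A^T, the dual
# coefficients solve (K + I/gamma) alpha = y_d, and the output is
# y(x) = a(x)^T A^T alpha. C = 2 active basis functions per input.
A = np.array([[1, 1, 0, 0],    # a(x_1)
              [0, 1, 1, 0],    # a(x_2)
              [0, 0, 1, 1]],   # a(x_3)
             dtype=float)
y_d = np.array([1.0, 2.0, 3.0])
gamma = 1e6                     # weak regularization: near-interpolation
K = A @ A.T
alpha = np.linalg.solve(K + np.eye(3) / gamma, y_d)
w = A.T @ alpha                 # recovered weight vector

def cmac_output(a_x):
    return float(a_x @ w)

print(round(cmac_output(A[1]), 3))  # reproduces the training target ~2.0
```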


Kernel CMAC with regularization

Extended criterion function:

$$\min_{\mathbf{w}, \mathbf{e}}\; J(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + \frac{\gamma}{2}\sum_{k=1}^{P} e_k^2 + \frac{\lambda}{2}\sum_{k=1}^{P}\sum_i\Big(\frac{y_d(k)}{C} - w_i\Big)^2 a_i(k)$$

Lagrange function:

$$L(\mathbf{w}, \mathbf{e}, \boldsymbol{\alpha}) = J(\mathbf{w}, \mathbf{e}) - \sum_{k=1}^{P}\alpha_k\big(\mathbf{w}^T\mathbf{a}_k + e_k - y_d(k)\big)$$

Output:

$$y(\mathbf{x}) = \mathbf{a}^T(\mathbf{x})\big(\mathbf{I} + \lambda\mathbf{D}\big)^{-1}\mathbf{A}^T\Big[\boldsymbol{\alpha} + \frac{\lambda}{C}\mathbf{y}_d\Big], \qquad \text{where } \mathbf{D} = \sum_{k=1}^{P}\mathrm{diag}(\mathbf{a}_k)$$


Kernel CMAC with regularization

• Kernel function for the two-dimensional kernel CMAC


Regularized kernel CMAC (example)

• 2D sinc

[Figure: approximation without and with regularization]


References and further readings

Haykin, S.: "Neural Networks. A Comprehensive Foundation", Prentice Hall, N.J., 1999.

Saunders, C., Gammerman, A. and Vovk, V.: "Ridge Regression Learning Algorithm in Dual Variables", Proc. of the Fifteenth International Conference on Machine Learning, pp. 515-521, 1998.

Schölkopf, B. and Smola, A.: "Learning with Kernels. Support Vector Machines, Regularization, Optimization and Beyond", MIT Press, Cambridge, MA, 2002.

Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J.: "Least Squares Support Vector Machines", World Scientific, Singapore, 2002.

Vapnik, V.: "Statistical Learning Theory", Wiley, New York, 1995.

Horváth, G.: "CMAC: Reconsidering an Old Neural Network", Proc. of the Intelligent Control Systems and Signal Processing, ICONS 2003, Faro, Portugal, pp. 173-178, 2003.

Horváth, G.: "Kernel CMAC with Improved Capability", Proc. of the International Joint Conference on Neural Networks, IJCNN 2004, Budapest, Hungary, 2004.

Lane, S.H., Handelman, D.A. and Gelfand, J.J.: "Theory and Development of Higher-Order CMAC Neural Networks", IEEE Control Systems, Apr., pp. 23-30, 1992.

Szabó, T. and Horváth, G.: "Improving the Generalization Capability of the Binary CMAC", Proc. of the International Joint Conference on Neural Networks, IJCNN 2000, Como, Italy, Vol. 3, pp. 85-90, 2000.


Statistical learning theory

• Main question: how can the quality of a learning machine be estimated?

• Generalization measure based on the empirical risk (error).

• Empirical risk: the error determined in the training points


Statistical learning theory

• Goal: to find a solution that minimizes the risk

$$R(\mathbf{w}) = \int l\big(y, f(\mathbf{x}, \mathbf{w})\big)\,p(\mathbf{x}, y)\,d\mathbf{x}\,dy = \int \big[y - f(\mathbf{x}, \mathbf{w})\big]^2\,p(\mathbf{x}, y)\,d\mathbf{x}\,dy$$

• Difficulty: the joint density function is unknown; only the empirical risk can be determined:

$$R_{\mathrm{emp}}(\mathbf{w}) = \frac{1}{P}\sum_{l=1}^{P}\big[y_l - f(\mathbf{x}_l, \mathbf{w})\big]^2$$

optimal value: $R(\mathbf{w}^*|P)$; minimizing the empirical risk: $R_{\mathrm{emp}}(\mathbf{w}^*|P)$
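The empirical risk is just a training-set average, and for a fixed model it converges to the true risk as the sample count grows. A sketch with a made-up linear model $f(x, w) = wx$ and data $y = 2x + \text{noise}$:

```python
import random

# Sketch: R_emp(w) is the mean squared error over the P training points;
# by the law of large numbers it approaches the true risk R(w) as P grows.
random.seed(0)

def empirical_risk(w, samples):
    return sum((y - w * x) ** 2 for x, y in samples) / len(samples)

def draw(P):
    """Draw P samples from y = 2x + Gaussian noise (sigma = 0.1)."""
    return [(x, 2.0 * x + random.gauss(0.0, 0.1))
            for x in (random.uniform(-1, 1) for _ in range(P))]

# For w = 2 the true risk is just the noise variance, 0.01.
print(empirical_risk(2.0, draw(10)))       # noisy small-sample estimate
print(empirical_risk(2.0, draw(100000)))   # close to 0.01
```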


Statistical learning theory: ERM principle

• Asymptotic consistency of the empirical risk:

$$R_{\mathrm{emp}}(\mathbf{w}^*|P) \to R(\mathbf{w}^0) \quad\text{and}\quad R(\mathbf{w}^*|P) \to R(\mathbf{w}^0) \quad\text{when } P \to \infty, \qquad \min_{\mathbf{w}} R(\mathbf{w}) = R(\mathbf{w}^0)$$

[Figure: the expected risk $R(\mathbf{w}^*|P)$ and the empirical risk $R_{\mathrm{emp}}(\mathbf{w}^*|P)$ both converge to $R(\mathbf{w}^0)$ as $P$ grows]


Statistical learning theory

• Condition of consistency of the ERM principle
Necessary and sufficient condition: finite VC dimension
Also: this is a sufficient condition of fast convergence

• VC (Vapnik-Chervonenkis) dimension:
A set of functions has VC dimension h if there exist h samples that can be shattered (separated into two classes in all possible ways: all 2^h possible ways) by this set of functions, but there do not exist h + 1 samples that can be shattered by the same set of functions.


Model complexity, VC dimension

• VC dimension of a set of indicator functions
– definition: the VC dimension is the maximum number of samples for which all possible binary labellings can be induced by a set of functions
– illustration: [Figure: three points in the plane can be separated linearly under every labelling; for four points there are labellings with no linear separation]


VC dimension

• Based on the VC dimension, upper bounds of the risk can be obtained

• Calculating the VC dimension
– general case: rather difficult (e.g. for the MLP the VC dimension can be infinite)
– special cases: e.g. a linear function set

• VC dimension of a set of linear functions (linear separating task): h = N + 1 (N: input space dimension)

An important statement: it can be proved that the VC dimension can be less than N + 1


Generalization error

• Bound on generalization

– Classification: with probability of at least 1 − η (confidence level; η appears in the additional term):

$$R(\mathbf{w}) \le R_{\mathrm{emp}}(\mathbf{w}) + \text{additional term}(h) \quad\text{(confidence interval)}$$

$$h \le \min\big(\lceil R^2/M^2\rceil, N\big) + 1, \qquad R = \text{radius of a sphere containing all data points}, \quad M = \frac{1}{\lVert\mathbf{w}\rVert} = \text{margin of classification}$$

– Regression:

$$R(\mathbf{w}) \le \frac{R_{\mathrm{emp}}(\mathbf{w})}{\big(1 - c\sqrt{\varepsilon(h)}\big)_+}, \qquad \varepsilon(h) = a_1\,\frac{h\big[\log(a_2 N/h) + 1\big] - \log(\eta/4)}{N}$$


Generalization error

[Figure: generalization error vs. VC dimension: the guaranteed risk is the sum of the training error (decreasing with the VC dimension) and the confidence interval (growing with the VC dimension)]

Tradeoff between the quality of approximation and the complexity of the approximating function:

$$P/h \text{ large} \;\Rightarrow\; R \approx R_{\mathrm{emp}}, \qquad P/h \text{ small} \;\Rightarrow\; R \gg R_{\mathrm{emp}}$$


Structural risk minimization principle

• Good generalization: both terms should be minimized

S: a set of approximating functions; its elements form nested subsets S_k with finite VC dimension h_k:

S1 ⊂ S2 ⊂ … ⊂ Sk ⊂ …

The ordering of complexity of the elements:

h1 ≤ h2 ≤ … ≤ hk ≤ …

S is specified based on a priori information. For a given data set the optimal model estimation consists of:
– selecting an element of the set (model selection)
– estimating the model from this subset (training the model)

There is an upper bound on the prediction risk with a given confidence level.


Constructing a learning algorithm

• Structural risk minimization
– That S_k is selected for which the guaranteed risk is minimal:

$$R(\mathbf{w}_k^P) \le R_{\mathrm{emp}}(\mathbf{w}_k^P) + \text{additional term}(h_k/P) \quad\text{(confidence interval)}$$

– The SRM principle suggests a tradeoff between the quality of approximation and the complexity of the approximating function (model selection problem)
– Both terms are controlled:
• the empirical risk with training
• the complexity with the selection of S_k


SVM

• Support vector machines are learning machines that minimize the length of the weight vector

• They minimize the VC dimension; the upper bounds are valid for SVMs

• For SVMs not only can the structure (the size of the network) be determined, but an estimate of the generalization error can also be obtained


References and further readings

Haykin, S.: "Neural Networks. A Comprehensive Foundation", Prentice Hall, N.J., 1999.

Vapnik, V.: "Statistical Learning Theory", Wiley, New York, 1998.

Cherkassky, V. and Mulier, F.: "Learning from Data: Concepts, Theory, and Methods", John Wiley and Sons, 1998.


Modular network architectures


Modular solution

• A set of networks: competition/cooperation
– all networks solve the same problem (competition/cooperation)
– the whole problem is decomposed: the different networks solve different parts of the whole problem (cooperation)

• Ensemble of networks
– linear combination of networks

• Mixture of experts
– using the same paradigm (e.g. neural networks)
– using different paradigms (e.g. neural networks + symbolic systems, neural networks + fuzzy systems)


Cooperative networks

Ensemble of cooperating networks (classification/regression)

• The motivation
– Heuristic explanation
• Different experts together can solve a problem better
• Complementary knowledge
– Mathematical justification
• Accurate and diverse modules


Linear combination of networks

[Figure: M networks NN1 … NNM receive the same input x; their outputs y1 … yM, together with a constant y0 = 1, are weighted by the coefficients α0 … αM and summed]

$$y(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{j=0}^{M}\alpha_j y_j(\mathbf{x})$$


Ensemble of networks

• Mathematical justification

– Ensemble output: $y(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{j=0}^{M}\alpha_j y_j(\mathbf{x})$

– Ambiguity (diversity): $a_j(\mathbf{x}) = \big[y_j(\mathbf{x}) - y(\mathbf{x}, \boldsymbol{\alpha})\big]^2$

– Individual error: $\varepsilon_j(\mathbf{x}) = \big[d(\mathbf{x}) - y_j(\mathbf{x})\big]^2$

– Ensemble error: $\varepsilon(\mathbf{x}) = \big[d(\mathbf{x}) - y(\mathbf{x}, \boldsymbol{\alpha})\big]^2$

– Constraint: $\sum_{j=1}^{M}\alpha_j = 1$


Ensemble of networks

• Mathematical justification (cont'd)

– Weighted error: $\bar{\varepsilon}(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{j=0}^{M}\alpha_j\varepsilon_j(\mathbf{x})$

– Weighted diversity: $\bar{a}(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{j=0}^{M}\alpha_j a_j(\mathbf{x})$

– Ensemble error: $\varepsilon(\mathbf{x}, \boldsymbol{\alpha}) = \big[d(\mathbf{x}) - y(\mathbf{x}, \boldsymbol{\alpha})\big]^2 = \bar{\varepsilon}(\mathbf{x}, \boldsymbol{\alpha}) - \bar{a}(\mathbf{x}, \boldsymbol{\alpha})$

– Averaging over the input distribution:

$$E = \int_{\mathbf{x}}\varepsilon(\mathbf{x}, \boldsymbol{\alpha})f(\mathbf{x})\,d\mathbf{x}, \qquad \bar{E} = \int_{\mathbf{x}}\bar{\varepsilon}(\mathbf{x}, \boldsymbol{\alpha})f(\mathbf{x})\,d\mathbf{x}, \qquad \bar{A} = \int_{\mathbf{x}}\bar{a}(\mathbf{x}, \boldsymbol{\alpha})f(\mathbf{x})\,d\mathbf{x}, \qquad E = \bar{E} - \bar{A}$$

Solution: an ensemble of accurate and diverse networks
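The decomposition is an algebraic identity at every single input, which a short numerical check makes concrete (the target and member outputs below are arbitrary illustrative values):

```python
# Numerical check of the decomposition above at a single input x:
# ensemble error = weighted individual error - weighted ambiguity,
# eps = eps_bar - a_bar, for any convex combination weights alpha.
d = 1.0                        # target d(x)
ys = [0.7, 1.1, 1.6]           # member outputs y_j(x)
alphas = [0.5, 0.3, 0.2]       # sum to 1

y_ens = sum(a * y for a, y in zip(alphas, ys))
eps = (d - y_ens) ** 2
eps_bar = sum(a * (d - y) ** 2 for a, y in zip(alphas, ys))
a_bar = sum(a * (y - y_ens) ** 2 for a, y in zip(alphas, ys))
print(abs(eps - (eps_bar - a_bar)) < 1e-12)  # True
```

Since the ambiguity term is non-negative, the ensemble error never exceeds the weighted average of the individual errors, which is the formal case for diverse ensembles.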


Ensemble of networks

• How to get accurate and diverse networks
– different structures: more than one network structure (e.g. MLP, RBF, CCN, etc.)
– different size, different complexity networks (number of hidden units, number of layers, nonlinear function, etc.)
– different learning strategies (BP, CG, random search, etc.), batch learning, sequential learning
– different training algorithms, sample order, learning samples
– different training parameters
– different initial parameter values
– different stopping criteria


Linear combination of networks

• Computation of optimal (fixed) coefficients

– $\alpha_k = \dfrac{1}{M}, \; k = 1, \dots, M$ → simple average

– $\alpha_k = 1, \;\alpha_j = 0 \; (j \ne k)$, where k depends on the input: for different input domains a different network (alone) gives the output

– optimal values using the constraint $\sum_{k=1}^{M}\alpha_k = 1$

– optimal values without any constraint: the Wiener-Hopf equation

$$\boldsymbol{\alpha}^* = R_y^{-1}P, \qquad R_y = E\big[\mathbf{y}(\mathbf{x})\mathbf{y}^T(\mathbf{x})\big], \qquad P = E\big[d(\mathbf{x})\mathbf{y}(\mathbf{x})\big]$$
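The unconstrained Wiener-Hopf solution can be estimated from samples by replacing the expectations with averages. A sketch with two imperfect "networks" (Taylor approximations of the target, chosen here only for illustration):

```python
import numpy as np

# Sketch: optimal linear combination alpha* = R^{-1} P, with
# R = E[y y^T] (member output correlation matrix) and P = E[d y]
# (cross-correlation with the target), estimated from samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
d = np.sin(np.pi * x)                           # target
members = np.stack([np.pi * x,                  # crude "network" 1
                    np.pi * x - (np.pi * x) ** 3 / 6])  # better "network" 2

R = members @ members.T / x.size                # 2x2 estimate of R_y
P = members @ d / x.size                        # estimate of P
alpha = np.linalg.solve(R, P)

mse = np.mean((d - alpha @ members) ** 2)
print(mse < np.mean((d - members[0]) ** 2))     # True: beats member 1 alone
```

Because `alpha` solves the normal equations on the same samples, the combined sample MSE can never exceed that of either individual member.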


References and further readings

Haykin, S.: "Neural Networks. A Comprehensive Foundation", Prentice Hall, N.J., 1999.

Hashem, S.: "Optimal Linear Combination of Neural Networks", Neural Networks, Vol. 10, No. 4, pp. 599-614, 1997.

Krogh, A. and Vedelsby, J.: "Neural Network Ensembles, Cross Validation and Active Learning", in Tesauro, G., Touretzky, D. and Leen, T. (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, pp. 231-238.

Auda, G. and Kamel, M.: "Modular Neural Networks: A Survey", Pattern Analysis and Machine Intelligence Lab, Systems Design Engineering Department, University of Waterloo, Canada.


Mixture of Experts (MOE)

[Figure: M expert networks with outputs μ1 … μM and a gating network with outputs g1 … gM all receive the input x; the overall output is the gated sum μ = Σ gi μi]


Mixture of Experts (MOE)

• The output is the weighted sum of the outputs of the experts:

$$\mu = \sum_{i=1}^{M} g_i\mu_i, \qquad \mu_i = f(\mathbf{x}, \Theta_i)$$

where $\Theta_i$ is the parameter of the i-th expert

• The output of the gating network: the "softmax" function

$$g_i = \frac{e^{\xi_i}}{\sum_{j=1}^{M} e^{\xi_j}}, \qquad \xi_i = \mathbf{v}_i^T\mathbf{x}, \qquad \sum_{i=1}^{M} g_i = 1, \quad g_i \ge 0 \;\;\forall i$$

• $\mathbf{v}_i$ is the parameter of the gating network
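The MOE forward pass is short to write out. A sketch in which both the experts and the gate weights are made-up illustrative choices (the gate here softly switches on the sign of the first input coordinate):

```python
import math

# Sketch of the MOE forward pass: softmax gating over xi_i = v_i^T x,
# overall output mu = sum_i g_i * mu_i(x).
def softmax(xs):
    m = max(xs)                                  # shift for stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_output(x, gate_v, experts):
    xi = [sum(vi * xj for vi, xj in zip(v, x)) for v in gate_v]
    g = softmax(xi)
    return sum(gi * mu(x) for gi, mu in zip(g, experts)), g

experts = [lambda x: x[0] + x[1],                # expert 1
           lambda x: x[0] - x[1]]                # expert 2
gate_v = [(1.0, 0.0), (-1.0, 0.0)]               # gate on sign of x[0]
out, g = moe_output((2.0, 1.0), gate_v, experts)
print(g, out)   # gate strongly favors expert 1 here
```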


Mixture of Experts (MOE)

• Probabilistic interpretation

$$\mu_i = E[\mathbf{y}\,|\,\mathbf{x}, \Theta_i], \qquad g_i = P(i\,|\,\mathbf{x}, \mathbf{v}_i) \quad\text{(a priori probability)}$$

the probabilistic model with true parameters:

$$P(\mathbf{y}\,|\,\mathbf{x}, \Theta^0) = \sum_i g_i(\mathbf{x}, \mathbf{v}_i^0)\,P(\mathbf{y}\,|\,\mathbf{x}, \Theta_i^0), \qquad g_i(\mathbf{x}, \mathbf{v}_i^0) = P(i\,|\,\mathbf{x}, \mathbf{v}_i^0)$$


Mixture of Experts (MOE)

• Training

– Training data: $X = \{(\mathbf{x}^{(l)}, \mathbf{y}^{(l)})\}_{l=1}^{P}$

– Probability of generating the output from the input:

$$P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta) = \sum_i P(i\,|\,\mathbf{x}^{(l)}, \mathbf{v}_i)\,P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta_i)$$

$$P(\mathbf{y}\,|\,\mathbf{x}, \Theta) = \prod_{l=1}^{P} P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta) = \prod_{l=1}^{P}\Big[\sum_i P(i\,|\,\mathbf{x}^{(l)}, \mathbf{v}_i)\,P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta_i)\Big]$$

– The log-likelihood function (maximum likelihood estimation):

$$L(\mathbf{x}, \Theta) = \sum_l\log\Big[\sum_i P(i\,|\,\mathbf{x}^{(l)}, \mathbf{v}_i)\,P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta_i)\Big]$$


Mixture of Experts (MOE)

• Training (cont'd)

– Gradient method: $\dfrac{\partial L(\mathbf{x}, \Theta)}{\partial\Theta_i} = 0$ and $\dfrac{\partial L(\mathbf{x}, \Theta)}{\partial\mathbf{v}_i} = 0$, using

$$\frac{\partial L(\mathbf{x}, \Theta)}{\partial\Theta_i} = \frac{\partial L(\mathbf{x}, \Theta)}{\partial\mu_i}\frac{\partial\mu_i}{\partial\Theta_i}, \qquad \frac{\partial L(\mathbf{x}, \Theta)}{\partial\mathbf{v}_i} = \frac{\partial L(\mathbf{x}, \Theta)}{\partial\xi_i}\frac{\partial\xi_i}{\partial\mathbf{v}_i}$$

– The parameter of the expert network:

$$\Theta_i(k+1) = \Theta_i(k) + \eta\sum_{l=1}^{P} h_i^{(l)}\big(\mathbf{y}^{(l)} - \mu_i\big)\frac{\partial\mu_i}{\partial\Theta_i}$$

– The parameter of the gating network:

$$\mathbf{v}_i(k+1) = \mathbf{v}_i(k) + \eta\sum_{l=1}^{P}\big(h_i^{(l)} - g_i^{(l)}\big)\mathbf{x}^{(l)}$$


Mixture of Experts (MOE)

• Training (cont'd)

– A priori probability: $g_i^{(l)} = g_i(\mathbf{x}^{(l)}, \mathbf{v}_i) = P(i\,|\,\mathbf{x}^{(l)}, \mathbf{v}_i)$

– A posteriori probability:

$$h_i^{(l)} = \frac{g_i^{(l)}\,P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta_i)}{\sum_j g_j^{(l)}\,P(\mathbf{y}^{(l)}\,|\,\mathbf{x}^{(l)}, \Theta_j)}$$
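The posterior responsibilities $h_i^{(l)}$ are what the updates on the previous slide (and the EM E-step) need. A sketch with Gaussian expert likelihoods around each expert's mean; the means, priors, and σ below are illustrative values:

```python
import math

# Sketch: posterior responsibilities
#   h_i = g_i * P(y | x, Theta_i) / sum_j g_j * P(y | x, Theta_j)
# with Gaussian expert likelihoods N(mu_i, sigma^2).
def gaussian(y, mu, sigma=0.5):
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posteriors(y, mus, g):
    num = [gi * gaussian(y, mi) for gi, mi in zip(g, mus)]
    s = sum(num)
    return [n / s for n in num]

# Target y = 1.0 is much closer to expert 1's mean (0.9) than expert 2's (2.0):
h = posteriors(y=1.0, mus=[0.9, 2.0], g=[0.5, 0.5])
print([round(v, 3) for v in h])   # expert 1 takes most of the responsibility
```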


Mixture of Experts (MOE)

• Training (cont'd)
– EM (Expectation Maximization) algorithm: a general iterative technique for maximum likelihood estimation
• Introducing hidden variables
• Defining a log-likelihood function
– Two steps:
• Expectation of the hidden variables
• Maximization of the log-likelihood function


Hierarchical Mixture of Experts (HMOE)

HMOE: more layers of gating networks, groups of experts


Mixture of Experts (MOE)

• MOE construction
• Cross-validation can be used to find the proper architecture
• CART (Classification And Regression Tree) for the initial hierarchical MOE (HMOE) architecture and for the initial expert and gating network parameters
• MOE based on SVMs: different SVMs with different hyperparameters


References and further readings

Haykin, S.: "Neural Networks. A Comprehensive Foundation", Prentice Hall, N.J., 1999.

Jordan, M. I. and Jacobs, R. A.: "Hierarchical Mixtures of Experts and the EM Algorithm", Neural Computation, Vol. 6, pp. 181-214, 1994.

Bilmes, J. A. et al.: "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", 1998.

Dempster, A. P. et al.: "Maximum Likelihood from Incomplete Data via the EM Algorithm", 1977.

Moon, T. K.: "The Expectation-Maximization Algorithm", IEEE Trans. Signal Processing, 1996.


Application: modelling an industrial plant

(steel converter)


Overview

• Introduction
• Modeling approaches
• Building neural models
• Data base construction
• Model selection
• Modular approach
• Hybrid approach
• Information system
• Experiences with the advisory system
• Conclusions



Introduction to the problem

• Task
– to develop an advisory system for a Linz-Donawitz steel converter
– to propose component composition
– to support the factory staff in supervising the steel-making process

• A model of the process is required: first a system modelling task should be solved


LD Converter modeling

The Linz-Donawitz converter in Hungary (Dunaferr Co.)

Basic Oxygen Steelmaking (BOS)


Phases of steelmaking

• 1. Filling of waste iron• 2. Filling of pig iron• 3. Blasting with pure

oxygen• 4. Supplement additives• 5. Sampling for quality

testing• 6. Tapping of steel and

slag

Linz-Donawitz converter



Main features of the process

• Nonlinear input-output relation between many inputs and two outputs
• Input parameters (~50 different parameters)
  – certain features “measured” during the process
• The main output parameters (measured values of the produced steel)
  – temperature (1640–1700 °C, −10 … +15 °C)
  – carbon content (0.03–0.70 %)
• More than 5000 records of data


Modeling task

• The difficulties of model building
  – high-complexity nonlinear input-output relationship
  – no (or unsatisfactory) physical insight
  – relatively few measurement data
  – unmeasurable parameters of the process
  – noisy, imprecise, unreliable data
  – the classical approach (heat balance, mass balance) gives no acceptable results


Modeling approaches

• Theoretical model – based on chemical and physical equations

• Input-output behavioral model
  – neural model – based on the measured process data
  – rule-based system – based on the experience-based knowledge of the factory staff
  – combined neural–rule-based system: a hybrid model


The modeling task

[Block diagram: a forward model receives the components (parameters) and the oxygen amount and predicts the temperature; the error ε between the measured and the predicted temperature drives model training. An inverse model, built around a copy of the trained model, predicts the required oxygen from the components and the target temperature; the error between the model output temperature and the target drives the inverse model.]


Overview

• Introduction
• Modeling approaches
• Building neural models
• Data base construction
• Model selection
• Modular approach
• Hybrid approach
• Information system
• Experiences with the advisory system
• Conclusions


“Neural” solution

• The steps of solving a practical problem:

Raw input data → Preprocessing → Neural network → Postprocessing → Results


Building neural models

• Creating a reliable database
  – the problem of noisy data
  – the problem of missing data
  – the problem of uneven data distribution
• Selecting a proper neural architecture
  – static network (size of the network)
  – dynamic network
    • size of the network: nonlinear mapping
    • regressor selection + model order selection
• Training and validating the model


Creating a reliable database

• Input components – measure of importance
  – physical insight
  – sensitivity analysis (importance of the input variables)
  – mathematical methods: dimension reduction (e.g. PCA)
• Normalization
  – input normalization
  – output normalization
• Missing data
  – artificially generated data
• Noisy data
  – preprocessing, filtering
  – errors-in-variables criterion function, etc.


Building database

• Selecting input components, sensitivity analysis

[Flowchart: Initial database → Neural network training → Sensitivity analysis → “Does an input parameter have only a small effect on the output?” If yes, cancel that input parameter, form a new database and retrain; if no, stop.]
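The pruning loop above can be sketched in code. This is a minimal illustration, not the system used for the converter: the perturbation-based sensitivity measure, the relative threshold, and the function names are all our assumptions for the example.

```python
import numpy as np

def perturbation_sensitivity(model, X, delta=1e-2):
    """Estimate each input's effect on the output with finite differences,
    averaged over the data set (one simple notion of 'sensitivity')."""
    base = model(X)
    sens = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] += delta
        sens[i] = np.mean(np.abs(model(Xp) - base)) / delta
    return sens

def prune_inputs(model_factory, X, y, threshold=0.05, max_rounds=10):
    """Iteratively cancel inputs whose sensitivity is small relative to the
    largest one, retraining on the reduced database in between."""
    keep = np.arange(X.shape[1])
    for _ in range(max_rounds):
        model = model_factory(X[:, keep], y)       # neural network training
        s = perturbation_sensitivity(model, X[:, keep])
        weak = s < threshold * s.max()             # small effect on the output?
        if not weak.any():
            break                                  # no weak input left: stop
        keep = keep[~weak]                         # input parameter cancellation
    return keep
```

`model_factory` stands for any training procedure that returns a predictor; in practice it would train a neural network on the current input set.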


Building database

• Dimension reduction: mathematical methods

– PCA

– Non-linear PCA, Kernel PCA

– ICA

• Combined methods


The effect of data distribution

• Typical data distributions

[Two histograms of process variables: one with an uneven distribution, and one (the temperature, roughly 1600–1740 °C) with an approximately Gaussian distribution.]


Normalization

• Zero mean, unit standard deviation:

$\bar{x}_i = \frac{1}{P}\sum_{p=1}^{P} x_i^{(p)}$,  $\sigma_i^2 = \frac{1}{P-1}\sum_{p=1}^{P}\left(x_i^{(p)} - \bar{x}_i\right)^2$,  $\tilde{x}_i^{(p)} = \frac{x_i^{(p)} - \bar{x}_i}{\sigma_i}$

• Normalization into [0,1]:

$\tilde{x}_i = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}}$

• Decorrelation + normalization:

$\boldsymbol{\Sigma} = \frac{1}{P-1}\sum_{p=1}^{P}\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)^{\mathrm{T}}$,  $\boldsymbol{\Sigma}\,\boldsymbol{\varphi}_j = \lambda_j \boldsymbol{\varphi}_j$

$\boldsymbol{\Phi} = \left[\boldsymbol{\varphi}_1\;\boldsymbol{\varphi}_2 \dots \boldsymbol{\varphi}_N\right]$,  $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1 \dots \lambda_N)$,  $\tilde{\mathbf{x}}^{(p)} = \boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Phi}^{\mathrm{T}}\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)$
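The three normalizations above can be written compactly; a small NumPy sketch (the function names are ours, and `whiten` assumes a full-rank covariance matrix):

```python
import numpy as np

def standardize(X):
    """Zero mean, unit standard deviation for every input component."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def min_max(X):
    """Normalization into [0, 1] per component."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def whiten(X):
    """Decorrelation + normalization: eigendecompose the sample covariance
    and rescale each principal direction to unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    lam, Phi = np.linalg.eigh(cov)        # Sigma phi_j = lambda_j phi_j
    return Xc @ Phi / np.sqrt(lam)        # Lambda^(-1/2) Phi^T (x - mean)
```

After whitening, the sample covariance of the transformed data is the identity matrix, which is exactly the "whitening transformation" of the next slide.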


Normalization

• Decorrelation + normalization = Whitening transformation

[Scatter plots of the original (correlated) and the whitened data.]


Missing or few data

• Filling in the missing values

– based on available information

• Artificially generated data

– using trends

– using correlation

– using realistic transformations


Overview

• Introduction
• Modeling approaches
• Building neural models
• Data base construction
• Model selection
• Modular approach
• Hybrid approach
• Information system
• Experiences with the advisory system
• Conclusions


Missing or few data

• Filling in the missing values based on:
  – the correlation coefficient between $x_i$ and $x_j$:  $\tilde{C}(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}}$
  – previous (other) values:  $\hat{x}_i = x_i + \sigma_i \xi$
  – other parameters:  $\hat{x}_i^{(k)} = f\!\left(\hat{x}_j^{(k)}\right)$ or $\hat{x}_i^{(k)} = f\!\left(\hat{x}_j^{(k)}, \hat{x}_{j'}^{(k)}, \dots\right)$
  – time dependence (dynamic problem):  $\mathrm{R}_i(t,\tau) = E\left\{x_i(t)\,x_i(t+\tau)\right\}$

• Artificially generated data
  – using trends
  – using correlation
  – using realistic transformations
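A toy sketch of correlation-based filling: estimate a missing component from the complete column it correlates with most strongly. The linear form $x_i \approx a\,x_j + b$ and the choice of a single predictor column are simplifying assumptions for the example.

```python
import numpy as np

def fill_missing(X, col, missing):
    """Fill the missing entries of one column from its most strongly
    correlated complete column, via a least-squares line x_i ~ a*x_j + b."""
    obs = ~missing
    others = [j for j in range(X.shape[1]) if j != col]
    corr = [abs(np.corrcoef(X[obs, col], X[obs, j])[0, 1]) for j in others]
    j = others[int(np.argmax(corr))]          # best-correlated column
    a, b = np.polyfit(X[obs, j], X[obs, col], 1)
    X = X.copy()
    X[missing, col] = a * X[missing, j] + b   # artificially generated values
    return X
```

In practice a noise term scaled by the residual spread (as in $\hat{x}_i = x_i + \sigma_i \xi$ above) could be added so the filled values do not all lie exactly on the regression line.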


Few data

• Artificial data generation
  – using realistic transformations
  – using sensitivity values: data generation around various working points (a good example: ALVINN)

(ALVINN = Autonomous Land Vehicle In a Neural Network, an on-road neural-network navigation system developed at CMU)


Noisy data

• Inherent noise suppression
  – classical neural nets have a noise suppression property (inherent regularization)
  – regularization (smoothing regularization)
  – averaging (modular approach)
• SVM
  – ε-insensitive criterion function
• EIV
  – input and output noise are taken into consideration
  – modified criterion function


Reducing the effect of output noise

• Inherent regularization of MLP (smooth sigmoidal function)
• SVM with ε-insensitive loss function
• Regularization: e.g. the regularized (kernel) CMAC


Reducing the effect of input and output noise

• Errors in variables (EIV)

[Block diagram: the system maps the true input $x_k^*$ to the true output $y_k^*$; the observed values $x_k^{[i]}$ and $y_k^{[i]}$ of the $i$-th repeated measurement are corrupted by input measurement noise $n_{m,x,k}^{[i]}$, process noise $n_{p,k}^{[i]}$ and output measurement noise $n_{m,y,k}^{[i]}$.]

From $M$ repeated measurements the sample means and variances are

$\bar{x}_k = \frac{1}{M}\sum_{i=1}^{M} x_k^{[i]}$,  $\bar{y}_k = \frac{1}{M}\sum_{i=1}^{M} y_k^{[i]}$

$\sigma_{x,k}^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(x_k^{[i]} - \bar{x}_k\right)^2$,  $\sigma_{y,k}^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(y_k^{[i]} - \bar{y}_k\right)^2$


EIV

• LS vs. EIV criterion function:

$C_{LS}(\mathbf{W}) = \frac{1}{P}\sum_{k=1}^{P}\left(y_k^* - f_{NN}(x_k^*, \mathbf{W})\right)^2$

$C_{EIV}(\mathbf{W}) = \frac{1}{P}\sum_{k=1}^{P}\left(\frac{\left(y_k - f_{NN}(\hat{x}_k, \mathbf{W})\right)^2}{\sigma_{y,k}^2} + \frac{\left(\hat{x}_k - x_k\right)^2}{\sigma_{x,k}^2}\right)$

• EIV training: both the weights and the input estimates are updated, with $e_{f,k} = y_k - f_{NN}(\hat{x}_k, \mathbf{W})$:

$\Delta W_j = \frac{2\eta}{P}\sum_{k=1}^{P} \frac{e_{f,k}}{\sigma_{y,k}^2}\,\frac{\partial f_{NN}(\hat{x}_k, \mathbf{W})}{\partial W_j}$

$\Delta \hat{x}_k = 2\eta\left[\frac{e_{f,k}}{\sigma_{y,k}^2}\,\frac{\partial f_{NN}(\hat{x}_k, \mathbf{W})}{\partial \hat{x}_k} + \frac{x_k - \hat{x}_k}{\sigma_{x,k}^2}\right]$

• Danger: overfitting → early stopping
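The difference between the two criteria can be shown in a few lines; a hedged sketch with scalar inputs (the function names and argument order are ours):

```python
import numpy as np

def ls_cost(f, W, x, y):
    """Ordinary least-squares criterion: only the output error counts."""
    return np.mean((y - f(x, W)) ** 2)

def eiv_cost(f, W, x_hat, x, y, var_x, var_y):
    """Errors-in-variables criterion: output residuals are weighted by the
    output-noise variance and the input corrections (x_hat - x) are
    penalized by the input-noise variance."""
    return np.mean((y - f(x_hat, W)) ** 2 / var_y + (x_hat - x) ** 2 / var_x)
```

The corrected inputs `x_hat` are extra free parameters (one per training point), which is why EIV has many more degrees of freedom than LS and is more prone to overfitting.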


Noisy data

• Output noise is easier to suppress than input noise

• SVM, regularization can reduce the effect of output noise

• EIV (and similar other methods) can take into consideration the input noise

• EIV results in only slightly better approximation

• EIV is rather prone to overfitting (many more free parameters) → early stopping


Overview

• Introduction
• Modeling approaches
• Building neural models
• Data base construction
• Model selection
• Modular approach
• Hybrid approach
• Information system
• Experiences with the advisory system
• Conclusions


Model selection

• Static or dynamic
  – why a dynamic model may be better
• Dynamic model class
  – regressor selection
  – basis function selection
• Size of the network
  – number of layers
  – number of hidden neurons
  – model order


Model selection

• NFIR
• NARX
• NOE
• NARMAX

NARX model, NOE model: model order selection. The model order is the input dimension of the static network:

$y_M(k) = f\left[\mathbf{x}(k), \mathbf{x}(k-1), \mathbf{x}(k-2), \dots, \mathbf{x}(k-n),\; y(k-1), y(k-2), \dots, y(k-m)\right]$
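Forming the NARX regressor vectors from a recorded input-output sequence can be sketched as follows (single input u and output y; the function name is ours):

```python
import numpy as np

def narx_regressors(u, y, n, m):
    """Build the static network's input (regressor) vectors for a NARX model:
    phi(k) = [u(k), u(k-1), ..., u(k-n), y(k-1), ..., y(k-m)], paired with
    the target y(k); the regressor dimension (n+1) + m is the model order."""
    start = max(n, m)
    Phi, target = [], []
    for k in range(start, len(y)):
        past_u = [u[k - i] for i in range(0, n + 1)]   # u(k) ... u(k-n)
        past_y = [y[k - i] for i in range(1, m + 1)]   # y(k-1) ... y(k-m)
        Phi.append(past_u + past_y)
        target.append(y[k])
    return np.array(Phi), np.array(target)
```

The resulting (Phi, target) pairs are exactly what a static network is trained on; an NOE model differs only in that the past outputs are replaced by the model's own past predictions.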


Model order selection

• AIC, MDL, NIC, Lipschitz number

• Lipschitz number, Lipschitz quotient:

$y(k) = f\left[\mathbf{x}(k), \mathbf{x}(k-1), \dots, \mathbf{x}(k-n),\; y(k-1), y(k-2), \dots, y(k-m)\right]$

$q_{ij} = \frac{\left|y_i - y_j\right|}{\left\|\mathbf{x}_i - \mathbf{x}_j\right\|},\quad i \ne j$,   $q^{(n)} = \left(\prod_{l=1}^{p} \sqrt{n}\, q^{(n)}(l)\right)^{1/p}$

where $q^{(n)}(l)$ is the $l$-th largest quotient computed with $n$ regressors.

[Two plots of the Lipschitz number versus the model order: in the noiseless case the curve has a definite knee at the optimal model order; in the noisy case there is no definite point.]
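He and Asada's Lipschitz number can be computed directly from input-output data; a naive O(P²) sketch (the choice of p, roughly 1% of the points, is a common heuristic and an assumption here):

```python
import numpy as np

def lipschitz_number(X, y, p=None):
    """He & Asada's Lipschitz number for one candidate regressor set X with
    n columns: the geometric mean of the p largest quotients
    q_ij = |y_i - y_j| / ||x_i - x_j||, each scaled by sqrt(n)."""
    n, P = X.shape[1], len(y)
    p = p or max(1, P // 100)            # heuristic: ~1% of the points
    q = []
    for i in range(P):                   # naive pass over all pairs
        for j in range(i + 1, P):
            d = np.linalg.norm(X[i] - X[j])
            if d > 0:
                q.append(abs(y[i] - y[j]) / d)
    top = np.sort(q)[-p:]
    return float(np.exp(np.mean(np.log(np.sqrt(n) * top))))
```

Evaluating this for an increasing number of regressors reproduces the curves above: with too few regressors the mapping is not a function of them and the number is large; once the true order is reached it settles (in the noiseless case).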


Model order selection

• Lipschitz quotient: assume a general nonlinear input-output relation $y = f(x_1, x_2, \dots, x_n)$, where $f(\cdot)$ is a continuous, smooth multivariable function with bounded derivatives:

$\left|f_i'\right| = \left|\frac{\partial f}{\partial x_i}\right| \le M, \quad i = 1, 2, \dots, n$

Then

$\Delta y = \frac{\partial f}{\partial x_1}\Delta x_1 + \frac{\partial f}{\partial x_2}\Delta x_2 + \dots + \frac{\partial f}{\partial x_n}\Delta x_n = f_1'\Delta x_1 + f_2'\Delta x_2 + \dots + f_n'\Delta x_n$

so the Lipschitz quotient is bounded:

$q_{ij} = \frac{\left|y_i - y_j\right|}{\left\|\mathbf{x}_i - \mathbf{x}_j\right\|},\; i \ne j, \qquad 0 \le q_{ij} \le L$

(The quotients can also be used for sensitivity analysis.)


Model order selection

• Noisy case: combined method of Lipschitz + EIV

[Flowchart: an initial model-order estimate is obtained with the Lipschitz algorithm on the noisy data (x, d); a network is trained with BP + LS and then with BP + EIV, which also yields improved input estimates x*; the Lipschitz algorithm is run again on the corrected data, the results are compared, and a new order estimate is produced.]


Correlation based model order selection

• Model order limited to 2…4 because of practical problems
• Too many input components
• (2…4) × (number of input components + outputs)
• Too large a network
• Too few training data
• The problem of missing data
• Network size: cross-validation
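The cross-validation step for choosing the network size can be sketched generically. Here the "size" is any integer hyperparameter of a user-supplied trainer (for a quick experiment, a polynomial degree can stand in for the hidden-neuron count); the function names are ours.

```python
import numpy as np

def k_fold_cv(train, X, y, k=5):
    """Mean validation MSE of one model configuration under k-fold CV;
    train(X, y) must return a predictor function."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        model = train(X[tr], y[tr])
        errs.append(np.mean((y[fold] - model(X[fold])) ** 2))
    return float(np.mean(errs))

def select_size(candidates, make_trainer, X, y):
    """Pick the size (e.g. hidden-neuron count) with the smallest CV error."""
    scores = {n: k_fold_cv(make_trainer(n), X, y) for n in candidates}
    return min(scores, key=scores.get), scores
```

With few and noisy records, as in the converter data, the CV score itself is noisy, so in practice the smallest size within a tolerance of the best score is often preferred.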


References and further readings

Berényi, P., Horváth, G., Pataki, B., Strausz, Gy.: "Hybrid-Neural Modeling of a Complex Industrial Process" Proc. of the IEEE Instrumentation and Measurement Technology Conference, IMTC'2001, Budapest, May 21-23, Vol. III, pp. 1424-1429.

Horváth, G., Pataki, B. Strausz, T.: "Neural Modeling of a Linz-Donawitz Steel Converter: Difficulties and Solutions" Proc. of the EUFIT'98, 6th European Congress on Intelligent Techniques and Soft Computing. Aachen, Germany. 1998. Sept. pp.1516-1521

Horváth, G. Pataki, B. Strausz, Gy.: "Black box modeling of a complex industrial process", Proc. Of the 1999 IEEE Conference and Workshop on Engineering of Computer Based Systems, Nashville, TN, USA. 1999. pp. 60-66

Pataki, B., Horváth, G., Strausz, Gy., Talata, Zs. "Inverse Neural Modeling of a Linz-Donawitz SteelConverter" e & i Elektrotechnik und Informationstechnik, Vol. 117. No. 1. 2000. pp.

Strausz, Gy., G. Horváth, B. Pataki: "Experiences from the results of neural modelling of an industrial process" Proc. of Engineering Application of Neural Networks, EANN'98, Gibraltar, 1998, pp. 213-220

Akaike, H. “A New Look at the Statistical Model Identification”, IEEE Transaction on Aut. Control Vol. AC-19, No.6., 1974, pp.716-723

Rissanen, J. “Estimation of Structure by Minimum Description Length”, Circuits, Systems and Signal Processing, special issue on Rational Approximations, Vol. 1, No. 3-4, 1982, pp. 395-406.

Akaike, H. “Statistical Predictor Identification”, Ann. Istitute of Statistical Mathematics, Vol. 22., 1970, pp. 203-217.

He, X. and Asada, H.“A New Method for Identifying Orders of Input-output Models for Nonlinear Dynamic Systems” Proceedings of the American Control Conference, San Francisco, California, June 1993, pp. 2520-2523.


Overview

• Introduction
• Modeling approaches
• Building neural models
• Data base construction
• Model selection
• Modular approach
• Hybrid approach
• Information system
• Experiences with the advisory system
• Conclusions


Modular solution

• Multiple neural models for the different working conditions
• Processing of special cases
• Depending on the distribution of the input parameters
• Cooperative or competitive modular architecture


Hybrid solution

• Utilization of different forms of information

– measurement, experimental data

– symbolic rules

– mathematical equations, physical knowledge


The hybrid information system

• Solution:
  – integration of measurement information and experience-based knowledge about the process
• Realization:
  – development system – supports the design and testing of different hybrid models
  – advisory system – runs the hybrid models using the current process state and input information; the experience collected with the rule-based system can be used to update the model


The hybrid-neural system: information processing

[Block diagram of the mixture-of-experts system: an input-data preparatory expert system feeds K neural networks (NN1 … NNK); their outputs (O1 … OK) are combined by an output-estimator expert system, a correction-term expert system adds a correction ΔO, and an output expert system either issues the oxygen prediction or gives no prediction together with an explanation.]


The hybrid-neural system

[Diagram: input data → data preprocessing and correction → neural model.]


The hybrid-neural system

[Diagram: conditional network running – an expert selects one of the neural models NN1 … NNk, and only the selected model is run on the input data to produce its output.]


The hybrid-neural system

[Diagram: parallel network running with postprocessing – all models NN1 … NNk are run on the input data; an expert for selecting an NN model and an output expert combine the outputs O1 … Ok into the oxygen prediction.]


The hybrid-neural system

[Diagram: iterative network running – the neural network is run to make a prediction; if the result is not satisfactory, the input parameters are modified and the network is run again.]


Validation

• Model selection– iterative process

– utilization of domain knowledge

• Cross validation– fresh data

– on-site testing


Experiences

• The hit rate is increased by about 10%
• Most of the special cases can be handled
• Further rules for handling special cases should be obtained
• The accuracy of the measured data should be increased


Conclusions

• For complex industrial problems all available information has to be used
• Thinking of NNs as universal modeling devices alone is not enough
• Physical insight is important
• The importance of preprocessing and post-processing
• Modular approach:
  – decomposition of the problem
  – cooperation and competition
  – “experts” using different paradigms
• The hybrid approach to the problem provided better results


Summary

• Main questions

• Open questions

• Final conclusions


Main questions

• Neural modeling: black-box or not?

• When to apply neural approach?

• How to use neural networks?

• The role of prior knowledge

• How to use prior knowledge?

• How to validate the results?


Open (partly open) questions

• Model class selection

• Model order selection

• Validation, generalization capability

• Sample size, training set, test set, validation set

• Missing data, noisy data, few data

• Data consistency


Final conclusions

• Neural networks are especially important and proper architectures for (nonlinear) system modeling
• General solutions: NN and fuzzy-neural systems are universal modeling devices (universal approximators)
• The importance of the theoretical results and theoretical background
• The difficulty of applying theoretical results in practice
• The role of the data base
• The importance of prior information, physical insight
• The importance of preprocessing and post-processing
• Modular approach:
  – decomposition of the problem
  – cooperation and competition
  – “experts” using different paradigms
• Hybrid solutions: combination of rule-based, fuzzy, neural and mathematical models


References and further readings

Parag H. Batavia, Dean A. Pomerleau, Charles E. Thorpe: "Applying Advanced Learning Algorithms to ALVINN", Technical Report CMU-RI-TR-96-31, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890

Berényi, P.,, Horváth, G., Pataki, B., Strausz, Gy. : "Hybrid-Neural Modeling of a Complex Industrial Process" Proc. of the IEEE Instrumentation and Measurement Technology Conference, IMTC'2001. Budapest, May 21-23. Vol. III. pp. 1424-1429.

Berényi P., Valyon J., Horváth, G. : "Neural Modeling of an Industrial Process with Noisy Data" IEA/AIE-2001, The Fourteenth International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, June 4-7, 2001, Budapest in Lecture Notes in Computer Sciences, 2001, Springer, pp. 269-280.

Bishop, C. M.: "Neural Networks for Pattern Recognition", Clarendon Press, Oxford, 1995.

Horváth, G., Pataki, B. Strausz, T.: "Neural Modeling of a Linz-Donawitz Steel Converter: Difficulties and Solutions" Proc. of the EUFIT'98, 6th European Congress on Intelligent Techniques and Soft Computing. Aachen, Germany. 1998. Sept. pp.1516-1521

Horváth, G. Pataki, B. Strausz, Gy.: "Black box modeling of a complex industrial process", Proc. Of the 1999 IEEE Conference and Workshop on Engineering of Computer Based Systems, Nashville, TN, USA. 1999. pp. 60-66

Pataki, B., Horváth, G., Strausz, Gy., Talata, Zs. "Inverse Neural Modeling of a Linz-Donawitz SteelConverter" e & i Elektrotechnik und Informationstechnik, Vol. 117. No. 1. 2000. pp.


References and further readings (continued)

Strausz, Gy., G. Horváth, B. Pataki: "Experiences from the results of neural modelling of an industrial process" Proc. of Engineering Application of Neural Networks, EANN'98, Gibraltar, 1998, pp. 213-220

Strausz, Gy., G. Horváth, B. Pataki : "Effects of database characteristics on the neural modelingof an industrial process" Proc. of the International ICSC/IFAC Symposium on Neural Computation / NC’98, Sept. 1998, Vienna pp. 834-840.

Bishop, C. M.: "Neural Networks for Pattern Recognition", Clarendon Press, Oxford, 1995.

Horváth, G (ed.),” Neural Networks and Their Applications”, Publishing house of the Budapest University of Technology and Economics, Budapest, 1998. (in Hungarian)

Jang, J. -S. R., Sun, C. -T. and Mizutani: E. „Neuro-Fuzzy and Soft Computing. A Computational Approach to Learning and Machine Intelligence”, Prentice Hall, 1997.

Jang, J.-S. R.: "ANFIS: Adaptive-Network-Based Fuzzy Inference System", IEEE Trans. on Systems, Man, and Cybernetics, Vol. 23, No. 3, pp. 665-685, 1993.

Nguyen, D. and Widrow, B. (1989). "The Truck Backer-Upper: An Example of Self-Learning in Neural Networks," in Proceedings of the International Joint Conference on Neural Networks(Washington, DC 1989), vol. II, 357-362.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. I, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. MIT Press, Cambridge (1986).