
Tools for Modelling and Identification with Bond Graphs and Genetic Programming

by

Stefan Wiechula

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Master of Applied Science in Mechanical Engineering

Waterloo, Ontario, Canada, 2006

© Stefan Wiechula 2006

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.


Abstract

The contributions of this work include genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations; a bond graph modelling library suitable for programmatic use; and a symbolic algebra library specialized to this use and capable of, among other things, breaking algebraic loops in equation sets extracted from linear bond graph models. Several non-linear multi-body mechanics examples are presented, showing that the bond graph modelling library exhibits well-behaved simulation results. Symbolic equations in a reduced form are produced automatically from bond graph models. The genetic programming system is tested against a static non-linear function identification problem using typeless symbolic regression. The direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal. The planned integration of bond graphs with genetic programming for use as a system identification technique was not successfully completed. A categorized overview of other modelling and identification techniques is included as context for the choice of bond graphs and genetic programming.


Acknowledgements

I'd like to thank my supervisor, Dr. Jan Huissoon, for his support.


Contents

1 Introduction
  1.1 System identification
  1.2 Black box and grey box problems
  1.3 Static and dynamic models
  1.4 Parametric and non-parametric models
  1.5 Linear and nonlinear models
  1.6 Optimization, search, machine learning
  1.7 Some model types and identification techniques
    1.7.1 Locally linear models
    1.7.2 Nonparametric modelling
    1.7.3 Volterra function series models
    1.7.4 Two special models in terms of Volterra series
    1.7.5 Least squares identification of Volterra series models
    1.7.6 Nonparametric modelling as function approximation
    1.7.7 Neural networks
    1.7.8 Parametric modelling
    1.7.9 Differential and difference equations
    1.7.10 Information criteria – heuristics for choosing model order
    1.7.11 Fuzzy relational models
    1.7.12 Graph-based models
  1.8 Contributions of this thesis

2 Bond graphs
  2.1 Energy based lumped parameter models
  2.2 Standard bond graph elements
    2.2.1 Bonds
    2.2.2 Storage elements
    2.2.3 Source elements
    2.2.4 Sink elements
    2.2.5 Junctions
  2.3 Augmented bond graphs
    2.3.1 Integral and derivative causality
    2.3.2 Sequential causality assignment procedure
  2.4 Additional bond graph elements
    2.4.1 Activated bonds and signal blocks
    2.4.2 Modulated elements
    2.4.3 Complex elements
    2.4.4 Compound elements
  2.5 Simulation
    2.5.1 State space models
    2.5.2 Mixed causality models

3 Genetic programming
  3.1 Models, programs and machine learning
  3.2 History of genetic programming
  3.3 Genetic operators
    3.3.1 The crossover operator
    3.3.2 The mutation operator
    3.3.3 The replication operator
    3.3.4 Fitness proportional selection
  3.4 Generational genetic algorithms
  3.5 Building blocks and schemata
  3.6 Tree structured programs
    3.6.1 The crossover operator for tree structures
    3.6.2 The mutation operator for tree structures
    3.6.3 Other operators on tree structures
  3.7 Symbolic regression
  3.8 Closure or strong typing
  3.9 Genetic programming as indirect search
  3.10 Search tuning
    3.10.1 Controlling program size
    3.10.2 Maintaining population diversity

4 Implementation
  4.1 Program structure
    4.1.1 Formats and representations
  4.2 External tools
    4.2.1 The Python programming language
    4.2.2 The SciPy numerical libraries
    4.2.3 Graph layout and visualization with Graphviz
  4.3 A bond graph modelling library
    4.3.1 Basic data structures
    4.3.2 Adding elements and bonds, traversing a graph
    4.3.3 Assigning causality
    4.3.4 Extracting equations
    4.3.5 Model reductions
  4.4 An object-oriented symbolic algebra library
    4.4.1 Basic data structures
    4.4.2 Basic reductions and manipulations
    4.4.3 Algebraic loop detection and resolution
    4.4.4 Reducing equations to state-space form
    4.4.5 Simulation
  4.5 A typed genetic programming system
    4.5.1 Strongly typed mutation
    4.5.2 Strongly typed crossover
    4.5.3 A grammar for symbolic regression on dynamic systems
    4.5.4 A grammar for genetic programming bond graphs

5 Results and discussion
  5.1 Bond graph modelling and simulation
    5.1.1 Simple spring-mass-damper system
    5.1.2 Multiple spring-mass-damper system
    5.1.3 Elastic collision with a horizontal surface
    5.1.4 Suspended planar pendulum
    5.1.5 Triple planar pendulum
    5.1.6 Linked elastic collisions
  5.2 Symbolic regression of a static function
    5.2.1 Population dynamics
    5.2.2 Algebraic reductions
  5.3 Exploring genetic neighbourhoods by perturbation of an ideal
  5.4 Discussion
    5.4.1 On bond graphs
    5.4.2 On genetic programming

6 Summary and conclusions
  6.1 Extensions and future work
    6.1.1 Additional experiments
    6.1.2 Software improvements


List of Figures

1.1 Using output prediction error to evaluate a model
1.2 The black box system identification problem
1.3 The grey box system identification problem
1.4 Block diagram of a Volterra function series model
1.5 The Hammerstein filter chain model structure
1.6 Block diagram of the linear squarer cuber model
1.7 Block diagram for a nonlinear moving average operation

2.1 A bond denotes power continuity
2.2 A capacitive storage element: the 1-port capacitor, C
2.3 An inductive storage element: the 1-port inertia, I
2.4 An ideal source element: the 1-port effort source, Se
2.5 An ideal source element: the 1-port flow source, Sf
2.6 A dissipative element: the 1-port resistor, R
2.7 A power-conserving 2-port element: the transformer, TF
2.8 A power-conserving 2-port element: the gyrator, GY
2.9 A power-conserving multi-port element: the 0-junction
2.10 A power-conserving multi-port element: the 1-junction
2.11 A fully augmented bond includes a causal stroke
2.12 Derivative causality: electrical capacitors in parallel
2.13 Derivative causality: rigidly joined masses
2.14 A power-conserving 2-port: the modulated transformer, MTF
2.15 A power-conserving 2-port: the modulated gyrator, MGY
2.16 A non-linear storage element: the contact compliance, CC
2.17 Mechanical contact compliance: an unfixed spring
2.18 Effort-displacement plot for the CC element
2.19 A suspended pendulum
2.20 Sub-models of the simple pendulum
2.21 Model of a simple pendulum using compound elements

3.1 A computer program is like a system model
3.2 The crossover operation for bit string genotypes
3.3 The mutation operation for bit string genotypes
3.4 The replication operation for bit string genotypes
3.5 Simplified GA flowchart
3.6 Genetic programming produces a program
3.7 The crossover operation for tree structured genotypes
3.8 Genetic programming as indirect search

4.1 Model reductions step 1: remove trivial junctions
4.2 Model reductions step 2: merge adjacent like junctions
4.3 Model reductions step 3: merge resistors
4.4 Model reductions step 4: merge capacitors
4.5 Model reductions step 5: merge inertias
4.6 A randomly generated model
4.7 A reduced model, step 1
4.8 A reduced model, step 2
4.9 A reduced model, step 3
4.10 Algebraic dependencies from the original model
4.11 Algebraic dependencies from the reduced model
4.12 Algebraic object inheritance diagram
4.13 Algebraic dependencies with loops resolved

5.1 Schematic diagram of a simple spring-mass-damper system
5.2 Bond graph model of the system from figure 5.1
5.3 Step response of a simple spring-mass-damper model
5.4 Noise response of a simple spring-mass-damper model
5.5 Schematic diagram of a multiple spring-mass-damper system
5.6 Bond graph model of the system from figure 5.5
5.7 Step response of a multiple spring-mass-damper model
5.8 Noise response of a multiple spring-mass-damper model
5.9 A bouncing ball
5.10 Bond graph model of a bouncing ball with switched C-element
5.11 Response of a bouncing ball model
5.12 Bond graph model of the system from figure 2.19
5.13 Response of a simple pendulum model
5.14 Vertical response of a triple pendulum model
5.15 Horizontal response of a triple pendulum model
5.16 Angular response of a triple pendulum model
5.17 A triple planar pendulum
5.18 Schematic diagram of a rigid bar with bumpers
5.19 Bond graph model of the system from figure 5.18
5.20 Response of a bar-with-bumpers model
5.21 Fitness in run A
5.22 Diversity in run A
5.23 Program size in run A
5.24 Program age in run A
5.25 Fitness of run B
5.26 Diversity of run B
5.27 Program size in run B
5.28 Program age in run B
5.29 A best approximation of x^2 + 20 after 41 generations
5.30 The reduced expression
5.31 Mean fitness versus tree-distance from the ideal
5.32 Symbolic regression "ideal"


List of Tables

1.1 Modelling tasks in order of increasing difficulty

2.1 Units of effort and flow in several physical domains
2.2 Causality options by element type: 1- and 2-port elements
2.3 Causality options by element type: junction elements

4.1 Grammar for symbolic regression on dynamic systems
4.2 Grammar for genetic programming bond graphs

5.1 Neighbourhood of a symbolic regression "ideal"


Listings

3.1 Protecting the fitness test code against division by zero
4.1 A simple bond graph model in bg format
4.2 A simple bond graph model constructed in Python code
4.3 Two simple functions in Python
4.4 Graphviz DOT markup for the model defined in listing 4.4
4.5 Equations from the model in figure 4.6
4.6 Equations from the reduced model in figure 4.9
4.7 Resolving a loop in the reduced model
4.8 Equations from the reduced model with loops resolved


Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models, and because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system (a control system, for example). In an iterative design process it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system, but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics), then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense, the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system, and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately, it is perhaps best to think of an identified model as a second system that, for some purposes, can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that as closely as possible reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible, but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].
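In its simplest form, the output prediction error criterion reduces to comparing the model's predicted output with the measured output, sample by sample. A minimal sketch (the mean-squared-error form used here is one common choice, not the only one):

```python
def output_prediction_error(measured, predicted):
    """Mean squared difference between the measured system output and the
    model's predicted output, taken sample by sample."""
    assert len(measured) == len(predicted)
    n = len(measured)
    return sum((ym - yp) ** 2 for ym, yp in zip(measured, predicted)) / n

# A model that reproduces the measurements exactly scores zero;
# any mismatch increases the score.
perfect = output_prediction_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
biased = output_prediction_error([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```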

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

- Inspection or disassembly of a non-operational system

- Theoretical principles applicable to that class of systems

- Knowledge from the design and construction of the system

- Results of another system identification exercise

- Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero-mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant, and in no way on any past values of the input. Put another way, the input–output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic overall. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input–output data. The data may be collected in any number of experiments, and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables, but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables the model equations would be:

\dot{u}_1 = f_1(x_1, \ldots, x_n, u_1, \ldots, u_m)

\vdots

\dot{u}_m = f_m(x_1, \ldots, x_n, u_1, \ldots, u_m)

y_1 = g_1(x_1, \ldots, x_n, u_1, \ldots, u_m)

\vdots

y_p = g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)

where x_1, \ldots, x_n are the input variables, u_1, \ldots, u_m are the state variables, and y_1, \ldots, y_p are the output variables.
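As a concrete sketch of this form, consider a hypothetical spring-mass-damper system with position and velocity as states, the applied force as the single input, and position as the single output. The element values and the naive fixed-step integrator are chosen only for illustration:

```python
# Hypothetical spring-mass-damper in the state-space form above.
# Naming follows the text: x = input, u = state, y = output.
# States u1 (position) and u2 (velocity); input x (applied force);
# output y1 (position).  Parameter values m, c, k are made up.

def f(u, x, m=1.0, c=0.5, k=1.0):
    """Right-hand sides du/dt = f(x, u): algebraic in inputs and states only."""
    u1, u2 = u
    du1 = u2                          # d(position)/dt = velocity
    du2 = (x - c * u2 - k * u1) / m   # Newton's second law
    return (du1, du2)

def g(u, x):
    """Output equation y = g(x, u): the output is just the position state."""
    return u[0]

def simulate(force, t_end=10.0, dt=0.001):
    """Integrate from rest with a forward-Euler step; return the final y1."""
    u = (0.0, 0.0)
    for i in range(int(t_end / dt)):
        x = force(i * dt)
        du = f(u, x)
        u = (u[0] + dt * du[0], u[1] + dt * du[1])
    return g(u, force(t_end))

# Under a unit step force with k = 1, the position settles toward 1.
y_final = simulate(lambda t: 1.0)
```

A production version would hand f and g to an adaptive integrator (SciPy, as used in chapter 4) rather than fixed-step Euler; the separation into state equations f and output equations g is the point here.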

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order of arrival of data within the input and output signals matters, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a u_i for which there is no y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low, then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high, then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty to experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments, the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically and the corresponding parameters at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function) and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} X(s) \qquad (1.1)

Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} X(z) \qquad (1.2)

\frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x \qquad (1.3)

y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m} \qquad (1.4)
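Once its coefficients are known, a difference equation such as 1.4 can be stepped forward directly. A sketch with made-up first-order coefficients:

```python
def simulate_difference_eq(a, b, x):
    """Step equation 1.4 forward in time:
    y_k = -(a1*y_{k-1} + ... + an*y_{k-n}) + b1*x_{k-1} + ... + bm*x_{k-m}.
    a = [a1, ..., an], b = [b1, ..., bm]; values before k = 0 are taken as zero."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, ai in enumerate(a, start=1):   # autoregressive terms
            if k - i >= 0:
                acc -= ai * y[k - i]
        for j, bj in enumerate(b, start=1):   # input (moving-average) terms
            if k - j >= 0:
                acc += bj * x[k - j]
        y.append(acc)
    return y

# First-order example y_k = 0.5*y_{k-1} + x_{k-1}, driven by a unit impulse:
y = simulate_difference_eq(a=[-0.5], b=[1.0], x=[1.0, 0.0, 0.0, 0.0])
# y is the impulse response [0.0, 1.0, 0.5, 0.25]
```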

Because they typically contain fewer identified parameters, and because the range of possibilities is already reduced by the choice of a specific model structure, parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

y(t) = \int_0^t u(t - \tau) \, x(\tau) \, d\tau \qquad (1.5)

y(k) = \sum_{m=0}^{k} u(k - m) \, x(m) \qquad (1.6)

Y(j\omega) = U(j\omega) \, X(j\omega) \qquad (1.7)
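Equation 1.6 is a discrete convolution and can be evaluated directly. A sketch, with an impulse response invented for illustration:

```python
def impulse_response_output(u, x):
    """Discrete convolution of equation 1.6: y(k) = sum_{m=0}^{k} u(k-m) x(m),
    where u is the sampled impulse response (zero beyond its length) and
    x is the input sequence."""
    y = []
    for k in range(len(x)):
        y.append(sum(u[k - m] * x[m]
                     for m in range(k + 1) if k - m < len(u)))
    return y

# A decaying impulse response convolved with a unit step input:
u = [1.0, 0.5, 0.25]        # u(0), u(1), u(2); zero afterwards
x = [1.0, 1.0, 1.0, 1.0]    # step input
y = impulse_response_output(u, x)
# y accumulates toward the sum of u: [1.0, 1.5, 1.75, 1.75]
```

The impulse response here is a non-parametric description: every sample of u is a parameter of equal significance, in contrast to the few structurally distinct coefficients of equations 1.1 through 1.4.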

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

"Linear" here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified β_1, …, β_n, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters β_i, while equation 1.9 is not. Both contain non-linear functions of the inputs (namely u_2^2 and sin(u_3)), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the direct solution to the identification problem described below.

y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3) \qquad (1.8)

y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3) \qquad (1.9)

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms, this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
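For a model such as equation 1.8, the one-step solution can be sketched with NumPy's least squares solver standing in for the derivative-based derivation above. The data, noise level, and "true" parameter values are invented for illustration:

```python
import numpy as np

# Synthetic experiment: measurements generated from equation 1.8 with
# "true" parameters chosen only for this demonstration.
beta_true = np.array([2.0, -1.0, 0.5])
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=(200, 3))    # input samples u1, u2, u3

# Regressor matrix: one column per parameter.  Each column may be a
# non-linear function of the inputs, but enters the model linearly.
phi = np.column_stack([u[:, 0], u[:, 1] ** 2, np.sin(u[:, 2])])
y = phi @ beta_true + 0.01 * rng.standard_normal(200)   # noisy measurements

# One-step estimate: least squares solution of phi @ beta = y.
beta_hat, *_ = np.linalg.lstsq(phi, y, rcond=None)
```

Equation 1.9 admits no such regressor matrix, since β_1 β_2 and sin(β_3 u_3) cannot be written as a parameter times a parameter-free factor; its identification would require an iterative search.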

There is a fundamental trade-off between model size or order and the model's expressive power or flexibility. In general, it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to over-fitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best model using the linear least squares technique discussed below, incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately, this approach, and any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and later superimpose the measurements with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs and subtracting the normal operating outputs from the measured output to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally, but not globally, optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate; it controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it is a factor that trades off exploration of new territory with exploitation of local gains made in a neighbourhood. Back-propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.
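The interplay between step size and premature convergence can be illustrated with a minimal hill climbing sketch. The objective function, step sizes, and iteration count below are arbitrary choices made for demonstration, not taken from any identification technique discussed here.

```python
import math
import random

def hill_climb(f, x0, step, iterations=1000, seed=1):
    """Maximize f(x) by random perturbation; `step` is the search
    analogue of a learning rate: it bounds how far each move can go."""
    rng = random.Random(seed)
    x, best = x0, f(x0)
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        value = f(candidate)
        if value > best:  # greedy: accept only improvements
            x, best = candidate, value
    return x

# Two bumps: a local optimum at x = 0 and the global optimum at x = 4.
def f(x):
    return math.exp(-(x - 4) ** 2) + 0.5 * math.exp(-x ** 2)

x_small = hill_climb(f, x0=0.5, step=0.05)  # too timid: stalls near x = 0
x_large = hill_climb(f, x0=0.5, step=4.0)   # can leap the valley to x = 4
```

With the small step size the greedy search can never accept the temporarily worse points between the two bumps, so it converges prematurely on the local optimum; the large step size permits exploration across the valley.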

Ideally, the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results for unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back-propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in every machine learning textbook. They both incorporate a measure of parsimony for the identified model and weigh that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1) the problem is to fit a line, plane, or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input-output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for the model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.
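For class 1, the one-step solution can be sketched concretely. This minimal example, with invented data, fits a line by setting the partial derivatives of the summed squared error with respect to slope and offset to zero and solving the resulting closed form.

```python
# Fit y = a*x + b by minimum squared error: the zero-derivative
# conditions for slope a and offset b have a closed form solution.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # offset
    return a, b

a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # a=2, b=1
```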

Table 1.1: Modelling tasks in order of increasing difficulty

                linear    non-linear
    static      (1)       (3)
    dynamic     (2)       (4)

(1) and (2) are both linear regression problems, and (3) is the general function approximation problem. (4) is the most general class of system identification problems.

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen that attempt to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at a different operating point of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying (LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or b-spline).

In step 2 it is important to use low amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all of the three steps above.
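The three steps above can be sketched for a scalar system. The base operating points, the local models (slope, offset pairs standing in for the result of step 2), and the choice of linear interpolation are all invented for this illustration.

```python
import bisect

# Steps 1 and 2: base operating points, each with a local linear model
# (slope, offset) assumed to have been identified there.
points = [0.0, 1.0, 2.0]
local_models = [(1.0, 0.0), (1.5, -0.5), (3.0, -3.5)]

def lpv_predict(x):
    """Step 3: linearly interpolate the local model coefficients
    between the two neighbouring base operating points."""
    if x <= points[0]:
        a, b = local_models[0]
    elif x >= points[-1]:
        a, b = local_models[-1]
    else:
        i = bisect.bisect_right(points, x) - 1
        w = (x - points[i]) / (points[i + 1] - points[i])
        a = (1 - w) * local_models[i][0] + w * local_models[i + 1][0]
        b = (1 - w) * local_models[i][1] + w * local_models[i + 1][1]
    return a * x + b
```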

Both of these techniques are able to identify the parameters, but not the structure, of a model. For EKF, the model structure must be given in a state space form. For LPV, the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

y(t) = y_1(t) = \int_{-T_S}^{0} g(t - \tau) \, u(\tau) \, d\tau \qquad (1.10)

Its discrete-time version is

y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \, u_{k-j} \qquad (1.11)

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past. That is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions. The first of these functions is the linear model y_1(t) above.

y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots \qquad (1.12)

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is

y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t - \tau_1, t - \tau_2) \, u(\tau_1) \, u(\tau_2) \, d\tau_1 \, d\tau_2

y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t - \tau_1, t - \tau_2, t - \tau_3) \, u(\tau_1) \, u(\tau_2) \, u(\tau_3) \, d\tau_1 \, d\tau_2 \, d\tau_3

y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t - \tau_1, t - \tau_2, \ldots, t - \tau_p) \, u(\tau_1) \, u(\tau_2) \cdots u(\tau_p) \, d\tau_1 \, d\tau_2 \cdots d\tau_p


where g_p is called the p-th order continuous-time kernel. The rest of this discussion will use the discrete-time representation:

y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \, u_{n-i} \, u_{n-j}

y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \, u_{n-i} \, u_{n-j} \, u_{n-k}

y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \cdots i_p} \, u_{n-i_1} \, u_{n-i_2} \cdots u_{n-i_p}

W^{(p)} is called the p-th order discrete-time kernel. W^{(2)} is an M_2 by M_2 square matrix, W^{(3)} is an M_3 by M_3 by M_3 cubic matrix, and so on. It is not required that the memory lengths of each kernel (M_2, M_3, etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is M, then the p-th order kernel W^{(p)} is a p-dimensional matrix with M^p elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is M_1 + M_2^2 + M_3^3 + \cdots + M_p^p + \cdots, or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in figure 1.4.

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series, and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q \qquad (1.13)

y(t_k) = \sum_{j=0}^{N} W_j \, v_{k-j} \qquad (1.14)

Combining these gives

y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u_{k-j}^{i}

       = \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u_{k-j}^{i}

       = a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u_{k-j}^{2} + \cdots + a_q \sum_{j=0}^{N} W_j u_{k-j}^{q}


Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in higher order kernels. Elements on the main diagonal of higher order kernels can be interpreted as weights applied to powers of the input signal.

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in figure 1.6. Obviously it is less general than the system in figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals u_1 = u, u_2 = u^2, and u_3 = u^3 can be calculated, which reduces the identification of the dynamic parts to a multi-input multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear-squarer-cuber model
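Evaluating this structure can be sketched in a few lines: three FIR filters applied to u, u^2, and u^3 and summed. The impulse responses below are invented; in an identification setting they would be estimated by MIMO linear methods, since the powered signals can be computed from u directly.

```python
# Linear-squarer-cuber structure of figure 1.6: three FIR filters
# applied to u, u^2, u^3 and summed. Weights here are assumptions.
def lsc_output(u, W1, W2, W3):
    def fir(W, s, k):
        # Convolution with zero input assumed before the record starts.
        return sum(Wj * (s[k - j] if k - j >= 0 else 0.0)
                   for j, Wj in enumerate(W))
    u2 = [x ** 2 for x in u]
    u3 = [x ** 3 for x in u]
    return [fir(W1, u, k) + fir(W2, u2, k) + fir(W3, u3, k)
            for k in range(len(u))]

y = lsc_output([1.0, 2.0], W1=[1.0], W2=[0.5], W3=[0.25])  # [1.75, 6.0]
```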

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used:

y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n) \qquad (1.15)

y(t_n) = \sum_{i=0}^{N_1} W^{(1)}_{i} \, u_{n-i} + \sum_{i=0}^{N_2} \sum_{j=0}^{N_2} W^{(2)}_{ij} \, u_{n-i} \, u_{n-j} + e(t_n) \qquad (1.16)

       = (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e \qquad (1.17)

where the vector and matrix quantities u_1, u_2, W^{(1)}, and W^{(2)} are

u_1 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_1} \end{bmatrix}
\qquad
u_2 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_2} \end{bmatrix}

W^{(1)} = \begin{bmatrix} W^{(1)}_{1} \\ W^{(1)}_{2} \\ \vdots \\ W^{(1)}_{N_1} \end{bmatrix}
\qquad
W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & \ddots & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector U and the parameter vector β are constructed as follows. The first N_1 elements in U are the elements of u_1, and this is followed by elements having the value of the products of u_2 appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of W^{(1)}, followed by products of the elements of W^{(2)} as they appear in the second term of equation 1.16.

y = U\beta + e \qquad (1.18)

Given a sequence of N measurements, U becomes a rectangular matrix (one row per measurement), and y and e become N by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

\beta = (U^T U)^{-1} U^T y \qquad (1.19)

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large: O(N^P) for a model using the first P Volterra series terms and a memory length of N.
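As a sketch of equations 1.18 and 1.19, the following identifies a tiny linear-plus-bilinear model by solving the normal equations directly. The data, the memory length, and the "true" coefficients are all invented, and only the single distinguishable product term u_n·u_{n-1} is used as a bilinear regressor.

```python
# Ordinary least squares for y_n = b1*u_n + b2*u_{n-1} + b3*u_n*u_{n-1}.
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

u = [0.5, -1.0, 2.0, 1.5, -0.5, 1.0, -2.0, 0.25]  # invented input record
U = [[u[n], u[n - 1], u[n] * u[n - 1]] for n in range(1, len(u))]
y = [0.8 * a - 0.3 * b + 0.5 * c for a, b, c in U]  # noise-free output

# Normal equations (U^T U) beta = U^T y give the least squares solution.
UtU = [[sum(r[i] * r[j] for r in U) for j in range(3)] for i in range(3)]
Uty = [sum(r[i] * yi for r, yi in zip(U, y)) for i in range(3)]
beta = solve(UtU, Uty)  # recovers [0.8, -0.3, 0.5] on noise-free data
```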


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form W_{ij} u_{n-i} u_{n-j} and W_{ji} u_{n-j} u_{n-i}. However, since u_{n-i} u_{n-j} = u_{n-j} u_{n-i}, the parameters W_{ij} and W_{ji} are identical. Only N(N+1)/2 unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, O(N^P).

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters to the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters to this formula are ζ, a form factor affecting the shape of the sinus curve; j, which iterates over component sinus curves; and i, the read-out index. There are m_r sinus curves in the weighted sum. The values of ζ and m_r are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem, only the weights w_j applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to m_r. For example, if N is 40 and m_r is 10, then there is a 4 times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with N_1 = N_2 = 32 gives the uncompressed model a total of N_1 + N_2(N_2+1)/2 = 32 + 528 = 560 parameters. In the wavelet domain identification, only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560/46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input/output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where P(·) is a polynomial in u and M is the model memory length (number of Z^{-1} delay operators). Therefore, one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P (u, uZ^{-1}, uZ^{-2}, ..., uZ^{-M}) can be calculated. What is needed is a general and parsimonious function approximator. The Volterra function series is general, but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation
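The nonlinear moving average structure of figure 1.7 can be sketched with a small polynomial standing in for P; the memory length M = 2 and the coefficients below are invented for illustration.

```python
# A nonlinear moving average: y_n = P(u_n, u_{n-1}, u_{n-2}), where P
# is a polynomial in the current and delayed inputs.
def P(u0, u1, u2):
    # Invented polynomial with linear, cross-product, and square terms.
    return 1.0 + 0.5 * u0 - 0.25 * u1 * u2 + 0.1 * u2 ** 2

def nma(u):
    pad = [0.0, 0.0] + list(u)  # zero input before the record starts
    return [P(pad[n + 2], pad[n + 1], pad[n]) for n in range(len(u))]

y = nma([1.0, 2.0, 3.0])
```

In an identification setting, only P would be unknown; its arguments are delayed copies of the measured input, which is what makes this view of the problem a pure function approximation task.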

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details, and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas, coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth, low order representations. However, since common choices such as distorted sinus functions are combined by weighted sums, and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator. Since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo, there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd order non-linear moving average with a neural network taking the place of P.

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, and not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or, sometimes, symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness guided search algorithms such as genetic programming, evolution strategies, or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23], [24], [78], [79]) is an equation discovery program that searches for equations which conform to a context free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations, grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation, and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.

PRET ([74], [75], [14]) is another equation discovery program that incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied, and no references beyond the original author were found.

1.7.10 Information criteria: heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

AIC = -2 \ln(L) + 2k \qquad (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u \mid M, U) \qquad (1.21)

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be overfit and have spurious parameters. The Bayes information criterion is

BIC = -2 \ln(L) + k \ln(n) \qquad (1.22)

where n is the number of observations (i.e. sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC \approx -2n \ln\left(\frac{RMSE}{n}\right) + 2k \qquad (1.23)

Cinar [17] reports building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions

2. a fuzzy relation mapping inputs to outputs

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow" whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are not appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative English-like statements, such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied, and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
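Max-min composition, one common choice, can be sketched directly; the membership values and the relation below are invented for illustration.

```python
def max_min_compose(memberships, relation):
    """Compose input membership values with a fuzzy relation: in place
    of the sum-of-products of matrix multiplication, take the maximum
    over pairwise minimums."""
    n_out = len(relation[0])
    return [max(min(m, relation[i][j]) for i, m in enumerate(memberships))
            for j in range(n_out)]

# Two input fuzzy sets composed through a 2x3 relation.
mu_in = [0.7, 0.2]
R = [[0.9, 0.4, 0.1],
     [0.3, 0.8, 0.6]]
mu_out = max_min_compose(mu_in, R)  # [0.7, 0.4, 0.2]
```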

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, integrated across the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large, and the system may oscillate and not converge; too small, and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model, therefore, is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams, and block diagrams or signal-flow graphs for control and signal processing. Bond graphs ([66] [39] [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal-flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae. Arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphic transcription of differential equations.
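To make the equation-collection step concrete, here is a minimal sketch (not the library described in chapter 4) that gathers the constitutive equations of an effort source, resistor, and inertia joined at a 1-junction and integrates them with explicit Euler steps.

```python
# Se, R, and I on one 1-junction (common flow; efforts sum to zero) give
#   f = p/I,   dp/dt = e_I = E(t) - R*f
# a coupled algebraic-differential pair we can step through directly.
def simulate_se_r_i(E, R, I, t_end, dt=1e-3):
    p, t = 0.0, 0.0
    while t < t_end:
        f = p / I               # I element: flow from generalized momentum
        e_I = E(t) - R * f      # 1-junction: effort balance across bonds
        p += dt * e_I           # I element: dp/dt = e
        t += dt
    return p / I                # final flow

# Step input: the flow settles toward E/R = 1.0 (first-order response)
f_final = simulate_se_r_i(E=lambda t: 1.0, R=1.0, I=0.1, t_end=2.0)
print(round(f_final, 3))        # -> 1.0
```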

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10 degree of freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (arrangement of the edges, or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible, since whereas the genetic algorithm uses a fixed-length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane, and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy-based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19] [40] [67] on the subject. Linear graph theory is a similar graphical, energy-based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8] [9] [10] [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".


Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics), or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                      Intensive variable   Extensive variable
  Generalized terms   Effort               Flow                       Power
  Linear mechanics    Force (N)            Velocity (m/s)             (W)
  Angular mechanics   Torque (N·m)         Angular velocity (rad/s)   (W)
  Electrics           Voltage (V)          Current (A)                (W)
  Hydraulics          Pressure (N/m²)      Volume flow (m³/s)         (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the first integral of effort e:

q = ∫₀ᵀ f dt    (2.1)

p = ∫₀ᵀ e dt    (2.2)
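As a quick numeric sketch of equation 2.1, the time integral of flow can be approximated with the trapezoidal rule; the `integrate` helper is illustrative, not part of any cited software.

```python
# Trapezoidal approximation of q = integral of f over time (eq. 2.1).
def integrate(samples, dt):
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a + b) * dt   # area of one trapezoid
    return total

# constant flow f = 2.0 over one second gives displacement q = 2.0
flows = [2.0] * 11                    # samples at dt = 0.1 s
print(round(integrate(flows, dt=0.1), 9))   # -> 2.0
```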

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement, proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce    (2.3)


An inertia or I element stores kinetic energy in the form of generalized momentum, proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics it is a mass-flow effect; in electrical circuits it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If    (2.4)


Figure 2.2: A capacitive storage element, the 1-port capacitor C.


Figure 2.3: An inductive storage element, the 1-port inertia I.

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond, and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite-length simulations; and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t)    (2.5)

f = F(t)    (2.6)



Figure 2.4: An ideal source element, the 1-port effort source Se.


Figure 2.5: An ideal source element, the 1-port flow source Sf.

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant-valued effort source representing a uniform force field, such as gravity near the earth's surface, and a zero-valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that together they expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf    (2.7)

Figure 2.6: A dissipative element, the 1-port resistor R.

2.2.5 Junctions

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer, called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1f1 = (me2)(f2/m) = e2f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (a pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = me2    (2.8)

f1 = f2/m    (2.9)

The graphical notation for a gyrator or GY-element is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator, called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1f1 = (mf2)(e2/m) = f2e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast rotating bodies in mechanics, and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = mf2    (2.10)

f1 = e2/m    (2.11)
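The power-conservation claims for both 2-port elements are easy to check numerically. This small sketch uses hypothetical helper functions that solve equations 2.8-2.11 for the bond-1 variables given the bond-2 variables.

```python
def transformer_port1(e2, f2, m):
    # e1 = m*e2 and f1 = f2/m (equations 2.8 and 2.9)
    return m * e2, f2 / m

def gyrator_port1(e2, f2, m):
    # e1 = m*f2 and f1 = e2/m (equations 2.10 and 2.11)
    return m * f2, e2 / m

e2, f2 = 3.0, 4.0                            # bond-2 effort and flow
for port1 in (transformer_port1, gyrator_port1):
    e1, f1 = port1(e2, f2, 2.5)
    assert abs(e1 * f1 - e2 * f2) < 1e-9     # power in equals power out
print("both 2-ports conserve power")
```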


Figure 2.7: A power-conserving 2-port element, the transformer TF.


Figure 2.8: A power-conserving 2-port element, the gyrator GY.

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction, and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = · · · = en    (2.12)

Σ f_in − Σ f_out = 0    (2.13)

f1 = f2 = · · · = fn    (2.14)

Σ e_in − Σ e_out = 0    (2.15)
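As a small sketch of the 0-junction constraints, the common effort passes through every bond unchanged (equation 2.12), while the one unknown flow is recovered from the signed flow balance (equation 2.13); the helper name and values are illustrative.

```python
# Recover the single unknown power-out flow at a 0-junction from the
# flow balance: sum(f_in) - sum(f_out) = 0  (equation 2.13).
def zero_junction_unknown_flow(flows_in, flows_out_known):
    return sum(flows_in) - sum(flows_out_known)

f_unknown = zero_junction_unknown_flow(flows_in=[2.0, 1.5],
                                       flows_out_known=[1.0])
print(f_unknown)   # -> 2.5
```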

Figure 2.9: A power-conserving multi-port element, the 0-junction.

Figure 2.10: A power-conserving multi-port element, the 1-junction.

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just as Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the attached bonds, just as Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds, in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power direction half arrow. In both parts of figure 2.11 the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B), and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11 the direction of computational causality is reversed. So, although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

Figure 2.11: The causal stroke and power direction half-arrow on a fully augmented bond are independent.

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but are not exactly bond graphs, do not employ any concept of causality [2] [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.1 Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving those in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

1 Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options, and the resulting equations, for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power-conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements), and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inertias. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inertia), the variable in the differential expression can be evaluated by integration over time. For this reason, these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly, that is, without another type of storage element or a resistor between them, on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = eC). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type (1- and 2-port elements)

  Effort source (Se), effort-out (required):   e = E(t)
  Flow source (Sf), flow-out (required):       f = F(t)
  Capacitor (C), causal effort-in:             q = Ce;  f = dq/dt
  Capacitor (C), causal effort-out:            e = q/C;  dq/dt = f
  Inertia (I), causal effort-in:               f = p/I;  dp/dt = e
  Inertia (I), causal effort-out:              p = If;  e = dp/dt
  Resistor (R), causal effort-in:              f = e/R
  Resistor (R), causal effort-out:             e = Rf
  Transformer (TF), option 1:                  e1 = me2;  f2 = mf1
  Transformer (TF), option 2:                  f1 = f2/m;  e2 = e1/m
  Gyrator (GY), option 1:                      e1 = rf2;  e2 = rf1
  Gyrator (GY), option 2:                      f1 = e2/r;  f2 = e1/r

Table 2.3: Causality options by element type (junction elements)

  0-junction, causal bond b* directed power-in:
      e* ≡ e_in1,  f* ≡ f_in1
      e_in2 = · · · = e_inn = e*;  e_out1 = · · · = e_outm = e*
      f* = Σ_{j=1..m} f_out,j − Σ_{i=2..n} f_in,i

  0-junction, causal bond b* directed power-out:
      e* ≡ e_out1,  f* ≡ f_out1
      e_in1 = · · · = e_inn = e*;  e_out2 = · · · = e_outm = e*
      f* = Σ_{i=1..n} f_in,i − Σ_{j=2..m} f_out,j

  1-junction, causal bond b* directed power-in:
      e* ≡ e_in1,  f* ≡ f_in1
      f_in2 = · · · = f_inn = f*;  f_out1 = · · · = f_outm = f*
      e* = Σ_{j=1..m} e_out,j − Σ_{i=2..n} e_in,i

  1-junction, causal bond b* directed power-out:
      e* ≡ e_out1,  f* ≡ f_out1
      f_in1 = · · · = f_inn = f*;  f_out2 = · · · = f_outm = f*
      e* = Σ_{i=1..n} e_in,i − Σ_{j=2..m} e_out,j

• 0- and 1-junctions permit any number of bonds directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction). The rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction). The rest must be effort-in.

and have only one degree of freedom between them. For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model, but only one degree of freedom.
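The parallel-capacitor example can be made concrete in a few lines (a hypothetical sketch, not the thesis software): because both C elements share one effort, fixing the charge on one determines the charge on the other, leaving a single independent state.

```python
# Two linear capacitors directly in parallel share one effort, so the
# second charge is a function of the first (one state, not two).
def dependent_charge(q1, C1, C2):
    e = q1 / C1        # effort across both capacitors (e = q/C)
    return e * C2      # charge on the second is then determined (q = eC)

q2 = dependent_charge(q1=6.0, C1=2.0, C2=3.0)
print(q2)              # -> 9.0
```

This is exactly the storage-storage dependency that SCAP exposes as derivative causality on one of the two C elements.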

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort.

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses, and I elements with proportionally common flow.

There is a systematic method of assigning computational causality to bond graph models, so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first, and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.
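The walk-through above can be sketched as a much-simplified SCAP for this one special case: four one-port elements on a single 1-junction. The representation and function names are illustrative, not the thesis implementation.

```python
def scap(elements):
    # causality per bond: direction of the effort signal relative to the
    # 1-junction ("in" = effort into the junction, i.e. out of the element)
    causality = {}

    def propagate():
        # a 1-junction permits exactly one bond causal effort-out of it;
        # once that bond exists, every unassigned bond becomes effort-in
        if "out" in causality.values():
            for el in elements:
                causality.setdefault(el, "in")

    # stage 1: sources have a required causality (Se outputs its effort)
    if "Se" in elements:
        causality["Se"] = "in"
    propagate()

    # stage 2: storage elements take their preferred (integral) causality
    for el in elements:
        if el == "C" and el not in causality:
            causality[el] = "in"     # effort-out from the capacitor
            propagate()
        elif el == "I" and el not in causality:
            causality[el] = "out"    # effort-in to the inertia
            propagate()

    # stage 3 (arbitrary causality for sinks) is never reached here:
    # the R bond is fixed by propagation from the inertia's assignment
    return causality

print(scap(["Se", "C", "I", "R"]))
# -> {'Se': 'in', 'C': 'in', 'I': 'out', 'R': 'in'}
```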

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal, but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end, indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power, but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way, activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant, but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.



Figure 2.14: A power-conserving 2-port, the modulated transformer MTF.


Figure 2.15: A power-conserving 2-port, the modulated gyrator MGY.

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements; [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort–displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

Figure 2.16: A non-linear capacitive storage element, the 1-port contact compliance CC (effort e, flow f).

Figure 2.17: Mechanical contact compliance: an unfixed spring exerts no force when q exceeds the unsprung length r.

e = (q < r)(r - q)/C (2.16)

dq/dt = f (2.17)
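The piecewise constitutive law of the CC element translates directly into code. The following is a minimal Python sketch (the function and parameter names are illustrative, not drawn from the modelling library developed in this thesis):

```python
def cc_effort(q, r, C):
    """Effort of the contact compliance (CC) element, equation 2.16:
    zero until the displacement q falls below the threshold r, then
    proportional to the compression (r - q) of the spring."""
    return (r - q) / C if q < r else 0.0

# Two points on the effort-displacement curve of figure 2.18
print(cc_effort(2.0, r=1.0, C=0.5))   # no contact (q > r): 0.0
print(cc_effort(0.5, r=1.0, C=0.5))   # in contact: (1.0 - 0.5) / 0.5 = 1.0
```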

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way complex models representing complex systems can be built in a modular fashion, by nesting combinations of simpler models representing simpler subsystems. At simulation time a preprocessor may replace compound elements by their equivalent set of standard elements, or


Figure 2.18: Effort–displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17.

compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19: a slender mass pin-jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports, bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension.


Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12.


Figure 2.21: Model of a simple pendulum, as in figures 5.12 and 2.20, rewritten using specialized compound elements.

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs ~x, state vector ~u and outputs ~y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

d~u/dt = f(~x, ~u) (2.18)

~y = g(~x, ~u) (2.19)
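Once a model is in this form, any standard ODE integration scheme can advance it in time. Below is a minimal forward-Euler sketch in Python; the update function f, readout function g and input signal are supplied by the caller, and all names are illustrative rather than part of the simulation library described in this thesis:

```python
def simulate(f, g, u0, x_of_t, t_end, dt):
    """Advance the state update equation du/dt = f(x, u) by forward Euler,
    applying the readout y = g(x, u) at each step."""
    u, t, ys = u0, 0.0, []
    while t < t_end:
        x = x_of_t(t)            # sample the input signal
        ys.append(g(x, u))       # readout equation
        u = u + dt * f(x, u)     # state update equation, Euler step
        t += dt
    return ys

# Example: a driven first-order lag du/dt = x - u, with readout y = u.
ys = simulate(f=lambda x, u: x - u, g=lambda x, u: u,
              u0=0.0, x_of_t=lambda t: 1.0, t_end=5.0, dt=0.01)
# ys rises from 0 toward the steady-state value 1
```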

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment-statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in Section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state-update) equations yields the readout equations for every non-state variable.
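The back-substitution step can be sketched in a few lines of Python, representing each equation as a mapping from its left-hand-side variable to a right-hand-side expression tree. This toy tuple representation is for illustration only; it is not the symbolic algebra library developed for this thesis:

```python
def back_substitute(expr, definitions, free):
    """Recursively replace every variable in expr that is not in `free`
    (the state and input variables) with its defining expression, until
    only free variables remain.  Expressions are nested tuples such as
    ('*', 'k', 'q1'); a bare string is a variable reference."""
    if isinstance(expr, str):
        if expr in free:
            return expr
        return back_substitute(definitions[expr], definitions, free)
    op, *args = expr
    return (op,) + tuple(back_substitute(a, definitions, free) for a in args)

# Toy equation set: the derivative de1 depends on e2, which is itself
# defined in terms of the state variable q1 and a parameter k.
defs = {'e2': ('*', 'k', 'q1'), 'de1': ('-', 'x', 'e2')}
state_update = back_substitute(defs['de1'], defs, free={'x', 'q1', 'k'})
print(state_update)   # ('-', 'x', ('*', 'k', 'q1'))
```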

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations, in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides and never on the left-hand side of any equation. This is inconvenient, since there are ready and well-known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case, the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g. by a root-finding algorithm) at each time step. In the second case, the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or derivative causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitances. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would in effect make the lever plastic; a C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.


Chapter 3

Genetic programming

3.1 Models, programs and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model.

The method popularized by Koza in [43], [44], [45], [46] and used here is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which also includes evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which also includes simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back-propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981, Richard Forsyth published a pattern recognition system using genetic principles to breed tree-structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection", in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree-structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.1 [18] also

1 In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control, 8, 1965.


used genetic operations to search for tree-structured programs in a special-purpose language. Here the application was in symbolic regression.

A tree-structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results ([22]), because the proportion of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN keywords is far too small. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax-preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general-purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities; that is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low-level yet general-purpose machine code for a stack-based virtual machine [58]. For demonstrating human-competitive artificial intelligence, it will be desirable to use a general-purpose, high-level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus Naur Form (BNF) language definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming and genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem, which are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution for each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature; specifically genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation, a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaa|aa  bbbb|bb
Children: aaaabb   bbbbaa

Figure 3.2: The crossover operation for bit string genotypes.
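Single-point crossover for equal-length strings can be sketched in a few lines of Python; the split point is chosen uniformly, never at the very ends, so both parents always contribute:

```python
import random

def crossover(a, b):
    """Single-point crossover: split both equal-length parents at one
    random position and swap the tails, yielding two complementary
    children."""
    point = random.randrange(1, len(a))   # 1 .. len-1: both parts non-empty
    return a[:point] + b[point:], b[:point] + a[point:]

c1, c2 = crossover("aaaaaa", "bbbbbb")
# e.g. ('aaaabb', 'bbbbaa') when the split falls after the fourth position
```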

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa
Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes.
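The corresponding Python sketch, using 0/1 strings for clarity:

```python
import random

def mutate(bits):
    """Flip one randomly selected bit, leaving the rest of the string
    intact."""
    i = random.randrange(len(bits))
    flipped = '1' if bits[i] == '0' else '0'
    return bits[:i] + flipped + bits[i + 1:]

child = mutate("000000")   # e.g. '000100': exactly one bit differs
```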

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent: aaaaaa
Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes.

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel, on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator, to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally; however, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals, but not in any particularly fit individuals, will tend to be less prevalent in subsequent generations.
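The roulette wheel can be implemented with a cumulative-sum table and a binary search. A minimal Python sketch (names are illustrative):

```python
import random
from bisect import bisect_left
from itertools import accumulate

def roulette_select(population, fitnesses):
    """Fitness-proportional ('roulette wheel') selection: each
    individual's chance of selection equals its share of the total
    fitness."""
    wheel = list(accumulate(fitnesses))       # cumulative fitness scores
    ball = random.uniform(0.0, wheel[-1])     # drop the ball on the wheel
    return population[bisect_left(wheel, ball)]

random.seed(1)
picks = [roulette_select(['weak', 'strong'], [1.0, 9.0]) for _ in range(1000)]
# 'strong' holds 90% of the wheel, so it is chosen roughly 900 times
```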

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm, the entire population is replaced at once. A new population of the same size as the old is built by selecting individuals from the old and applying genetic operators to them; then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 - (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8–10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34], and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
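The thirteen steps above can be condensed into a short Python sketch for bit string genomes. All parameter values and the "one-max" fitness function used in the demonstration are illustrative only:

```python
import random

def generational_ga(fitness, N=8, M=20, G=50, p_cross=0.7, p_mut=0.2,
                    target=None):
    """Minimal generational GA over fixed-length bit strings.
    `fitness` maps a bit string to a positive score."""
    def select(pop, fits):
        # fitness-proportional (roulette wheel) selection
        return random.choices(pop, weights=fits, k=1)[0]

    pop = [''.join(random.choice('01') for _ in range(N)) for _ in range(M)]
    for _ in range(G):
        fits = [fitness(ind) for ind in pop]
        best = max(pop, key=fitness)
        if target is not None and fitness(best) >= target:
            return best                      # termination criterion met
        new_pop = []
        while len(new_pop) < M:
            r = random.random()
            if r < p_cross:                  # crossover
                a, b = select(pop, fits), select(pop, fits)
                cut = random.randrange(1, N)
                new_pop.append(a[:cut] + b[cut:])
            elif r < p_cross + p_mut:        # mutation
                a = select(pop, fits)
                i = random.randrange(N)
                new_pop.append(a[:i] + ('1' if a[i] == '0' else '0') + a[i + 1:])
            else:                            # replication
                new_pop.append(select(pop, fits))
        pop = new_pop                        # discard the old population
    return max(pop, key=fitness)

random.seed(2)
# 'one-max' demonstration: fitness counts ones (plus one, to keep all
# selection weights positive), so the ideal individual is '11111111'.
best = generational_ga(lambda s: s.count('1') + 1, target=9)
```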

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive: they respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open-ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically, the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represents the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
⇓
1. Evaluate fitness
⇓
2. Check termination criteria
⇓
3. Take fitness-proportional sample and apply genetic operators
⇓
Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming.

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first: of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search: it implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the mean square difference between the solution expression and a sampling of the "true" function, in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola such as 1 + 2(x + 3)², it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test: any linear expression fits increasingly badly toward at least one extremity of the sample range.
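This fitness measure, and the advantage conferred by a correct quadratic building block, can be illustrated with a short Python sketch. Candidates are shown as plain Python callables rather than evolved parse trees, and the +1 in the denominator, a common guard against division by zero on a perfect fit, is an assumption here:

```python
def regression_fitness(candidate, true_fn, samples):
    """Reciprocal of the mean squared difference between a candidate
    expression and samples of the 'true' function."""
    mse = sum((candidate(x) - true_fn(x)) ** 2 for x in samples) / len(samples)
    return 1.0 / (1.0 + mse)

true_fn = lambda x: 1 + 2 * (x + 3) ** 2     # equals 2x^2 + 12x + 19
xs = [i / 10 for i in range(-100, 101)]      # samples over [-10, 10]

with_block = lambda x: 2 * x ** 2 + 12 * x   # has the quadratic building block
without = lambda x: 12 * x + 19              # the same parts, minus that block
print(regression_fitness(with_block, true_fn, xs) >
      regression_fitness(without, true_fn, xs))   # True
```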

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings over the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in that position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The distance between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 3 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above-average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
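Schema membership is simple to check programmatically; a Python sketch using the example schema above:

```python
def matches(schema, bits):
    """True if the bit string belongs to the set the schema describes:
    '*' matches either bit, while '0'/'1' must match exactly."""
    return len(schema) == len(bits) and all(
        s == '*' or s == b for s, b in zip(schema, bits))

members = [b for b in ('01010', '01011', '01110', '01111', '11111')
           if matches('01*1*', b)]
print(members)   # ['01010', '01011', '01110', '01111']
```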

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the proportion of valid computer programs among the set of random character strings is so astronomically low that the algorithm would spend virtually all available time constructing, checking and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracketed expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs, and when the time comes they can be evaluated as program code to test their fitness.
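The same duality is easy to mimic in Python, where nested tuples can stand in for s-expressions. A sketch of such a tree representation and a matching evaluator (illustrative only, not the interpreter developed for this thesis):

```python
import operator

# Operators of the toy language, all binary
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(node, env):
    """Evaluate a program tree against an environment of variable
    bindings.  A tuple is an application (op, arg, ...); a bare string
    is a variable reference; anything else is a literal constant."""
    if isinstance(node, tuple):
        op, *args = node
        return OPS[op](*(evaluate(a, env) for a in args))
    return env.get(node, node)

# The s-expression (+ 1 (* x 3)) as a nested tuple: 1 + x*3
tree = ('+', 1, ('*', 'x', 3))
print(evaluate(tree, {'x': 4}))   # 13
```

Because the program is ordinary data, the genetic operators can rearrange these tuples directly, then hand the result back to `evaluate` for fitness testing.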

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis, Python [83] is used, but the essence of the process is still the creation, manipulation and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special-purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands, which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.2

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm: new generations are created from old by fitness proportional selection and the crossover, mutation and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed-length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

2 http://haskell.org/haskellwiki/Introduction#What_is_functional_programming%3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus only the reduced space of syntactically correct programs is searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,3 tree-structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree-structured programs that have no simple analogy in nature, and are either not available or not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following five decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants and zero-argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as

3 This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.

65

Figure 36 Genetic programming produces a program Some inputs to theframework are generic others are specific to the problem domain but onlyone the fitness test is specific to any particular problem

input the output of any other member This allows crossover points tobe chosen anywhere in the program syntax tree without constraint

3. Define the fitness function. Typically the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, and PM:

M: population size (number of individuals)

G: maximum number of generations to run

PR: probability of selecting the replication operator

PC: probability of selecting the crossover operator

PM: probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.
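For a small symbolic regression problem, these five decisions might be recorded as follows. This is an illustrative sketch only; the names and values are hypothetical and are not those used by the software described in chapter 4.

```python
# 1. Terminal node set: variables, constants, zero-argument functions.
TERMINALS = ['x', 1.0, 2.0]

# 2. Function node set: operators, characterized by their arity.
FUNCTIONS = {'+': 2, '*': 2, 'neg': 1}

# 3. Fitness function: run the candidate program against a fixed test
#    suite and map accumulated error onto (0, 1], 1.0 being a perfect fit.
def fitness(candidate, target=lambda x: x * x + 1.0):
    error = sum(abs(candidate(x) - target(x)) for x in range(-5, 6))
    return 1.0 / (1.0 + error)

# 4. Evolutionary control parameters.
M, G = 500, 50               # population size, maximum generations
PR, PC, PM = 0.1, 0.8, 0.1   # replication, crossover, mutation probabilities

# 5. Termination criteria: a sufficiently fit individual, or G generations.
def finished(generation, best_fitness):
    return best_fitness > 0.999 or generation >= G
```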

The first two decisions (definition of the function and terminal sets) define the genetic encoding. These need to be made once per computer language in which programs are desired. Specialized languages can be defined to better exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved. It embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings, or problems. However, genetic programming is so computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation necessarily differ from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit-strings discussed previously.
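The sub-tree swap can be sketched on a minimal tree representation, here nested Python lists of the form ['op', child, child]. This is illustrative only; node selection and copying details vary between implementations.

```python
import copy
import random

def nodes(tree, path=()):
    """Yield the path to every node in the tree (root is the empty path)."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    for i in path[:-1]:
        tree = tree[i]
    tree[path[-1]] = subtree

def crossover(parent_a, parent_b, rng=random):
    """Swap randomly chosen sub-trees between two parents; return two offspring."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    pa = rng.choice(list(nodes(a)))
    pb = rng.choice(list(nodes(b)))
    sa, sb = copy.deepcopy(get(a, pa)), copy.deepcopy(get(b, pb))
    if pa == ():          # root selected: the whole program is swapped out
        a = sb
    else:
        put(a, pa, sb)
    if pb == ():
        b = sa
    else:
        put(b, pb, sa)
    return a, b
```

Note that the total node count across the two offspring equals that across the two parents, even though each child may differ in size and shape from both.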

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

Figure 3.7: The crossover operation for tree structured genotypes

The second mutation operator is called "sub-tree mutation". In this variation a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded, and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new randomly generated one.
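Both variations can be sketched on the same nested-list tree representation (an illustrative sketch; the terminal and function sets here are hypothetical):

```python
import copy
import random

TERMINALS = ['x', 'y', '1']
FUNCTIONS = {'+': 2, '*': 2}

def random_tree(depth, rng):
    """Grow a new random sub-tree of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op, arity = rng.choice(sorted(FUNCTIONS.items()))
    return [op] + [random_tree(depth - 1, rng) for _ in range(arity)]

def paths(tree, leaves_only=False, path=()):
    if isinstance(tree, list):
        if not leaves_only:
            yield path
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, leaves_only, path + (i,))
    else:
        yield path

def replace(tree, path, subtree):
    if path == ():
        return subtree
    parent = tree
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = subtree
    return tree

def leaf_mutation(tree, rng=random):
    """Swap one leaf for a random terminal: the smallest possible change."""
    t = copy.deepcopy(tree)
    p = rng.choice(list(paths(t, leaves_only=True)))
    return replace(t, p, rng.choice(TERMINALS))

def subtree_mutation(tree, rng=random, max_depth=3):
    """Replace a random sub-tree (possibly the root) with a new random one."""
    t = copy.deepcopy(tree)
    p = rng.choice(list(paths(t)))
    return replace(t, p, random_tree(max_depth, rng))
```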

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent: a partial replication. The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different and simpler structure, using a new function that was not available to the parent. Automatically defined functions allow a portion of a program from anywhere within its tree to be encapsulated as a unit unto itself, and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic, or polynomial regression, search only for the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach: modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator, "%", which returns the constant value 0.0 whenever its second operand is zero.

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x); that is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.
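The protected division rule, and the cos(x/0) loophole it opens, can be demonstrated directly. This is a sketch following the 0.0 convention described above; the function name is illustrative.

```python
import math

def protected_div(a, b):
    # Return the constant 0.0 instead of raising ZeroDivisionError,
    # so that closure is preserved: every division yields a usable value.
    if b == 0:
        return 0.0
    return a / b
```

A candidate program can then manufacture the constant 1.0 from any sub-expression x, since cos(protected_div(x, 0.0)) is cos(0.0) = 1.0 regardless of x.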

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. The standard Python division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The source code excerpt in listing 3.1 shows how division by zero and occurrences of the contagious NaN value⁴ result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness. Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

Another very common yet potentially problematic arithmetic operation is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

⁴The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type; INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number"; NaN arises from operations which have undefined value in common arithmetic. Examples include cos(INF), sin(INF), and INF/INF. These special values, INF and NaN, are contagious, in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN * 2.0 = NaN. Of these two special values NaN is the most "virulent", since in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low order approximation of the target function, for example. In this arrangement a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.
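The arrangement can be sketched for a symbolic regression case where the kernel is a low order polynomial approximation of the target. All names here are illustrative and hypothetical, not part of the software described later.

```python
KERNEL = [0.0, 1.0]   # prior knowledge: roughly f(x) = 0.0 + 1.0*x

def eval_poly(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def indirect_fitness(program, target, points):
    # The program is judged on the quality of its output (a modified
    # kernel), not on its own structure.
    model = program(list(KERNEL))
    error = sum(abs(eval_poly(model, x) - target(x)) for x in points)
    return 1.0 / (1.0 + error)

# One evolved candidate might repair the kernel by appending a
# quadratic term that the preliminary model missed.
def candidate(coeffs):
    coeffs.append(0.5)
    return coeffs
```

If the target happens to be 0.5x² + x, this candidate transforms the rough kernel into an exact model and so earns perfect fitness, while the identity program scores lower.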


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and a maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation to individuals that exceed the ceiling.
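The ceiling-by-zero-fitness approach can be sketched as a wrapper around any fitness function (illustrative names; nested lists stand in for program trees, and the ceiling value is an assumed, generous bound):

```python
SIZE_CEILING = 200   # assumed upper bound on program tree size

def tree_size(tree):
    """Count the nodes in a program tree; leaves are plain atoms."""
    if isinstance(tree, list):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1

def guarded_fitness(tree, raw_fitness):
    # Oversized individuals are never executed; they simply receive
    # zero fitness, and so are selected against in the next generation.
    if tree_size(tree) > SIZE_CEILING:
        return 0.0
    return raw_fitness(tree)
```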

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. Its strength arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources is, in general, perhaps more naturalistic. Biological organisms which have very extensive material needs will on occasion find those needs unmet, and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominantly of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree swapping crossover can produce offspring of differing size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme, for a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value while tag values differ for structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short circuiting. Short circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short circuiting. A LISP or Scheme implementation would benefit, since modern optimizing LISP compilers perform very well with tail-recursion.
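The recursive equivalence test and the tagging scheme might look like the following sketch. It is illustrative only: constants are represented as numbers and variables as strings, and collapsing every constant to a single marker makes constant values a parametric rather than structural difference, as described above.

```python
def structural_key(tree):
    # Canonical structural form of a program tree: function nodes keep
    # their operator and children; every constant collapses to one marker.
    if isinstance(tree, list):
        return (tree[0],) + tuple(structural_key(c) for c in tree[1:])
    if isinstance(tree, (int, float)):
        return '<const>'
    return tree   # a variable terminal

def equivalent(a, b):
    # Recursive and short circuiting: Python's `and` and all() stop at
    # the first pair of corresponding nodes that fails to match.
    if isinstance(a, list) and isinstance(b, list):
        return (a[0] == b[0] and len(a) == len(b)
                and all(equivalent(x, y) for x, y in zip(a[1:], b[1:])))
    return structural_key(a) == structural_key(b)

def assign_tags(population):
    """Give structurally equivalent individuals the same integer tag."""
    tags, result = {}, []
    for tree in population:
        key = structural_key(tree)
        result.append(tags.setdefault(key, len(tags)))
    return result
```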


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation and related symbolic algebra code is divided into four modules:

Bondgraph.py The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py The Components module contains a collection of compound bond graph elements.

Algebra.py The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions, and recipes for specialized actions such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations; in particular, for reducing coupled sets of algebraic and first order differential equations to a set of differential equations involving a minimal set of state variables, and a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into four modules as well:

Genetics.py The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree structured representations.

Evolution.py The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualization of bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun, but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma separated value (CSV) text files for portability. For particularly long or detailed (small time-step) simulations, there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact non-text format might load quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages already in use, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # character is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in bg format

title single mass spring damper chain

# nodes

# Spring, damper, and mass
begin node
    id spring
    type C
    parameter 1.96
end node

begin node
    id damper
    type R
    parameter 0.02
end node

begin node
    id mass
    type I
    parameter 0.002
end node

# Common flow point joining s, d, m
begin node
    id point1
    type cfj
end node

# Input force acting on spring and damper
begin node
    id force
    type Se
    parameter Y
end node

# bonds

# Spring and damper joined to mass
begin bond
    tail point1
    head spring
end bond

begin bond
    tail point1
    head damper
end bond

begin bond
    tail point1
    head mass
end bond

begin bond
    tail force
    head point1
end bond

# EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language, originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62], and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo code. Take for example the following two functions, which implement the universal and existential logical quantifiers respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], they operate on any iterable object "sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".

Since this research addresses automated dynamic system identification, topics which are primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4 processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work SciPy is used for fast matrix operations and to access a numerical solver; both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.
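Use of the solver interface is straightforward; the following sketch simulates the single mass-spring-damper model of listing 4.1 directly in state space form. The equations here are hand-derived for illustration, rather than extracted by the Calculus.py module.

```python
import numpy as np
from scipy.integrate import odeint

C, R, I = 1.96, 0.02, 0.002   # compliance, resistance, inertia

def rates(state, t, effort=0.0):
    # States: spring displacement q and mass momentum p.
    q, p = state
    v = p / I                      # flow (velocity) through the inertia
    dq = v                         # the spring stretches at rate v
    dp = effort - q / C - R * v    # force balance at the 1-junction
    return [dq, dp]

t = np.linspace(0.0, 1.0, 1001)
trajectory = odeint(rates, [1.0, 0.0], t)   # released from q=1, at rest
```

The resulting trajectory is a damped oscillation of the spring displacement, decaying from its initial value of 1.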

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];

    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];

    edge [fontsize=10];
    11 -> C1 [arrowhead=half, arrowtail=tee, label="1"];
    11 -> R2 [arrowhead=half, arrowtail=tee, label="2"];
    11 -> I3 [arrowhead=teehalf, label="3"];
    Se4 -> 11 [arrowhead=teehalf, label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds. Activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.
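These data structures can be sketched roughly as follows. This is an illustrative sketch only; the actual class names, attribute names, and interfaces in Bondgraph.py may differ.

```python
class Element:
    """A bond graph element (e.g. Se, C, R, or a 1-junction)."""
    def __init__(self, max_causal_in, max_causal_out):
        self.bonds = []                      # attached bonds
        self.max_causal_in = max_causal_in   # causal-in bonds allowed
        self.max_causal_out = max_causal_out # causal-out bonds allowed

class Bond:
    """A power bond between two elements; causality starts out nil."""
    def __init__(self, tail, head):
        self.tail = tail        # element at the tail of the half-arrow
        self.head = head        # element at the head
        self.causality = None   # later set equal to head or tail

class ActivatedBond(Bond):
    """Activated bonds carry a signal only; never assigned causality."""

class BondGraph:
    """Maintains one list of elements and one of the bonds between them."""
    def __init__(self):
        self.elements = []
        self.bonds = []
```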


4.3.2 Adding elements and bonds, traversing a graph

The graph class has an addNode and a removeNode method, and a bondNodes and a removeBond method. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph, starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP) described in Section 2.3.2 is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC)

2. one-port elements with a preferred causality (storage elements C and I)

3. one-port elements with arbitrary causality (sink element R)

4. others (0, 1, GY, TF, signal blocks)

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.
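The segregation step can be sketched as follows. The category sets and the function name are assumptions for illustration; the library performs this split internally rather than through a helper with this signature.

```python
# Causality categories by element type tag, matching the four lists above
# (signal blocks fall into "other"; hypothetical type tags).
REQUIRED = {"Se", "Sf", "CC"}
PREFERRED = {"C", "I"}
ARBITRARY = {"R"}

def segregate(element_types):
    """Split element type tags into the four SCAP processing lists,
    preserving insertion order (as the library does as a side effect)."""
    lists = {"required": [], "preferred": [], "arbitrary": [], "other": []}
    for t in element_types:
        if t in REQUIRED:
            lists["required"].append(t)
        elif t in PREFERRED:
            lists["preferred"].append(t)
        elif t in ARBITRARY:
            lists["arbitrary"].append(t)
        else:  # 0, 1, GY, TF, signal blocks
            lists["other"].append(t)
    return lists
```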


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality
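For a junction, the rules above amount to a simple counting test. The sketch below shows the 0-junction case; the function name and the "in"/"out"/None encoding of bond causalities (viewed from the junction) are assumptions for illustration.

```python
def must_propagate_0(junction_bonds):
    """True when causality at a 0-junction is fully determined: one bond
    carries effort in, or all but one of the bonds carry effort out.
    `junction_bonds` lists each bond's causality as seen from the
    junction: "in", "out", or None if not yet assigned."""
    n = len(junction_bonds)
    effort_in = junction_bonds.count("in")
    effort_out = junction_bonds.count("out")
    # exactly one bond brings the common effort in; every other bond
    # must then carry it out (and vice versa for the last-but-one rule)
    return effort_in == 1 or effort_out == n - 1
```

The 1-junction case is the dual, with the roles of effort-in and effort-out exchanged.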

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception, aborting the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has been assigned a causality, preferred or otherwise, as a consequence of earlier required or preferred causality assignments is skipped over. Others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and the consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined and any instances of differential causality are noted in a new list attached to the bond graph object.

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.
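The causality-dependent behaviour can be illustrated with a linear R element, whose single constitutive equation is declared for either the effort or the flow variable depending on which is imposed on it. This is a hypothetical helper, not the library's actual equations method; the e<n>/f<n> variable naming follows the listings later in this chapter.

```python
def r_element_equations(bond_number, causality, r):
    """Constitutive equation of a linear R element with resistance r,
    as a declarative-form string, chosen by the bond's causality."""
    e = "e%d" % bond_number
    f = "f%d" % bond_number
    if causality == "effort-in":   # effort imposed: declare the flow
        return ["%s = (%s/%s)" % (f, e, r)]
    else:                          # flow imposed: declare the effort
        return ["%s = (%s*%s)" % (e, f, r)]
```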

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since it is computationally inexpensive to apply them. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all of the junction's neighbours, that cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of the procedure for which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with fewer than three bonds

2. Merge adjacent like junctions

3. Merge resistors bonded to a common junction

4. Merge capacitors bonded to a common effort junction

5. Merge inertias bonded to a common flow junction

These five model reduction steps are illustrated in figures 4.1 to 4.5. In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond is put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed, and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1, there are two cases, depending on the junction type. Handling of non-linear elements is not automated by the library. If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is r_eq, where

r_eq = 1 / (1/r1 + 1/r2 + · · · + 1/rn)    (4.1)


These are the familiar rules for electrical resistances in series and parallel. The fourth and fifth manipulations handle the case shown in figure 2.12 and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply. Capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances. Rigidly joined inertias are effectively one mass.
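The parameter of the merged R element can be sketched directly from these rules. The function name and interface are assumptions for illustration; the library applies the same arithmetic when it rewrites the graph.

```python
def merged_resistance(params, junction):
    """Parameter of the single R element that replaces several linear R
    elements bonded to a common junction.

    At a 1-junction the flows are shared and efforts sum (series), so
    parameters add. At a 0-junction the efforts are shared and flows sum
    (parallel), giving the reciprocal rule of equation 4.1."""
    if junction == "1":
        return sum(params)
    elif junction == "0":
        return 1.0 / sum(1.0 / r for r in params)
    raise ValueError("not a junction type: %r" % junction)
```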

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than three bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8, and 4.9 show the application of the first three steps in the automated model reduction method. The fourth and fifth are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions step 1: remove trivial junctions

Figure 4.2: Model reductions step 2: merge adjacent like junctions

Figure 4.3: Model reductions step 3: merge resistors bonded to a common junction

Figure 4.4: Model reductions step 4: merge capacitors bonded to a common effort junction

Figure 4.5: Model reductions step 5: merge inertias bonded to a common flow junction

Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction

Figure 4.7: The model from figure 4.6 with trivial junctions removed

Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged

Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged

Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5

Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6

Listing 4.5: Equations from the model in figure 4.6

e1 = (e2 + (-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4 + (-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1/-78.0)
f2 = f1
f3 = (f9 + f11 + f12 + f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2 + f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7.13788689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12

Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2 + (-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way, depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects, and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look for the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant, and operator is an expression in itself), replace any one sub-expression with another, and other more specialized algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression, but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater, and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
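The inheritance structure of figure 4.12 can be sketched as below. This is a minimal illustration; the library's actual classes carry many more methods (substitution, sub-expression listing, and so on) than shown here.

```python
class Expression:
    """Base class; never instantiated directly in the library."""
    def variables(self):
        """Set of variable names appearing anywhere in this expression."""
        vs = set()
        for sub in getattr(self, "operands", []):
            vs |= sub.variables()
        return vs

    def is_constant(self):
        return not self.variables()

class Constant(Expression):
    def __init__(self, value):
        self.value = value

class Variable(Expression):
    def __init__(self, name):
        self.name = name
    def variables(self):
        return {self.name}

class Operator(Expression):
    def __init__(self, *operands):
        self.operands = list(operands)

class CommutativeOperator(Operator):
    pass

class Add(CommutativeOperator): pass   # any number of operands
class Mul(CommutativeOperator): pass   # any number of operands
class Div(Operator): pass              # two ordered operands
```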

Equation objects are composed of two expression objects (a left-hand side and a right), and methods to list all variables, expressions, and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram

Equation sets are collections of equations that contain no duplicates. Equation set objects support similar methods to equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.) as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation (provided the original set is in a certain "declarative" form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation that is automated by the library, the resolution of algebraic loops is discussed in section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called "t" is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor), and replaces the Div with a Mul and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator. It joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null, or "do nothing", operations: a single operand Add is assumed to add zero to its operand, and a single operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands of this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
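One expansion step can be sketched on a simple tuple representation of expressions, ("mul", ...) and ("add", ...), standing in for the library's Mul and Add objects. The representation and function name are assumptions for illustration.

```python
def expand_distributive(expr):
    """One distributive expansion step: if expr is a Mul with an Add
    among its operands, distribute over the first Add found."""
    if not (isinstance(expr, tuple) and expr[0] == "mul"):
        return expr
    operands = list(expr[1:])
    for i, op in enumerate(operands):
        if isinstance(op, tuple) and op[0] == "add":
            others = operands[:i] + operands[i + 1:]
            # each operand of the Add is multiplied by all of the
            # remaining Mul operands, and the results are summed
            return ("add",) + tuple(("mul", term) + tuple(others)
                                    for term in op[1:])
    return expr
```

For example, c*(x + y) becomes x*c + y*c in one step; repeated application fully expands nested products of sums.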

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable, or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

ldquoReduce equation to polynomial formrdquo invokes towardsPolynomial on theleft and right-hand side expressions of the equation repeatedly until isPolynomialgives a true result for each ldquoCollect like termsrdquo examines the variables inthe operands of the Add objects on either side of a polynomial form equa-tion and collects them such that there is only one operand between thoseAdd objects that contains any one variable The result may not be in thepolynomial form defined above but can be driven there again by invoking theequationrsquos towardsPolynomial method Equation objects have a solveFor

method that given a variable appearing in the equation attempts to putthe equation in polynomial form then collects like terms isolates the termcontaining that variable on one side of the equation and divides through byany constant factor Of course this works only for equations that can be putin the polynomial form defined above The solveFor method attempts toput the equation in polynomial form by invoking the towardsPolynomial

repeatedly stopping either when the isPolynomial method gives a true re-sult or after a number of iterations proportional to the size of the equation(number of Expression objects it contains) If the iteration limit is reachedthen it is presumed not possible to put the equation in polynomial form andthe solveFor method fails
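The final isolation step of solveFor amounts to simple linear algebra once the equation has been collected. The sketch below works on a collected equation represented as a mapping from variable names to constant coefficients (the constant term under the key None); this representation and the function name are assumptions for illustration, not the library's interface.

```python
def solve_for(lhs_terms, var):
    """Solve a collected polynomial-form equation sum(c_i * v_i) = 0
    for `var`: move every other term to the right-hand side and divide
    through by var's coefficient. Returns the right-hand side terms."""
    coeff = lhs_terms[var]
    return {v: -c / coeff for v, c in lhs_terms.items() if v != var}
```

For example, 2x - 4y + 6 = 0 solved for x yields x = 2y - 3.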

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive backsubstitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2) it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graphs. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back substitution into the declarative equation for the first loop variable

2. Expanding distributive operators, aggregating constants and commutative operators

3. Collecting like terms

4. Isolating the first loop variable
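The detection step can be sketched as a depth-first search over the dependency graph. The graph representation and function name here are assumptions for illustration; differential dependencies are assumed to have been excluded by the caller, since loops through a differential operator are not problematic.

```python
def find_algebraic_loop(deps, start):
    """Depth-first search for a cycle through `start` in a dependency
    graph {var: [variables on the right-hand side of its declarative
    equation]}. Returns the loop as a list of variables, or None."""
    stack = [(start, [start])]
    while stack:
        var, path = stack.pop()
        for dep in deps.get(var, []):
            if dep == start:
                return path              # closed the cycle back to start
            if dep not in path:          # avoid revisiting within a path
                stack.append((dep, path + [dep]))
    return None
```

Run against the dependency structure of the reduced model of figure 4.9, this recovers the loop resolved in listing 4.7.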

The result of the first step is a new declarative equation for the first loop variable, which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C


and

D = E1 + E2 + · · · + En

where C is a constant, and any of E1 through En may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the right-hand side, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable, in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4 + f7 + f8)
e6 = (e2 + (-1.0*e3))

collapsing:
e6 = (e2 + (-1.0*((f4 + f7 + (e6/30.197788376))*32.0)))

expanding and aggregating:
iterations: 3 True True
e6 = (e2 + (f4*-32.0) + (f7*-32.0) + (e6*-1.05968025213))

collecting:
(e6 + (-1.0*(e6*-1.05968025213))) = (e2 + (f4*-32.0) + (f7*-32.0))

expanding and aggregating:
iterations: 2 True True
(e6 + (e6*1.05968025213)) = (e2 + (f4*-32.0) + (f7*-32.0))

isolating:
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not hold (for example, because the bond graph model has differential causality for some elements) then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input, and other variables

2. Back-substitute until only state and input variables appear on right-hand sides

3. Split state and readout equation sets

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.
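The classification in step 1 can be sketched as pure set arithmetic over the declared and used variables. The dictionary representation and function name are assumptions for illustration.

```python
def classify(differential, algebraic):
    """Split variables into state / input / other, per step 1 above.
    Both arguments map a declared (left-hand side) variable to the list
    of variables on the right-hand side of its equation."""
    state = set(differential)   # declared within a differential
    other = set(algebraic)      # declared by purely algebraic equations
    used = {v
            for rhs in list(differential.values()) + list(algebraic.values())
            for v in rhs}
    inputs = used - state - other   # used but never declared
    return state, inputs, other
```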

In the second step, the right-hand sides of non-differential equations are substituted into the right-hand sides of all equations, wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops, then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, and all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.
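The recursive back-substitution of step 2 can be sketched at the level of dependencies. This sketch tracks only which variables each right-hand side ultimately depends on (the library rewrites the expressions themselves); the representation and names are assumptions for illustration.

```python
def backsubstitute(equations, state, inputs):
    """Recursively resolve every right-hand side until only state and
    input variables remain. `equations` maps each algebraically declared
    variable to the set of variables on its right-hand side."""
    def resolve(var, seen):
        if var in state or var in inputs:
            return {var}                 # terminal: nothing to substitute
        if var in seen:                  # would recurse forever
            raise ValueError("algebraic loop through " + var)
        out = set()
        for dep in equations[var]:
            out |= resolve(dep, seen | {var})
        return out
    return {lhs: set().union(*(resolve(v, {lhs}) for v in rhs))
            for lhs, rhs in equations.items()}
```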

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can be solved for arbitrary time ranges, of course. The differential equations from a state space equation set are solved by numerical integration, using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.
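The scipy.integrate.odeint interface used here takes a function returning the state derivative vector, an initial state, and the time points of interest. The toy first-order system below is illustrative only, not an equation set from the thesis.

```python
import numpy as np
from scipy.integrate import odeint

# Toy declarative-form state equation: p_dot = -p/tau + u, with a
# constant (zeroth order hold) input u.
tau = 2.0

def state_derivative(x, t):
    u = 1.0
    return [-x[0] / tau + u]

t = np.linspace(0.0, 20.0, 201)
trajectory = odeint(state_derivative, [0.0], t)  # x(0) = 0
# the solution approaches the steady state tau * u = 2.0
```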

Inputs, if any, are read from a file, then interpolated on the fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth order (constant) and first order (linear) interpolators are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.
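The two interpolators can be sketched as follows; the function names are assumptions for illustration, and the samples are assumed to be sorted by time.

```python
import bisect

def zeroth_order_hold(times, values, t):
    """Input value at time t: hold the most recent sample constant."""
    i = bisect.bisect_right(times, t) - 1
    return values[max(i, 0)]

def first_order_hold(times, values, t):
    """Input value at time t: linear interpolation between samples."""
    i = min(max(bisect.bisect_right(times, t) - 1, 0), len(times) - 2)
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + frac * (values[i + 1] - values[i])
```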

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations, across the same range of simulated time, for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression), a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min_children: an integer, the minimum number of children it will accept

max_children: an integer, the maximum number of children it will accept

child_types: a list of types, at least max_children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).
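A minimal Python sketch of this node interface might look like the following (the attribute spellings, the compose method name, and the toy Constant/Add node types are illustrative assumptions, not the thesis library's actual identifiers):

```python
class Node:
    """Base class for typed genetic program nodes; generic crossover
    and mutation operators rely only on these four class attributes."""
    type = None          # unique type identifier
    min_children = 0     # minimum number of children accepted
    max_children = 0     # maximum number of children accepted
    child_types = ()     # required type per child slot, max_children long

    def __init__(self, children=()):
        self.children = list(children)

    def compose(self, kernel=None):
        """Build a partial solution from the children's partial solutions;
        the root node's result is the solution to the whole problem."""
        raise NotImplementedError

class Constant(Node):
    type = "Expression"

    def __init__(self, value):
        super().__init__()
        self.value = value

    def compose(self, kernel=None):
        return self.value

class Add(Node):
    type = "Expression"
    min_children = 2
    max_children = 2
    child_types = ("Expression", "Expression")

    def compose(self, kernel=None):
        left, right = self.children
        return left.compose(kernel) + right.compose(kernel)

# Composing the root of a tiny program tree yields the whole "solution".
program = Add([Constant(2.0), Constant(3.0)])
result = program.compose()
```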

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree-structured representations proceeds in three steps:

1. Choose a node at random from the parent.

2. Generate a new random tree whose root has the same type as the chosen node.

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree.

In the first step, a choice is made uniformly among all nodes; it is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied, with replacement of the chosen subtree.

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4, this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; these are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case the size and depth limits imposed on the initial population are arbitrary, and left to the experimenter to choose for each run.
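The three mutation steps and the breadth-first tree construction can be sketched as follows. The toy primitive set and all names here are illustrative assumptions, not the thesis code:

```python
import copy
import random

# Toy typed primitive set: name -> (type conformed to, child slot types).
FUNCTIONS = {"Add": ("Expression", ["Expression", "Expression"]),
             "Sin": ("Expression", ["Expression"])}
TERMINALS = {"Constant": ("Expression", [])}

class Tree:
    def __init__(self, name, type_, children=()):
        self.name, self.type, self.children = name, type_, list(children)

    def nodes(self):
        yield self
        for child in self.children:
            yield from child.nodes()

def random_tree(required_type, size_limit, rng, p_function=0.5):
    """Breadth-first construction of a random tree whose root conforms
    to required_type; once size_limit is exceeded, terminals are used
    wherever one conforms to the required type."""
    def pick(t, functions_ok):
        funcs = [(n, ct) for n, (ty, ct) in FUNCTIONS.items() if ty == t]
        terms = [(n, ct) for n, (ty, ct) in TERMINALS.items() if ty == t]
        if terms and (not functions_ok or not funcs
                      or rng.random() >= p_function):
            return rng.choice(terms)
        return rng.choice(funcs)

    name, child_types = pick(required_type, functions_ok=True)
    root, size = Tree(name, required_type), 1
    queue = [(root, child_types)]
    while queue:
        parent, ctypes = queue.pop(0)
        for ct in ctypes:
            cname, ccts = pick(ct, functions_ok=size < size_limit)
            child = Tree(cname, ct)
            size += 1
            parent.children.append(child)
            queue.append((child, ccts))
    return root

def mutate(parent, rng, size_limit=7):
    """Steps 1-3: choose a node uniformly, grow a replacement of the
    same type, and copy the parent with the chosen subtree replaced."""
    offspring = copy.deepcopy(parent)
    target = rng.choice(list(offspring.nodes()))
    replacement = random_tree(target.type, size_limit, rng)
    target.name, target.children = replacement.name, replacement.children
    return offspring

rng = random.Random(42)
parent = random_tree("Expression", 5, rng)
child = mutate(parent, rng)
```

Because terminals are forced once the size limit is exceeded, construction always terminates, typically overshooting by at most one layer of leaf nodes, as described above.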

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree-structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents.

2. List all nodes from the first parent conforming to one of the common types.

3. Choose at random from the list.

4. List all nodes from the second parent conforming to the same type as the chosen node.

5. Choose at random from the list.

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second.

If the set of common types is empty, crossover cannot proceed, and the genetic algorithm selects a new pair of parents.
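The six steps can be sketched as below, using a minimal tree node with name/type/children fields (a toy illustration under assumed names, not the thesis implementation):

```python
import copy
import random

class Tree:
    def __init__(self, name, type_, children=()):
        self.name, self.type, self.children = name, type_, list(children)

    def nodes(self):
        yield self
        for child in self.children:
            yield from child.nodes()

def crossover(first, second, rng):
    """Strongly typed crossover; returns None when the parents share no
    node types, signalling the genetic algorithm to reselect parents."""
    # 1. Find common node types by intersecting the parents' type sets.
    common = ({n.type for n in first.nodes()}
              & {n.type for n in second.nodes()})
    if not common:
        return None
    # 2-3. List conforming nodes in the first parent; choose one at random.
    point_a = rng.choice([n for n in first.nodes() if n.type in common])
    # 4-5. List nodes of that same type in the second parent; choose one.
    point_b = rng.choice([n for n in second.nodes()
                          if n.type == point_a.type])
    # 6. Copy the first parent, replacing the chosen subtree with a copy
    #    of the subtree chosen from the second parent.
    offspring = copy.deepcopy(first)
    index = list(first.nodes()).index(point_a)   # position of point_a
    target = list(offspring.nodes())[index]      # same position in the copy
    graft = copy.deepcopy(point_b)
    target.name, target.children = graft.name, graft.children
    return offspring

a = Tree("Add", "Expression",
         [Tree("Constant", "Expression"), Tree("Constant", "Expression")])
b = Tree("Sin", "Expression", [Tree("Constant", "Expression")])
child = crossover(a, b, random.Random(0))
```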

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).
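A typical protected division operator looks like the following sketch; the fallback value of 1.0 is one common convention in the symbolic regression literature, assumed here rather than taken from the thesis:

```python
def protected_div(numerator, denominator, epsilon=1e-12):
    """Division protected against division by zero: when the denominator
    is (near) zero, return a fixed fallback value instead of raising."""
    if abs(denominator) < epsilon:
        return 1.0
    return numerator / denominator
```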


This grammar is typeless, or "mono-typed". The result of symbolic regression is a tree structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and obvious, perhaps) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations; the differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem, it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire to not pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program; simply making the new state available for later integration into the model is enough. Given a tree structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sit above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.

3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equations are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.
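The resolution search might be sketched as follows, searching upward from the reference index and wrapping at the interval boundary. The upward direction and the function name are assumptions; the text above only specifies that the search proceeds in one fixed direction, modulo the interval length:

```python
import bisect

def resolve_reference(ref_index, state_indices):
    """Resolve a state-variable reference index on the unit interval to
    the first state-equation index at or after it, wrapping around at
    the interval boundary."""
    indices = sorted(state_indices)
    position = bisect.bisect_left(indices, ref_index)
    return indices[position % len(indices)]
```

However many state equations currently exist, every reference index on the interval resolves to some actual state variable.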

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, the duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number half way between its original index and the next state equation index (with wrap around at the interval boundaries). Overall, this design has some desirable properties, including:

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so crossover that simply adds a state is not destructive to any advantages the program already has.


• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important, since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small incremental changes (such as the introduction of a new state equation by copying and later modifying in some small way an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.

Node               | Type              | Child types
-------------------|-------------------|------------------------------------
DynamicSystem      | DynamicSystem     | StateEquationSet, OutputEquationSet
StateEquationSet   | StateEquationSet  | StateEquation, StateEquationSet
StateEquation      | StateEquation     | StateVariableLHS, Expression
OutputEquationSet  | OutputEquationSet | OutputEquation, OutputEquationSet
OutputEquation     | OutputEquation    | OutputVariable, Expression
StateVariableLHS   | StateVariableLHS  |
StateVariableRHS   | Expression        |
OutputVariable     | OutputVariable    |
InputVariable      | Expression        |
Constant           | Expression        |
Add                | Expression        | Expression, Expression
Mul                | Expression        | Expression, Expression
Div                | Expression        | Expression, Expression
Cos                | Expression        | Expression
Sin                | Expression        | Expression
Greater            | Expression        | Expression, Expression
Lesser             | Expression        | Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems

4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements or even connected subgraphs to be replaced.

If a constant parameter of some element (for example, the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, and having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models), then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank, since the list of openings below the kernel is specific to the kernel and which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                       | Type       | Child types
---------------------------|------------|-----------------------------------
Kernel                     | kernel     |
InsertCommonEffortJunction | bond       | bond, bond, effort-in, effort-out
InsertCommonFlowJunction   | bond       | bond, bond, effort-in, effort-out
AttachInertia              | effort-out | Expression, bond
AttachCapacitor            | effort-in  | Expression, bond
AttachResistorEffortIn     | effort-in  | Expression, bond
AttachResistorEffortOut    | effort-out | Expression, bond
Constant                   | Expression |
Add                        | Expression | Expression, Expression
Mul                        | Expression | Expression, Expression
Div                        | Expression | Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs

Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p and q variables measure effort, flow, momentum and displacement on the like numbered bonds. The applied effort from the source element is marked E.

ṗ3 = e3 (5.1)

q̇1 = f1 (5.2)

e1 = (q1/1.96) (5.3)

e2 = (f2 * 0.02) (5.4)

e3 = (e4 + (-1.0 * e1) + (-1.0 * e2)) (5.5)

e4 = E (5.6)

f1 = f3 (5.7)

f2 = f3 (5.8)

f3 = (p3/0.002) (5.9)

f4 = f3 (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that variable E does not appear on the left-hand side of any state or readout equation: it is an input to the system.

ṗ3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02))) (5.11)

q̇1 = (p3/0.002) (5.12)

e1 = (q1/1.96) (5.13)

e2 = ((p3/0.002) * 0.02) (5.14)

e3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02))) (5.15)

e4 = E (5.16)

f1 = (p3/0.002) (5.17)

f2 = (p3/0.002) (5.18)

f3 = (p3/0.002) (5.19)

f4 = (p3/0.002) (5.20)

The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand side of readout equations as needed.
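For illustration, state equations 5.11 and 5.12 can be integrated with SciPy's odeint (a wrapper around the LSODA integrator named above). This standalone sketch hard-codes a unit step input rather than reading and interpolating an input file:

```python
import numpy as np
from scipy.integrate import odeint  # odeint is backed by LSODA

def state_equations(x, t, E):
    """State equations 5.11 and 5.12 (C = 1.96, I = 0.002, R = 0.02)."""
    p3, q1 = x
    dp3 = E + (-1.0 * (q1 / 1.96)) + (-1.0 * ((p3 / 0.002) * 0.02))
    dq1 = p3 / 0.002
    return [dp3, dq1]

t = np.linspace(0.0, 2.0, 501)
x0 = [0.0, 0.0]                                   # system starts at rest
trajectory = odeint(state_equations, x0, t, args=(1.0,))  # unit step input

# Readout equation 5.13 evaluated along the system trajectory.
e1 = trajectory[:, 1] / 1.96
```

At steady state the spring effort balances the applied unit force, so q1 approaches 1.96 and e1 approaches 1.0, consistent with the positive position offset described below.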

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in section 5.3. A broad spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.

Figure 5.1: Schematic diagram of a simple spring-mass-damper system

Figure 5.2: Bond graph model of the system from figure 5.1

Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input

Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit magnitude uniform random noise input

5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring–dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state space form, the differential state equations are as given in equations 5.21 to 5.26.

ṗ2 = (E + (-1.0 * ((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96)))) (5.21)

ṗ8 = (((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96)) + (-1.0 * ((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96)))) (5.22)

ṗ14 = (((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96)) + (-1.0 * ((p14/0.002) * 0.02)) + (-1.0 * (q16/1.96))) (5.23)

q̇6 = ((p2/0.002) + (-1.0 * (p8/0.002))) (5.24)

q̇12 = ((p8/0.002) + (-1.0 * (p14/0.002))) (5.25)

q̇16 = (p14/0.002) (5.26)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12 and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter: the input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system

Figure 5.6: Bond graph model of the system from figure 5.5

Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E

Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit magnitude uniform random noise in the applied force E

5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant -9.81 times the inertia (force of gravity on the ball).

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last, as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface

Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface

Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface

5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom. The horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of rotational flow signal a3):

mxA = 1.0 * cos(θ)

myA = 1.0 * sin(θ)

mxB = -1.0 * cos(θ)

myB = -1.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state space form, the differential state equations are:

ṅa1 = (p1/10.0)

ṅa2 = (p2/10.0)

ṅa3 = (p4/11.0)

ṗ1 = (((-1.0 * (q21/0.01)) + (-1.0 * (((p1/10.0) + ((10.0 * cos(na3)) * (p4/11.0))) * 0.8))) + 0.0)

ṗ2 = (-98.1 + ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/10.0) + ((10.0 * sin(na3)) * (p4/11.0))) * 0.8))) + 0.0)

ṗ4 = (((10.0 * cos(na3)) * ((-1.0 * (q21/0.01)) + (-1.0 * (((p1/10.0) + ((10.0 * cos(na3)) * (p4/11.0))) * 0.8)))) + ((10.0 * sin(na3)) * ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/10.0) + ((10.0 * sin(na3)) * (p4/11.0))) * 0.8)))) + ((-10.0 * cos(na3)) * 0.0) + ((-10.0 * sin(na3)) * 0.0) + (-1.0 * ((p4/11.0) * 0.5)))

q̇21 = ((p1/10.0) + ((10.0 * cos(na3)) * (p4/11.0)))

q̇24 = ((p2/10.0) + ((10.0 * sin(na3)) * (p4/11.0)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low frequency vibration in the spring, which can be seen in the variable lower extremum of the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.

Figure 5.12: Bond graph model of the system from figure 2.19

Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left and as an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state space form, there are 24 differential state equations.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.


dVphib0/dt = ( 0.0
  + (xj0/0.001 + (0.0 - (VxCb0 + 10.0*Vphib0*sin(phib0)))/0.001) * 10.0*sin(phib0)
  - (yj0/0.001 + (0.0 - (VyCb0 - 10.0*Vphib0*cos(phib0)))/0.001) * 10.0*cos(phib0)
  - 0.0
  + (xj1/0.001 + ((VxCb0 - 10.0*Vphib0*sin(phib0)) - (VxCb1 + 10.0*Vphib1*sin(phib1)))/0.001) * 10.0*sin(phib0)
  - (yj1/0.001 + ((VyCb0 + 10.0*Vphib0*cos(phib0)) - (VyCb1 - 10.0*Vphib1*cos(phib1)))/0.001) * 10.0*cos(phib0) ) / 50.0

dVphib1/dt = ( 0.0
  + (xj1/0.001 + ((VxCb0 - 10.0*Vphib0*sin(phib0)) - (VxCb1 + 10.0*Vphib1*sin(phib1)))/0.001) * 10.0*sin(phib1)
  - (yj1/0.001 + ((VyCb0 + 10.0*Vphib0*cos(phib0)) - (VyCb1 - 10.0*Vphib1*cos(phib1)))/0.001) * 10.0*cos(phib1)
  - 0.0
  + (xj2/0.001 + ((VxCb1 - 10.0*Vphib1*sin(phib1)) - (VxCb2 + 10.0*Vphib2*sin(phib2)))/0.001) * 10.0*sin(phib1)
  - (yj2/0.001 + ((VyCb1 + 10.0*Vphib1*cos(phib1)) - (VyCb2 - 10.0*Vphib2*cos(phib2)))/0.001) * 10.0*cos(phib1) ) / 50.0

dVphib2/dt = ( 0.0
  + (xj2/0.001 + ((VxCb1 - 10.0*Vphib1*sin(phib1)) - (VxCb2 + 10.0*Vphib2*sin(phib2)))/0.001) * 10.0*sin(phib2)
  - (yj2/0.001 + ((VyCb1 + 10.0*Vphib1*cos(phib1)) - (VyCb2 - 10.0*Vphib2*cos(phib2)))/0.001) * 10.0*cos(phib2)
  - 0.0
  + 0.0*10.0*sin(phib2)
  - 0.0*10.0*cos(phib2) ) / 50.0

dVxCb0/dt = ( (xj0/0.001 + (0.0 - (VxCb0 + 10.0*Vphib0*sin(phib0)))/0.001)
  - (xj1/0.001 + ((VxCb0 - 10.0*Vphib0*sin(phib0)) - (VxCb1 + 10.0*Vphib1*sin(phib1)))/0.001) ) / 10.0

dVxCb1/dt = ( (xj1/0.001 + ((VxCb0 - 10.0*Vphib0*sin(phib0)) - (VxCb1 + 10.0*Vphib1*sin(phib1)))/0.001)
  - (xj2/0.001 + ((VxCb1 - 10.0*Vphib1*sin(phib1)) - (VxCb2 + 10.0*Vphib2*sin(phib2)))/0.001) ) / 10.0

dVxCb2/dt = ( (xj2/0.001 + ((VxCb1 - 10.0*Vphib1*sin(phib1)) - (VxCb2 + 10.0*Vphib2*sin(phib2)))/0.001)
  - 0.0 ) / 10.0

dVyCb0/dt = ( (yj0/0.001 + (0.0 - (VyCb0 - 10.0*Vphib0*cos(phib0)))/0.001)
  - (yj1/0.001 + ((VyCb0 + 10.0*Vphib0*cos(phib0)) - (VyCb1 - 10.0*Vphib1*cos(phib1)))/0.001)
  - 10.0*9.81 ) / 10.0

dVyCb1/dt = ( (yj1/0.001 + ((VyCb0 + 10.0*Vphib0*cos(phib0)) - (VyCb1 - 10.0*Vphib1*cos(phib1)))/0.001)
  - (yj2/0.001 + ((VyCb1 + 10.0*Vphib1*cos(phib1)) - (VyCb2 - 10.0*Vphib2*cos(phib2)))/0.001)
  - 10.0*9.81 ) / 10.0

dVyCb2/dt = ( (yj2/0.001 + ((VyCb1 + 10.0*Vphib1*cos(phib1)) - (VyCb2 - 10.0*Vphib2*cos(phib2)))/0.001)
  - 0.0
  - 10.0*9.81 ) / 10.0

dphib0/dt = Vphib0
dphib1/dt = Vphib1
dphib2/dt = Vphib2

dxCb0/dt = VxCb0
dxCb1/dt = VxCb1
dxCb2/dt = VxCb2

dxj0/dt = 0.0 - (VxCb0 + 10.0*Vphib0*sin(phib0))
dxj1/dt = (VxCb0 - 10.0*Vphib0*sin(phib0)) - (VxCb1 + 10.0*Vphib1*sin(phib1))
dxj2/dt = (VxCb1 - 10.0*Vphib1*sin(phib1)) - (VxCb2 + 10.0*Vphib2*sin(phib2))

dyCb0/dt = VyCb0
dyCb1/dt = VyCb1
dyCb2/dt = VyCb2

dyj0/dt = 0.0 - (VyCb0 - 10.0*Vphib0*cos(phib0))
dyj1/dt = (VyCb0 + 10.0*Vphib0*cos(phib0)) - (VyCb1 - 10.0*Vphib1*cos(phib1))
dyj2/dt = (VyCb1 + 10.0*Vphib1*cos(phib1)) - (VyCb2 - 10.0*Vphib2*cos(phib2))

Figure 5.14: Response of the triple pendulum model (figure 5.17) released from an initial horizontal arrangement – vertical displacement of pendulum body centres. [Plot: yC_b0, yC_b1, yC_b2 versus t from 0 to 30.]


Figure 5.15: Response of the triple pendulum model (figure 5.17) released from an initial horizontal arrangement – horizontal displacement of pendulum body centres. [Plot: xC_b0, xC_b1, xC_b2 versus t from 0 to 30.]

Figure 5.16: Response of the triple pendulum model (figure 5.17) released from an initial horizontal position – angular position of pendulum bodies. [Plot: phi_b0, phi_b1, phi_b2 versus t from 0 to 30.]


Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum. [Schematic: Links 1–3 joined by Joints 1–3; each joint carries effort-flow (e, f) pairs in x and y.]


5.1.6 Linked elastic collisions

Figure 5.18 shows, in schematic form, a rigid bar with damped-elastic bumpers at each end. The system consists of: a rigid link element (L, as used above in section 5.1.4) representing the bar; two switched capacitor elements (CCa, CCb) representing elastic collision between the tips and a rigid horizontal surface; and unswitched vertical damping at each end and on rotation (not shown in the schematic), representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface; int_a3 is the angular displacement of the bar from vertical; q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21, and q24 (height of the centre, left tip, and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2, before the bar tips strike the ground at height 1, equal to the unsprung length q of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact, then bounces higher as the bar rotates in the air.
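The switched capacitor ("CC") elements provide a one-sided contact force that acts only while a tip is below the unsprung length of its compliance. A minimal sketch of that switching law is given below; the capacitance values (read from the figure labels as C = 0.01 and C = 0.005, an assumption about the collapsed decimal points), the function name, and the omission of the separate unswitched damping are all illustrative choices, not the thesis software.

```python
def contact_force(q, C, q0=1.0):
    """One-sided (switched) contact compliance.

    q  : current tip height above the surface
    C  : capacitance (compliance); spring force per unit deflection is 1/C
    q0 : unsprung (natural) length at which contact begins

    Returns the upward spring force: zero while the tip is above the
    contact height, deflection/C once the compliance is compressed.
    """
    deflection = q0 - q
    return deflection / C if deflection > 0.0 else 0.0
```

With these values, the C = 0.01 contact is half as stiff as the C = 0.005 contact, so it deflects roughly twice as far under the same impact load, consistent with the asymmetric bounce described above.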

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end


Figure 5.19: Bond graph model of the system from figure 5.18

Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface. [Plot: int_a2, int_a3, q21, q24 versus t from 0 to 10.]


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean, and maximum program fitness each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as minimum, mean, and maximum number of nodes in a program)
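These four per-generation summaries can be collected with a small logging helper; the following is a sketch, with the dictionary layout and names as assumptions rather than the thesis code.

```python
def generation_stats(population, fitnesses, ages, sizes):
    """Summarize the four measured aspects for one generation.

    population           : list of program trees (any repr-able form)
    fitnesses/ages/sizes : parallel lists of per-program values
    """
    def lo_mean_hi(values):
        return min(values), sum(values) / len(values), max(values)

    return {
        'fitness':   lo_mean_hi(fitnesses),               # aspect 1
        'diversity': len({repr(p) for p in population}),  # aspect 2
        'age':       lo_mean_hi(ages),                    # aspect 3
        'size':      lo_mean_hi(sizes),                   # aspect 4
    }
```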

The particular symbolic regression problem chosen consisted of 100 samples of the function x² + 2 at integer values of x between −50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples, and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2, and the number of generations limited to 500, but with a variety of population sizes.
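The fitness test just described can be sketched as follows. The normalization 1/(1 + total error) is an assumed choice of monotone mapping onto (0, 1] (the exact formula is not given above), and the 100-point sampling grid is likewise an assumption.

```python
import math

def target(x):
    # The static function being approximated.
    return x * x + 2

def evaluate_fitness(candidate, xs=range(-50, 50)):
    """Total absolute error against samples of x**2 + 2, mapped onto
    (0, 1] so that zero error scores exactly 1.0.  Division by zero or
    a NaN result scores 0.0, as described in the text."""
    total_error = 0.0
    for x in xs:
        try:
            y = candidate(x)
        except ZeroDivisionError:
            return 0.0
        if isinstance(y, float) and math.isnan(y):
            return 0.0
        total_error += abs(y - target(x))
    return 1.0 / (1.0 + total_error)
```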

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the enormous number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean, and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress: runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen on close inspection to make its first break from the "floor" near generation 20. At this same generation the beginning of a sharp decline in population diversity can be seen. From this one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of the newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness), and largest (maximum size) individuals are of course not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the


adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300 generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in its neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation but were barred from participating in crossover, mutation, or reproduction due to their zero fitness score, and so could not enter the population of the following generation.
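The interplay between the size ceiling and the zero fitness score described above amounts to a guard like the following sketch (the constant and function names are illustrative assumptions):

```python
MAX_NODES = 100  # declared size ceiling

def score(program_size, raw_fitness):
    """Oversized programs are still logged at fitness-evaluation time
    (so peak-size statistics can exceed the ceiling) but receive zero
    fitness, which bars them from selection for crossover, mutation,
    or reproduction in the next generation."""
    return 0.0 if program_size > MAX_NODES else raw_fitness
```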

5.2.2 Algebraic reductions

The non-parsimonious tendencies of typical genetic programming output are well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal; specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression the exploitable structural equivalences are simply the rules of algebra. Even a very small set of algebraic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation).

Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends

Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes)

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program

Figure 5.25: Fitness of run B – progress stalled by population convergence

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress

Figure 5.27: Program size in run B – early domination by smaller than average schema

Figure 5.28: Program age in run B – low population turnover mid-run between crises

(- x x) replaced by (0)
(- x 0) replaced by (x)
(+ x 0) replaced by (x)
(+ 0 x) replaced by (x)
(/ x x) replaced by (1)
(/ 0 x) replaced by (0), for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, whether either child is a constant value node equal to zero or one; and second, whether the children are equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
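Both kinds of simplification (the node-local rules and the constant-subtree folding) can be sketched on s-expressions represented as nested Python tuples. This is an illustration, not the thesis implementation: the tuple representation and function names are assumptions, as is reading the two operator-less rules above as division.

```python
def is_const(node):
    """Constant value terminal: any node that is not a tuple or 'x'."""
    return not isinstance(node, (tuple, str))

def evaluate(node, x):
    """Evaluate an (op, lhs, rhs) s-expression at the point x."""
    if node == 'x':
        return x
    if is_const(node):
        return node
    op, a, b = node[0], evaluate(node[1], x), evaluate(node[2], x)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    return a / b if b != 0 else 1.0  # '/' with a crude divide-by-zero guard

def contains_variable(node):
    if node == 'x':
        return True
    if is_const(node):
        return False
    return any(contains_variable(c) for c in node[1:])

def reduce_tree(node):
    if is_const(node) or node == 'x':
        return node
    # Constant folding: a subtree with no variables becomes one constant,
    # found by evaluating it at an arbitrary point (the "trick" above).
    if not contains_variable(node):
        return evaluate(node, 0.0)
    op, a, b = node[0], reduce_tree(node[1]), reduce_tree(node[2])
    if op == '-' and a == b:   return 0.0
    if op == '-' and b == 0.0: return a
    if op == '+' and b == 0.0: return a
    if op == '+' and a == 0.0: return b
    if op == '/' and a == b:   return 1.0
    if op == '/' and a == 0.0: return 0.0
    return (op, a, b)
```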

It is not at all clear by inspecting the raw expression in figure 5.29, a best approximation that successfully terminated one run after 41 generations, that it approximates x² + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables: it evaluates to very nearly 2.


Figure 5.29: A best approximation of x² + 2.0 after 41 generations


Figure 5.30: The reduced expression


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, of the model described in section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order they are given in). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted, or replaced to transform one program into the other.
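The metric just described can be sketched as follows, with programs as (label, child, child, ...) tuples. The lockstep breadth-first traversal is an interpretation of the description above rather than the thesis code.

```python
from collections import deque

def size(tree):
    """Number of nodes in a (label, *children) tuple tree."""
    return 1 + sum(size(c) for c in tree[1:])

def tree_distance(a, b):
    """Distance = size of the larger of the two subtrees rooted at the
    first mismatching node in a lockstep breadth-first traversal; 0 for
    identical trees.  Symmetric by construction."""
    queue = deque([(a, b)])
    while queue:
        s, t = queue.popleft()
        if s[0] != t[0] or len(s) != len(t):
            return max(size(s), size(t))
        queue.extend(zip(s[1:], t[1:]))
    return 0
```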

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive": the heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in


Distance   Mean fitness   Count
1          0.000469        2212
3          0.000282         503
5          0.000346          66
7          0.000363          36
9          0.000231          25
11         0.000164          16
13         0.000143          10
15         0.000071          10
17         0.000102           9
19         0.000221           1
23         0.0                2
25         0.0                2
27         0.0                1
29         0.000286           5
35         0.0                1
51         0.000715           1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2, and 5.7


that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases a fitness-guided search such as genetic programming can do no better than a random search.

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample. [Plot: mean fitness (0 to 5 ×10⁻⁴) versus distance (0 to 18).]

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction


Figure 5.32: The hand crafted "ideal" symbolic regression tree. [Tree: a DynamicSystem root with a StateEquationSet branch (StateEquation nodes built from StateVariableLHS, Add, Mul, Div, Constant, InputVariable, and StateVariableRHS nodes) and an OutputEquation branch (an OutputVariable defined by a Div of a StateVariableRHS and a Constant).]


has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no stiffer than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are, to give two examples, signal flow graphs and sets of differential equations for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, nonlinear, and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out small, simple, and linear but grow large, complex, and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide or discover which aspects of the system are important to the exercise, and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, and can lower its efficiency, by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system


as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example, to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm and ant colony algorithms, and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search within a search algorithm. A simulated annealing with genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run, according to some schedule.
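Such an annealing-style schedule amounts to making the operator-selection probabilities a function of the generation number. A minimal sketch follows; the particular operator set and the linear schedule are illustrative assumptions, not taken from the thesis software.

```python
import random

def choose_operator(generation, max_generations, rng=random):
    """Pick a genetic operator with probabilities that shift over the
    run: early generations favour broad exploration (mutation), later
    generations favour exploitation (a local hill-climbing operator).
    The linear "cooling" schedule here is an illustrative assumption."""
    t = generation / max_generations               # 0.0 at start, 1.0 at end
    p_mutation = 0.4 * (1.0 - t)                   # cools from 0.4 to 0.0
    p_hill_climb = 0.4 * t                         # grows from 0.0 to 0.4
    p_crossover = 1.0 - p_mutation - p_hill_climb  # remains 0.6 throughout
    r = rng.random()
    if r < p_crossover:
        return 'crossover'
    if r < p_crossover + p_mutation:
        return 'mutation'
    return 'hill_climb'
```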

Genetic programming is easily parallelizable for distribution across a local or wide area network, in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has, in effect, been stated already in this section, but it is worth repeating while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations) but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscape of deceptive problems provides no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems, with any number of inputs and outputs, is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification; to allow this information to be specified in a way that is natural to the task; and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, even infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general-purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and for symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program modelling a particular dynamic system is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (the design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation on one or more identification problems. Population size times the average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time between the fitness evaluation of bond graph models and of symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant-valued expressions for element parameters, is expected to be faster than black-box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it a bond graph, a symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression

2. Structural and parametric identification of a linear dynamic system using bond graphs

3. Parametric identification of a non-linear dynamic system using bond graphs

4. Structural and parametric identification of a non-linear dynamic system using bond graphs

At each stage the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressively stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is, the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).
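To make the latter case concrete, the kind of constitutive relation the symbolic regression would need to discover for a progressively stiffening spring might look like the following. The cubic hardening form and the coefficients `k1` and `k3` are illustrative assumptions, not values from the thesis:

```python
# Hypothetical effort-displacement relation for a non-linear C element:
# a spring with a hardening cubic term, so stiffness grows with deflection.
def spring_effort(q, k1=100.0, k3=5000.0):
    """Effort (force) developed by the spring at displacement q."""
    return k1 * q + k3 * q**3

def spring_stiffness(q, k1=100.0, k3=5000.0):
    """Local stiffness: the slope of the effort curve at displacement q."""
    return k1 + 3.0 * k3 * q**2
```

The identification task would be to recover an expression equivalent to `spring_effort` from input-output data, with the rest of the bond graph structure held fixed.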

If the bond graph library were to find any more use as a modelling tool in its own right, outside of the system identification tasks that were the goal of this project, then it would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20sim, or domain-specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example: given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island-model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation (DAE) solvers are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
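The finite least-recently-used table described above can be sketched directly in Python. The key function and the fitness test are placeholders; this is an illustrative sketch, not the thesis's code:

```python
from collections import OrderedDict

# Sketch of the proposed memoization scheme: a finite table kept in
# least-recently-used order, mapping a short key derived from each genetic
# program to its cached fitness value.
class FitnessCache:
    def __init__(self, maxsize=1000):
        self.table = OrderedDict()
        self.maxsize = maxsize

    def fitness(self, program, evaluate):
        key = repr(program)               # stand-in for a short unique key
        if key in self.table:
            self.table.move_to_end(key)   # mark as most recently used
            return self.table[key]
        value = evaluate(program)         # the expensive fitness test
        self.table[key] = value
        if len(self.table) > self.maxsize:
            self.table.popitem(last=False)  # evict least recently used
        return value
```

A duplicate program arriving anywhere in the population then costs one dictionary lookup instead of a full simulation run.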

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as those nodes do in turn to the nodes below them. The standard Python interpreter (called CPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated but currently unused memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org.

[2] Dynasim AB. Dymola: dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs-I: Algebraic theory. Journal of the Franklin Institute, (326):329–350, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs-II: Duality. Journal of the Franklin Institute, (326):691–708, 1989.

[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs-III: Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs-IV: Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. In CINC'94, the first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26, 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin/Heidelberg, 2003.

[20] B. Danielson, J. Foster and D. Frincke. GAbSys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31, 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz, graph visualization software. http://www.graphviz.org.

[26] L. Fogel, A. Owens and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE, a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall, Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4, the "Genetic ALgorithm Optimized for Portability and Parallelism System". http://garage.cse.msu.edu/software/galopps, August 25, 2002.

[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2, pages 359–366, 1989.

[36] John Hunter. PyLab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg. System dynamics: a unified approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: on the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann Publishers, Inc., 340 Pine Street, Sixth Floor, 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming, GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.

  • Introduction
    • System identification
    • Black box and grey box problems
    • Static and dynamic models
    • Parametric and non-parametric models
    • Linear and nonlinear models
    • Optimization search machine learning
    • Some model types and identification techniques
      • Locally linear models
      • Nonparametric modelling
      • Volterra function series models
      • Two special models in terms of Volterra series
      • Least squares identification of Volterra series models
      • Nonparametric modelling as function approximation
      • Neural networks
      • Parametric modelling
      • Differential and difference equations
      • Information criteria--heuristics for choosing model order
      • Fuzzy relational models
      • Graph-based models
        • Contributions of this thesis
          • Bond graphs
            • Energy based lumped parameter models
            • Standard bond graph elements
              • Bonds
              • Storage elements
              • Source elements
              • Sink elements
              • Junctions
                • Augmented bond graphs
                  • Integral and derivative causality
                  • Sequential causality assignment procedure
                    • Additional bond graph elements
                      • Activated bonds and signal blocks
                      • Modulated elements
                      • Complex elements
                      • Compound elements
                        • Simulation
                          • State space models
                          • Mixed causality models
                              • Genetic programming
                                • Models programs and machine learning
                                • History of genetic programming
                                • Genetic operators
                                  • The crossover operator
                                  • The mutation operator
                                  • The replication operator
                                  • Fitness proportional selection
                                    • Generational genetic algorithms
                                    • Building blocks and schemata
                                    • Tree structured programs
                                      • The crossover operator for tree structures
                                      • The mutation operator for tree structures
                                      • Other operators on tree structures
                                        • Symbolic regression
                                        • Closure or strong typing
                                        • Genetic programming as indirect search
                                        • Search tuning
                                          • Controlling program size
                                          • Maintaining population diversity
                                              • Implementation
                                                • Program structure
                                                  • Formats and representations
                                                    • External tools
                                                      • The Python programming language
                                                      • The SciPy numerical libraries
                                                      • Graph layout and visualization with Graphviz
                                                        • A bond graph modelling library
                                                          • Basic data structures
                                                          • Adding elements and bonds traversing a graph
                                                          • Assigning causality
                                                          • Extracting equations
                                                          • Model reductions
                                                            • An object-oriented symbolic algebra library
                                                              • Basic data structures
                                                              • Basic reductions and manipulations
                                                              • Algebraic loop detection and resolution
                                                              • Reducing equations to state-space form
                                                              • Simulation
                                                                • A typed genetic programming system
                                                                  • Strongly typed mutation
                                                                  • Strongly typed crossover
                                                                  • A grammar for symbolic regression on dynamic systems
                                                                  • A grammar for genetic programming bond graphs
                                                                      • Results and discussion
                                                                        • Bond graph modelling and simulation
                                                                          • Simple spring-mass-damper system
                                                                          • Multiple spring-mass-damper system
                                                                          • Elastic collision with a horizontal surface
                                                                          • Suspended planar pendulum
                                                                          • Triple planar pendulum
                                                                          • Linked elastic collisions
                                                                            • Symbolic regression of a static function
                                                                              • Population dynamics
                                                                              • Algebraic reductions
                                                                                • Exploring genetic neighbourhoods by perturbation of an ideal
                                                                                • Discussion
                                                                                  • On bond graphs
                                                                                  • On genetic programming
                                                                                      • Summary and conclusions
                                                                                        • Extensions and future work
                                                                                          • Additional experiments
                                                                                          • Software improvements
Page 2: Tools for Modelling and Identification with Bond Graphs and

I hereby declare that I am the sole author of this thesis This is a truecopy of the thesis including any required final revisions as accepted by myexaminers

I understand that my thesis may be made electronically available to thepublic

ii

Abstract

The contributions of this work include genetic programming grammars forbond graph modelling and for direct symbolic regression of sets of differentialequations a bond graph modelling library suitable for programmatic use asymbolic algebra library specialized to this use and capable of among otherthings breaking algebraic loops in equation sets extracted from linear bondgraph models Several non-linear multi-body mechanics examples are pre-sented showing that the bond graph modelling library exhibits well-behavedsimulation results Symbolic equations in a reduced form are produced au-tomatically from bond graph models The genetic programming system istested against a static non-linear function identification problem using type-less symbolic regression The direct symbolic regression grammar is shownto have a non-deceptive fitness landscape perturbations of an exact pro-gram have decreasing fitness with increasing distance from the ideal Theplanned integration of bond graphs with genetic programming for use as asystem identification technique was not successfully completed A catego-rized overview of other modelling and identification techniques is included ascontext for the choice of bond graphs and genetic programming


Acknowledgements

I'd like to thank my supervisor, Dr. Jan Huissoon, for his support.


Contents

1 Introduction 1
  1.1 System identification 1
  1.2 Black box and grey box problems 3
  1.3 Static and dynamic models 6
  1.4 Parametric and non-parametric models 8
  1.5 Linear and nonlinear models 9
  1.6 Optimization, search, machine learning 11
  1.7 Some model types and identification techniques 12
    1.7.1 Locally linear models 13
    1.7.2 Nonparametric modelling 14
    1.7.3 Volterra function series models 15
    1.7.4 Two special models in terms of Volterra series 16
    1.7.5 Least squares identification of Volterra series models 18
    1.7.6 Nonparametric modelling as function approximation 21
    1.7.7 Neural networks 22
    1.7.8 Parametric modelling 23
    1.7.9 Differential and difference equations 23
    1.7.10 Information criteria - heuristics for choosing model order 25
    1.7.11 Fuzzy relational models 26
    1.7.12 Graph-based models 28
  1.8 Contributions of this thesis 30

2 Bond graphs 31
  2.1 Energy based lumped parameter models 31
  2.2 Standard bond graph elements 32
    2.2.1 Bonds 32
    2.2.2 Storage elements 33
    2.2.3 Source elements 34
    2.2.4 Sink elements 35
    2.2.5 Junctions 35
  2.3 Augmented bond graphs 39
    2.3.1 Integral and derivative causality 41
    2.3.2 Sequential causality assignment procedure 44
  2.4 Additional bond graph elements 46
    2.4.1 Activated bonds and signal blocks 46
    2.4.2 Modulated elements 46
    2.4.3 Complex elements 47
    2.4.4 Compound elements 48
  2.5 Simulation 51
    2.5.1 State space models 51
    2.5.2 Mixed causality models 52

3 Genetic programming 54
  3.1 Models, programs and machine learning 54
  3.2 History of genetic programming 55
  3.3 Genetic operators 57
    3.3.1 The crossover operator 58
    3.3.2 The mutation operator 58
    3.3.3 The replication operator 58
    3.3.4 Fitness proportional selection 59
  3.4 Generational genetic algorithms 59
  3.5 Building blocks and schemata 61
  3.6 Tree structured programs 63
    3.6.1 The crossover operator for tree structures 67
    3.6.2 The mutation operator for tree structures 67
    3.6.3 Other operators on tree structures 68
  3.7 Symbolic regression 69
  3.8 Closure or strong typing 70
  3.9 Genetic programming as indirect search 72
  3.10 Search tuning 73
    3.10.1 Controlling program size 73
    3.10.2 Maintaining population diversity 74

4 Implementation 76
  4.1 Program structure 76
    4.1.1 Formats and representations 77
  4.2 External tools 81
    4.2.1 The Python programming language 81
    4.2.2 The SciPy numerical libraries 82
    4.2.3 Graph layout and visualization with Graphviz 82
  4.3 A bond graph modelling library 83
    4.3.1 Basic data structures 83
    4.3.2 Adding elements and bonds, traversing a graph 84
    4.3.3 Assigning causality 84
    4.3.4 Extracting equations 86
    4.3.5 Model reductions 86
  4.4 An object-oriented symbolic algebra library 100
    4.4.1 Basic data structures 100
    4.4.2 Basic reductions and manipulations 102
    4.4.3 Algebraic loop detection and resolution 104
    4.4.4 Reducing equations to state-space form 107
    4.4.5 Simulation 110
  4.5 A typed genetic programming system 110
    4.5.1 Strongly typed mutation 112
    4.5.2 Strongly typed crossover 113
    4.5.3 A grammar for symbolic regression on dynamic systems 113
    4.5.4 A grammar for genetic programming bond graphs 118

5 Results and discussion 120
  5.1 Bond graph modelling and simulation 120
    5.1.1 Simple spring-mass-damper system 120
    5.1.2 Multiple spring-mass-damper system 125
    5.1.3 Elastic collision with a horizontal surface 128
    5.1.4 Suspended planar pendulum 130
    5.1.5 Triple planar pendulum 133
    5.1.6 Linked elastic collisions 139
  5.2 Symbolic regression of a static function 141
    5.2.1 Population dynamics 141
    5.2.2 Algebraic reductions 143
  5.3 Exploring genetic neighbourhoods by perturbation of an ideal 151
  5.4 Discussion 153
    5.4.1 On bond graphs 153
    5.4.2 On genetic programming 157

6 Summary and conclusions 160
  6.1 Extensions and future work 161
    6.1.1 Additional experiments 161
    6.1.2 Software improvements 163

List of Figures

1.1 Using output prediction error to evaluate a model 4
1.2 The black box system identification problem 5
1.3 The grey box system identification problem 5
1.4 Block diagram of a Volterra function series model 17
1.5 The Hammerstein filter chain model structure 17
1.6 Block diagram of the linear squarer cuber model 18
1.7 Block diagram for a nonlinear moving average operation 21
2.1 A bond denotes power continuity 32
2.2 A capacitive storage element: the 1-port capacitor, C 34
2.3 An inductive storage element: the 1-port inertia, I 34
2.4 An ideal source element: the 1-port effort source, Se 35
2.5 An ideal source element: the 1-port flow source, Sf 35
2.6 A dissipative element: the 1-port resistor, R 36
2.7 A power-conserving 2-port element: the transformer, TF 37
2.8 A power-conserving 2-port element: the gyrator, GY 37
2.9 A power-conserving multi-port element: the 0-junction 38
2.10 A power-conserving multi-port element: the 1-junction 38
2.11 A fully augmented bond includes a causal stroke 40
2.12 Derivative causality: electrical capacitors in parallel 44
2.13 Derivative causality: rigidly joined masses 45
2.14 A power-conserving 2-port: the modulated transformer, MTF 47
2.15 A power-conserving 2-port: the modulated gyrator, MGY 47
2.16 A non-linear storage element: the contact compliance, CC 48
2.17 Mechanical contact compliance: an unfixed spring 48
2.18 Effort-displacement plot for the CC element 49
2.19 A suspended pendulum 50
2.20 Sub-models of the simple pendulum 50
2.21 Model of a simple pendulum using compound elements 51
3.1 A computer program is like a system model 54
3.2 The crossover operation for bit string genotypes 58
3.3 The mutation operation for bit string genotypes 58
3.4 The replication operation for bit string genotypes 59
3.5 Simplified GA flowchart 61
3.6 Genetic programming produces a program 66
3.7 The crossover operation for tree structured genotypes 68
3.8 Genetic programming as indirect search 73
4.1 Model reductions step 1: remove trivial junctions 88
4.2 Model reductions step 2: merge adjacent like junctions 89
4.3 Model reductions step 3: merge resistors 90
4.4 Model reductions step 4: merge capacitors 91
4.5 Model reductions step 5: merge inertias 91
4.6 A randomly generated model 92
4.7 A reduced model, step 1 93
4.8 A reduced model, step 2 94
4.9 A reduced model, step 3 95
4.10 Algebraic dependencies from the original model 96
4.11 Algebraic dependencies from the reduced model 97
4.12 Algebraic object inheritance diagram 101
4.13 Algebraic dependencies with loops resolved 108
5.1 Schematic diagram of a simple spring-mass-damper system 123
5.2 Bond graph model of the system from figure 5.1 123
5.3 Step response of a simple spring-mass-damper model 124
5.4 Noise response of a simple spring-mass-damper model 124
5.5 Schematic diagram of a multiple spring-mass-damper system 126
5.6 Bond graph model of the system from figure 5.5 126
5.7 Step response of a multiple spring-mass-damper model 127
5.8 Noise response of a multiple spring-mass-damper model 127
5.9 A bouncing ball 128
5.10 Bond graph model of a bouncing ball with switched C-element 129
5.11 Response of a bouncing ball model 129
5.12 Bond graph model of the system from figure 2.19 132
5.13 Response of a simple pendulum model 133
5.14 Vertical response of a triple pendulum model 136
5.15 Horizontal response of a triple pendulum model 137
5.16 Angular response of a triple pendulum model 137
5.17 A triple planar pendulum 138
5.18 Schematic diagram of a rigid bar with bumpers 139
5.19 Bond graph model of the system from figure 5.18 140
5.20 Response of a bar-with-bumpers model 140
5.21 Fitness in run A 144
5.22 Diversity in run A 144
5.23 Program size in run A 145
5.24 Program age in run A 145
5.25 Fitness of run B 146
5.26 Diversity of run B 146
5.27 Program size in run B 147
5.28 Program age in run B 147
5.29 A best approximation of x^2 + 2.0 after 41 generations 149
5.30 The reduced expression 150
5.31 Mean fitness versus tree-distance from the ideal 153
5.32 Symbolic regression "ideal" 154

List of Tables

1.1 Modelling tasks in order of increasing difficulty 13
2.1 Units of effort and flow in several physical domains 33
2.2 Causality options by element type: 1- and 2-port elements 42
2.3 Causality options by element type: junction elements 43
4.1 Grammar for symbolic regression on dynamic systems 117
4.2 Grammar for genetic programming bond graphs 119
5.1 Neighbourhood of a symbolic regression "ideal" 152

Listings

3.1 Protecting the fitness test code against division by zero 71
4.1 A simple bond graph model in bg format 78
4.2 A simple bond graph model constructed in Python code 80
4.3 Two simple functions in Python 81
4.4 Graphviz DOT markup for the model defined in listing 4.2 83
4.5 Equations from the model in figure 4.6 98
4.6 Equations from the reduced model in figure 4.9 99
4.7 Resolving a loop in the reduced model 106
4.8 Equations from the reduced model with loops resolved 107

Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models and, because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system: a control system, for example. In an iterative design process, it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics) then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately, it is perhaps best to think of an identified model as a second system that, for some purposes, can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that as closely as possible reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two-part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].
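The output prediction error measure can be made concrete in a few lines. The following is a small sketch (with assumed, synthetic signals; not taken from the thesis) that compares a candidate model's predicted output against a noisy measured output and reduces the error to a single mean-squared-error figure:

```python
# Sketch of the output prediction error measure: compare a candidate
# model's predicted output against the measured system output.
import numpy as np

t = np.linspace(0.0, 1.0, 100)
rng = np.random.default_rng(2)

# Assumed "measured" output: true response plus zero-mean Gaussian noise.
measured = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(100)
# Candidate model's predicted output (here, a perfect noise-free model).
predicted = np.sin(2 * np.pi * t)

error = measured - predicted        # output prediction error signal
mse = np.mean(error ** 2)           # a typical scalar error measure
print(mse < 0.01)                   # True: only the noise remains
```

For a perfect model the residual error is just the measurement noise, so the mean squared error settles near the noise variance (here about 0.0025).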

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

• Inspection or disassembly of a non-operational system

• Theoretical principles applicable to that class of systems

• Knowledge from the design and construction of the system

• Results of another system identification exercise

• Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant and in no way on any past values of the input. Put another way, the input-output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but they often occur as components of a larger compound model that is dynamic overall. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input-output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.

Dynamic models have "memory" in some form: the observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables and p output variables the model equations would be:

$$\dot{u}_1 = f_1(x_1, \ldots, x_n, u_1, \ldots, u_m)$$
$$\vdots$$
$$\dot{u}_m = f_m(x_1, \ldots, x_n, u_1, \ldots, u_m)$$
$$y_1 = g_1(x_1, \ldots, x_n, u_1, \ldots, u_m)$$
$$\vdots$$
$$y_p = g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)$$

where $x_1, \ldots, x_n$ are the input variables, $u_1, \ldots, u_m$ are the state variables, and $y_1, \ldots, y_p$ are the output variables.
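A model in this form can be simulated directly by numerical integration. The following is a minimal sketch (not from the thesis) using the chapter's notation, with a hypothetical spring-mass-damper as the system: the input x1 is an applied force, the states u1 and u2 are position and velocity, and the output y is the position. The parameter values M, K and B are assumed for illustration only.

```python
# Sketch: simulating a state-space model (inputs x, states u, outputs y)
# for an assumed spring-mass-damper with mass M, stiffness K, damping B.
import numpy as np
from scipy.integrate import odeint

M, K, B = 1.0, 4.0, 0.5        # assumed parameter values

def x1(t):
    return 1.0                 # step input force

def f(u, t):
    # udot_i = f_i(x1, ..., xn, u1, ..., um)
    u1, u2 = u                 # u1: position, u2: velocity
    du1 = u2
    du2 = (x1(t) - B * u2 - K * u1) / M
    return [du1, du2]

def g(u):
    # y_j = g_j(x1, ..., xn, u1, ..., um); here the output is the position
    return u[0]

t = np.linspace(0.0, 20.0, 2001)
u = odeint(f, [0.0, 0.0], t)   # integrate the state equations
y = np.array([g(ui) for ui in u])
print(round(float(y[-1]), 2))  # → 0.25 (the static deflection x1/K)
```

Note that the state equations alone determine the trajectory; the output map g is evaluated algebraically afterwards, exactly as the formulation above prescribes.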

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order of arrival of data within the input and output signals matters, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a $u_i$ for which there is no $y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i$. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty for experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments, the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically and the corresponding parameters at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function) and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

$$Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} X(s) \tag{1.1}$$

$$Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} X(z) \tag{1.2}$$

$$\frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x \tag{1.3}$$

$$y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m} \tag{1.4}$$
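The discrete time-domain structure of equation 1.4 is simple to simulate directly: each new output is an algebraic combination of past outputs and past inputs. The sketch below (a first-order case with assumed coefficients; not from the thesis) makes the recursion explicit:

```python
# Sketch: simulating the difference-equation structure of equation 1.4,
# here the assumed first-order model  y_k + a1*y_{k-1} = b1*x_{k-1}.
import numpy as np

a = [-0.9]   # assumed a1 (n = 1)
b = [0.1]    # assumed b1 (m = 1)

def simulate(x):
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = 0.0
        # past outputs move to the right-hand side with a sign change
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:
                acc -= ai * y[k - i]
        # past inputs enter through the b coefficients
        for j, bj in enumerate(b, start=1):
            if k - j >= 0:
                acc += bj * x[k - j]
        y[k] = acc
    return y

x = np.ones(200)              # unit step test input
y = simulate(x)
print(round(float(y[-1]), 3))  # → 1.0, the steady state b1 / (1 + a1)
```

With these coefficients the model is a stable first-order lag; the step response converges geometrically to b1/(1 + a1).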

Because they typically contain fewer identified parameters, and because the range of possibilities is already reduced by the choice of a specific model structure, parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

$$y(t) = \int_0^t u(t - \tau)\, x(\tau)\, d\tau \tag{1.5}$$

$$y(k) = \sum_{m=0}^{k} u(k - m)\, x(m) \tag{1.6}$$

$$Y(j\omega) = U(j\omega) X(j\omega) \tag{1.7}$$
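The discrete impulse response model of equation 1.6 can be exercised with a plain convolution: the "model" is nothing more than the sampled impulse response itself. The impulse response samples below are assumed for illustration, not taken from the thesis:

```python
# Sketch: a non-parametric impulse-response model (equation 1.6).
# The model is the sampled impulse response u; simulation is convolution.
import numpy as np

u = 0.5 ** np.arange(10)       # assumed impulse response samples u(0..9)
x = np.ones(50)                # unit step test signal

# y(k) = sum_{m=0}^{k} u(k - m) x(m), truncated where u is unmeasured
y = np.convolve(u, x)[: len(x)]
print(round(float(y[-1]), 3))  # → 1.998, the step response settles at sum(u)
```

This illustrates the non-parametric character of the structure: every sample of u carries equal significance, and lengthening u (measuring the response for longer) adds parameters without changing the model form.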

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified $\beta_1, \ldots, \beta_n$, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters $\beta_i$ while equation 1.9 is not. Both contain non-linear functions of the inputs (namely $u_2^2$ and $\sin(u_3)$), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

$$y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3) \tag{1.8}$$

$$y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3) \tag{1.9}$$

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
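The one-step character of this solution is easy to demonstrate for the structure of equation 1.8. In the sketch below (synthetic data with assumed "true" parameter values; not from the thesis), the regressors are non-linear in the inputs, yet because the parameters enter linearly the estimate is obtained by a single least squares solve:

```python
# Sketch: linear least squares identification of the linear-in-the-
# parameters model y = b1*u1 + b2*u2^2 + b3*sin(u3)  (equation 1.8).
import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.5])      # assumed true parameters

u1, u2, u3 = rng.uniform(-1, 1, (3, 200))   # input samples
y = beta_true[0]*u1 + beta_true[1]*u2**2 + beta_true[2]*np.sin(u3)
y = y + 0.01 * rng.standard_normal(200)     # added measurement noise

# Each column is one regressor; non-linear in the inputs, linear in beta.
Phi = np.column_stack([u1, u2**2, np.sin(u3)])
beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(beta_hat, 1))                # close to [2.0, -1.0, 0.5]
```

Attempting the same with equation 1.9 fails at the first step: no regressor matrix can be formed, because the parameters appear inside products and non-linear functions, and an iterative search would be required instead.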

There is a fundamental trade-off between model size, or order, and the model's expressive power, or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to over-fitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best model using the linear least squares technique discussed below, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately this approach, or any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs and subtracting the normal operating outputs from the measured output to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Over-fitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally, but not globally, optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and it controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it is a factor that trades off exploration of new territory with exploitation of gains made in a neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results on unseen test signals as well as those used during identification. A model which does not generalize well may be "over-fitting". To check for over-fitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in every machine learning textbook. They both incorporate a measure of parsimony for the identified model and weigh that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
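The holdout idea can be illustrated in a few lines. The sketch below (synthetic data and assumed candidate model orders; not from the thesis) fits polynomial models of increasing order on a training set and scores each on validation data that took no part in the fitting, so an under-fit model shows up as a poor validation score:

```python
# Sketch: holdout validation to detect under- and over-fitting.
# The "system" is an assumed static quadratic with added noise.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = x**2 + 0.1 * rng.standard_normal(60)   # quadratic system + noise

x_tr, y_tr = x[::2], y[::2]                # training half
x_va, y_va = x[1::2], y[1::2]              # validation half (held out)

def val_error(order):
    # Fit on training data only; score on the held-out validation data.
    coeffs = np.polyfit(x_tr, y_tr, order)
    return np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

errors = {order: val_error(order) for order in (1, 2, 10)}
# The correct structure (order 2) should validate better than the
# inflexible line (order 1); an excessive order risks fitting the noise.
print(errors[2] < errors[1])               # → True
```

The essential discipline is in `val_error`: the validation points never influence the fitted coefficients, so they give an honest estimate of generalization.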

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well-known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1) the problem is to fit a line, plane, or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input-output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for the model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

              linear    non-linear
   static     (1)       (3)
   dynamic    (2)       (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem; (4) is the most general class of system identification problems.
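The one-step solution for class 1 can be made concrete with a short sketch (the data and coefficient values here are hypothetical, invented for illustration): setting the partial derivatives of the squared error to zero yields the normal equations, which are solved directly for slope and offset.

```python
import numpy as np

# Hypothetical data: noisy samples of y = 2x + 1 (a class-1, linear static model).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(50)

# Setting the derivative of the squared error to zero gives the normal
# equations (X^T X) beta = X^T y; a column of ones absorbs the offset.
X = np.column_stack([x, np.ones_like(x)])
slope, offset = np.linalg.solve(X.T @ X, X.T @ y)
print(slope, offset)
```

The same construction extends unchanged to planes and hyper-planes by adding columns to X, which is why classes 1 and 2 admit one-step solutions.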

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen that attempt to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at a different operating point of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying (LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates states as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or b-spline).

In step 2 it is important to use low-amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary is that, with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three of the steps above.
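As a toy illustration of the three steps (the operating points and local gains below are invented for the sketch, not taken from any cited system), a first-order gain is identified at three base operating points and the complete model interpolates linearly between them:

```python
import numpy as np

# Step 1: chosen base operating points (hypothetical values).
base_points = np.array([0.0, 5.0, 10.0])
# Step 2: one gain per local linear identification (hypothetical values).
local_gains = np.array([1.0, 0.6, 0.4])

def lpv_gain(operating_point):
    # Step 3: linear interpolation between the local linearizations.
    return np.interp(operating_point, base_points, local_gains)

print(lpv_gain(2.5))  # halfway between the first two local models
```

A real LPV model would interpolate full local state-space matrices rather than a single gain, but the scheduling structure is the same.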

Both of these techniques are able to identify the parameters, but not the structure, of a model. For the EKF the model structure must be given in a state space form. For LPV models the structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first-order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

y(t) = y_1(t) = \int_{-T_S}^{0} g(t - \tau) \, u(\tau) \, d\tau    (1.10)

Its discrete-time version is

y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \, u_{k-j}    (1.11)

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past; that is, only finite-memory systems are considered. The Volterra series model extends the above into an infinite sum of functions, the first of which is the linear model y_1(t) above:

y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots    (1.12)

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is

y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t - \tau_1, t - \tau_2) \, u(\tau_1) u(\tau_2) \, d\tau_1 d\tau_2

y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t - \tau_1, t - \tau_2, t - \tau_3) \, u(\tau_1) u(\tau_2) u(\tau_3) \, d\tau_1 d\tau_2 d\tau_3

y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t - \tau_1, t - \tau_2, \ldots, t - \tau_p) \, u(\tau_1) u(\tau_2) \cdots u(\tau_p) \, d\tau_1 d\tau_2 \cdots d\tau_p


where g_p is called the p'th order continuous-time kernel. The rest of this discussion will use the discrete-time representation:

y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \, u_{n-i} u_{n-j}

y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \, u_{n-i} u_{n-j} u_{n-k}

y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \cdots i_p} \, u_{n-i_1} u_{n-i_2} \cdots u_{n-i_p}

W^{(p)} is called the p'th order discrete-time kernel. W^{(2)} is an M_2 by M_2 square matrix, W^{(3)} is an M_3 by M_3 by M_3 cubic matrix, and so on. It is not required that the memory lengths of each kernel (M_2, M_3, etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is M, then the p'th order kernel W^{(p)} is a p-dimensional matrix with M^p elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is M_{(1)} + M_{(2)}^2 + M_{(3)}^3 + \cdots + M_{(p)}^p + \cdots, or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third-order term. A block diagram of the full Volterra function series model is shown in figure 1.4.
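A minimal sketch of evaluating a Volterra model truncated after the bilinear term (kernel values here are random placeholders, not an identified model):

```python
import numpy as np

# Hypothetical kernels with memory lengths M1 = M2 = 3.
rng = np.random.default_rng(1)
W1 = rng.standard_normal(3)        # linear (first-order) kernel
W2 = rng.standard_normal((3, 3))   # bilinear (second-order) kernel
u = rng.standard_normal(20)        # input sequence

def volterra2(u, W1, W2, n):
    ud = u[n::-1][:len(W1)]        # delayed inputs u_n, u_{n-1}, u_{n-2}
    y1 = W1 @ ud                   # first-order term: sum_i W1_i u_{n-i}
    y2 = ud @ W2 @ ud              # bilinear term: sum_ij W2_ij u_{n-i} u_{n-j}
    return y1 + y2

print(volterra2(u, W1, W2, 10))
```

Note that the bilinear term is unchanged if W2 is transposed, which previews the kernel symmetry discussed under least squares identification below.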

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q    (1.13)

y(t_k) = \sum_{j=0}^{N} W_j \, v_{k-j}    (1.14)

Combining these gives

y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u^i_{k-j} = \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u^i_{k-j}

       = a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u^2_{k-j} + \cdots + a_q \sum_{j=0}^{N} W_j u^q_{k-j}


Comparing with the discrete-time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher-order kernels. Elements on the main diagonal of the higher-order kernels can be interpreted as weights applied to powers of the input signal.
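The Hammerstein chain above (a static polynomial followed by an FIR weighting sequence) can be sketched directly; the coefficients and weights below are hypothetical:

```python
import numpy as np

a = np.array([0.0, 1.0, 0.5])      # polynomial coefficients a0, a1, a2
W = np.array([0.5, 0.3, 0.2])      # impulse response weights W_j

def hammerstein(u):
    v = a[0] + a[1] * u + a[2] * u**2    # static nonlinear part
    # Linear dynamic part: y(t_k) = sum_j W_j v_{k-j}
    return np.convolve(v, W)[:len(u)]

u = np.array([1.0, 0.0, 0.0, 0.0])
print(hammerstein(u))   # a scaled copy of the linear part's impulse response
```

For an impulse input the output is just the FIR response scaled by the polynomial's value at the impulse amplitude, which is exactly the diagonal-kernel behaviour described above.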

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in figure 1.6. It is less general than the system in figure 1.4, not only because it is truncated at the third-order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals u_1 = u, u_2 = u^2, and u_3 = u^3 can be calculated directly, which reduces the identification of the dynamic parts to a multi-input, multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear, squarer, cuber model

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used:

y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n)    (1.15)

y(t_n) = \sum_{i=0}^{N_1} W^{(1)}_i u_{n-i} + \sum_{i=0}^{N_2} \sum_{j=0}^{N_2} W^{(2)}_{ij} u_{n-i} u_{n-j} + e(t_n)    (1.16)

       = (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e(t_n)    (1.17)

where the vector and matrix quantities u_1, u_2, W^{(1)}, and W^{(2)} are

u_1 = [u_{n-1}, u_{n-2}, \ldots, u_{n-N_1}]^T

u_2 = [u_{n-1}, u_{n-2}, \ldots, u_{n-N_2}]^T

W^{(1)} = [W^{(1)}_1, W^{(1)}_2, \ldots, W^{(1)}_{N_1}]^T

W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector U and the parameter vector β are constructed as follows. The first N_1 elements in U are the elements of u_1, followed by elements having the values of the products of u_2 appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of W^{(1)} followed by the elements of W^{(2)} as they appear in the second term of equation 1.16.

y = U\beta + e    (1.18)

Given a sequence of N measurements, U becomes a rectangular matrix and y and e become N by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

\beta = (U^T U)^{-1} U^T y    (1.19)

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large: O(N^P) for a model using the first P Volterra series terms and a memory length of N.
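A sketch of this formulation (sizes and signals below are arbitrary): build the regressor matrix U from delayed inputs and their pairwise products, then solve by least squares. Because W_ij and W_ji multiply the same product of inputs, U is rank-deficient and the minimum-norm solution returns a symmetrized bilinear kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
N1 = N2 = 3
u = rng.standard_normal(200)

def regressor_row(u, n):
    # First N1 elements: u_{n-1} .. u_{n-N1}; then all pairwise products of u_2.
    u1 = u[n - 1:n - 1 - N1:-1]
    u2 = u[n - 1:n - 1 - N2:-1]
    return np.concatenate([u1, np.outer(u2, u2).ravel()])

# Hypothetical "true" kernels, used only to generate noiseless outputs.
beta_true = rng.standard_normal(N1 + N2 * N2)
rows = range(N1 + 1, len(u))
U = np.array([regressor_row(u, n) for n in rows])
y = U @ beta_true

# Minimum-norm least squares solution of y = U beta.
beta_hat = np.linalg.lstsq(U, y, rcond=None)[0]
print(np.max(np.abs(U @ beta_hat - y)))   # prediction error is near zero
```

The fitted predictions match exactly even though beta_hat need not equal beta_true elementwise: the off-diagonal pairs of the bilinear kernel are indistinguishable, as discussed next.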


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher-order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form W_{ij}(u_{n-i})(u_{n-j}) and W_{ji}(u_{n-j})(u_{n-i}). However, since (u_{n-i})(u_{n-j}) = (u_{n-j})(u_{n-i}), the parameters W_{ij} and W_{ji} are indistinguishable. Only N(N+1)/2 unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, O(N^P).

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower-order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters of the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters to this formula are ζ, a form factor affecting the shape of the sinus curve; j, which iterates over component sinus curves; and i, the read-out index. There are m_r sinus curves in the weighted sum. The values of ζ and m_r are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem only the weights w_j applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to m_r. For example, if N is 40 and m_r is 10, then there is a 4-times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet-based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with N_1 = N_2 = 32 gives the uncompressed model a total of N_1 + N_2(N_2+1)/2 = 32 + 528 = 560 parameters. In the wavelet-domain identification only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560/46 results in a much faster identification. Additionally, it is reported that the wavelet-compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
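The idea can be illustrated with a single-level Haar transform standing in for the full DWT used in [53] (the kernel shape and threshold below are invented for the sketch): a smooth kernel produces many near-zero detail coefficients, which can be dropped with little reconstruction error.

```python
import numpy as np

t = np.arange(32)
W = np.exp(-t / 8.0)                    # hypothetical smooth linear kernel

# One-level Haar transform: pairwise averages and differences.
avg = (W[0::2] + W[1::2]) / np.sqrt(2)  # approximation coefficients
det = (W[0::2] - W[1::2]) / np.sqrt(2)  # detail coefficients
det[np.abs(det) < 0.05] = 0.0           # drop insignificant details (compression)

# Inverse transform from the compressed coefficients.
W_hat = np.empty_like(W)
W_hat[0::2] = (avg + det) / np.sqrt(2)
W_hat[1::2] = (avg - det) / np.sqrt(2)
print(np.max(np.abs(W_hat - W)))        # small reconstruction error
```

A multi-level DWT would compress the approximation coefficients recursively as well, and a 2-D version applies the same idea along both axes of the bilinear kernel.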


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input/output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where P(·) is a polynomial in u and M is the model memory length (number of Z^{-1} delay operators). Therefore one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P (uZ^{-1}, uZ^{-2}, ..., uZ^{-M}) can be calculated; what is needed is a general and parsimonious function approximator. The Volterra function series is general, but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details, and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth, low-order representations. However, since common choices such as distorted sinus functions are combined by weighted sums and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator: since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd order nonlinear moving average with a neural network taking the place of P.
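Such a network can be sketched as a forward pass only (the weights below are random placeholders, not the trained robot model): three delayed inputs feed a single hidden layer, whose activations are combined linearly into one prediction.

```python
import numpy as np

rng = np.random.default_rng(3)
W_h = rng.standard_normal((4, 3))   # hidden layer: 4 neurons, 3 delayed inputs
b_h = rng.standard_normal(4)        # hidden biases
w_o = rng.standard_normal(4)        # linear output layer

def predict(u, n):
    x = u[n:n - 3:-1]               # u_n, u_{n-1}, u_{n-2}: the moving-average window
    h = np.tanh(W_h @ x + b_h)      # hidden activations
    return w_o @ h                  # scalar prediction

u = rng.standard_normal(10)
print(predict(u, 5))
```

Training would adjust W_h, b_h, and w_o by back-propagation; only the structure of the nonlinear moving average is shown here.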

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process, called equation discovery or sometimes symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models that are not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness-guided search algorithms such as genetic programming, evolution strategies, or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23] [24] [78] [79]) is an equation discovery program that searches for equations which conform to a context-free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations: grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.

PRET ([74] [75] [14]) is another equation discovery program; it incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second-order differential equation. Qualitative observations can also suggest components which are likely to be part of the model; for example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high-level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness-guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear whether general systems problem solving has been applied in practice, and no references beyond the original author were found.

1.7.10 Information criteria: heuristics for choosing model order

Information criteria are heuristics that weight model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

AIC = -2 \ln(L) + 2k    (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u \mid M, U)    (1.21)

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters which were identified from a small number of observations are very likely to be overfit and have spurious parameters. The Bayes information criterion is

BIC = -2 \ln(L) + k \ln(n)    (1.22)

where n is the number of observations (i.e. sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC \simeq -2n \ln(RMSE/n) + 2k    (1.23)
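A small sketch of these criteria as they might be computed in practice (the function names and numbers below are arbitrary; the RMSE-based form follows the approximation above):

```python
import math

def aic(rmse, n, k):
    # RMSE-based approximation of the AIC; k free parameters, n observations.
    return -2.0 * n * math.log(rmse / n) + 2 * k

def bic_from_loglik(loglik, n, k):
    # BIC from a known log-likelihood ln(L).
    return -2.0 * loglik + k * math.log(n)

# Both criteria penalize extra parameters: for the same fit quality,
# the larger model scores worse (higher values are worse).
print(aic(0.1, 100, 5) < aic(0.1, 100, 15))
```

In a model-order sweep one would compute the criterion for each candidate order and keep the minimizer, as in the polynomial-term example discussed next.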

Cinar [17] reports on building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions

2. a fuzzy relation mapping inputs to outputs

Fuzzy set membership functions may themselves be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow", whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are not appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied and those products are then added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
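Step 2, max-min composition, can be sketched directly (the membership values and relation entries below are invented for illustration): where matrix multiplication sums products, composition here takes the maximum of elementwise minimums.

```python
import numpy as np

mu_in = np.array([0.2, 0.9, 0.4])   # membership degrees for 3 input sets
R = np.array([[0.1, 0.8],           # fuzzy relation: 3 input sets
              [0.7, 0.3],           # mapped to 2 output sets
              [0.5, 0.6]])

# Max-min composition: min replaces the product, max replaces the sum.
mu_out = np.max(np.minimum(mu_in[:, None], R), axis=0)
print(mu_out)
```

Swapping `np.minimum`/`np.max` for other t-norms and aggregations gives the alternative composition operators mentioned above.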

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and the output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, weighted by the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large and the system may oscillate and not converge; too small and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model therefore is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66] [39] [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with linked nodes sharing the effort and flow variables of their common bond, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae; arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphic transcription of differential equations.

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10-degree-of-freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (the arrangement of the edges or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible: whereas the genetic algorithm uses a fixed-length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy-based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19], [40], [67] on the subject. Linear graph theory is a similar graphical, energy-based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8], [9], [10], [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".

System A ----(effort e, flow f)----> System B

Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics) or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                    Intensive variable   Extensive variable
Generalized terms   Effort               Flow                 Power
Linear mechanics    Force (N)            Velocity (m/s)       (W)
Angular mechanics   Torque (N·m)         Velocity (rad/s)     (W)
Electrics           Voltage (V)          Current (A)          (W)
Hydraulics          Pressure (N/m²)      Volume flow (m³/s)   (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫_0^T f dt    (2.1)

p = ∫_0^T e dt    (2.2)

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement, proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce    (2.3)


An inertia or I element stores kinetic energy in the form of generalized momentum, proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics it is a mass-flow effect; in electrical circuits it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If    (2.4)

effort e, flow f ----> C

Figure 2.2: A capacitive storage element: the 1-port capacitor, C.

effort e, flow f ----> I

Figure 2.3: An inductive storage element: the 1-port inertia, I.

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite-length simulations; and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t)    (2.5)

f = F(t)    (2.6)


Se ----(effort e, flow f)---->

Figure 2.4: An ideal source element: the 1-port effort source, Se.

Sf ----(effort e, flow f)---->

Figure 2.5: An ideal source element: the 1-port flow source, Sf.

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant-valued effort source representing a uniform force field, such as gravity near the earth's surface, and a zero-valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that together they expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf    (2.7)

2.2.5 Junctions

There are four basic multi-port elements that do not source, store or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

effort e, flow f ----> R

Figure 2.6: A dissipative element: the 1-port resistor, R.

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1 f1 = (m e2)(f2/m) = e2 f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash or windup) or lever. In electrical terms, it is a common electrical transformer (a pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = m e2    (2.8)

f1 = f2/m    (2.9)
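The power-conservation argument above can be checked numerically. A small sketch in plain Python (the function name and values are illustrative, not drawn from the thesis library), using the causal form with e2 and f1 as inputs:

```python
def transformer_law(m, e2, f1):
    """Causal form of an ideal TF-element with modulus m: given inputs
    (e2, f1), return outputs (e1, f2), per equations 2.8 and 2.9."""
    e1 = m * e2   # equation 2.8: e1 = m * e2
    f2 = m * f1   # equation 2.9 rearranged: f1 = f2 / m
    return e1, f2

e1, f2 = transformer_law(m=3.0, e2=2.0, f1=0.5)
power_in = e1 * 0.5    # incoming power e1 * f1
power_out = 2.0 * f2   # outgoing power e2 * f2, equal by construction
```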

The graphical notation for a gyrator or GY-element is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1 f1 = (m f2)(e2/m) = f2 e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast-rotating bodies in mechanics and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = m f2    (2.10)

f1 = e2/m    (2.11)

effort e1, flow f1 ----> TF ----> effort e2, flow f2

Figure 2.7: A power-conserving 2-port element: the transformer, TF.

effort e1, flow f1 ----> GY ----> effort e2, flow f2

Figure 2.8: A power-conserving 2-port element: the gyrator, GY.

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in,1 ... b_in,n) and any number directed power-out (b_out,1 ... b_out,m). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9 and for a 1-junction in figure 2.10.

e1 = e2 = ... = en    (2.12)

Σ f_in − Σ f_out = 0    (2.13)

f1 = f2 = ... = fn    (2.14)

Σ e_in − Σ e_out = 0    (2.15)
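As a concrete check of the 0-junction constraints (equations 2.12 and 2.13), the following plain-Python sketch tests a set of bonds for a common effort and a zero net flow. The function and its data structures are illustrative only, not the thesis library's API:

```python
def satisfies_zero_junction(bonds_in, bonds_out, tol=1e-9):
    """bonds_* are lists of (effort, flow) pairs for bonds directed power-in
    and power-out. A 0-junction requires all efforts equal (eq. 2.12) and
    the signed flows to sum to zero (eq. 2.13)."""
    efforts = [e for e, _ in bonds_in + bonds_out]
    common_effort = all(abs(e - efforts[0]) < tol for e in efforts)
    net_flow = sum(f for _, f in bonds_in) - sum(f for _, f in bonds_out)
    return common_effort and abs(net_flow) < tol

# One bond in at effort 5.0 carrying flow 2.0; two bonds out sharing it.
ok = satisfies_zero_junction([(5.0, 2.0)], [(5.0, 0.5), (5.0, 1.5)])
bad = satisfies_zero_junction([(5.0, 2.0)], [(5.0, 1.0)])  # flows don't balance
```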

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the


[Figure: a 0-junction with eight attached bonds, each labelled with its effort e_i and flow f_i]

Figure 2.9: A power-conserving multi-port element: the 0-junction.

[Figure: a 1-junction with eight attached bonds, each labelled with its effort e_i and flow f_i]

Figure 2.10: A power-conserving multi-port element: the 1-junction.


attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power-direction half arrow. In both parts of figure 2.11 the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11 the direction of computational causality is reversed. So, although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but are not exactly bond graphs, do not employ any concept of causality [2], [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.¹ Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving them in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

[Figure: two fully augmented bonds between System A and System B; in each, the power half-arrow points from A to B, but the causal stroke is at opposite ends]

Figure 2.11: The causal stroke and power-direction half-arrow on a fully augmented bond are independent.

¹ Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html : "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options and the resulting equations for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power-conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements), and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inertias. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inertia), the variable in the differential expression can be evaluated by integration over time. For this reason these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly, that is, without another type of storage element or a resistor between them, on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = eC). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type: 1- and 2-port elements

Element            Causality                 Equations
Effort source Se   (required)                e = E(t)
Flow source Sf     (required)                f = F(t)
Capacitor C        effort-in (derivative)    q = Ce;   f = dq/dt
                   effort-out (integral)     e = q/C;  dq/dt = f
Inertia I          effort-in (integral)      f = p/I;  dp/dt = e
                   effort-out (derivative)   p = If;   e = dp/dt
Resistor R         effort-in                 f = e/R
                   effort-out                e = Rf
Transformer TF     (one option)              e1 = m e2;  f2 = m f1
                   (other option)            e2 = e1/m;  f1 = f2/m
Gyrator GY         (one option)              e1 = m f2;  e2 = m f1
                   (other option)            f1 = e2/m;  f2 = e1/m


Table 2.3: Causality options by element type: junction elements

0-junction, causal bond b* directed power-in:
    e* ≡ e_in,1 ; f* ≡ f_in,1
    e_in,2 = ... = e_in,n = e*
    e_out,1 = ... = e_out,m = e*
    f* = Σ(k=1..m) f_out,k − Σ(k=2..n) f_in,k

0-junction, causal bond b* directed power-out:
    e* ≡ e_out,1 ; f* ≡ f_out,1
    e_in,1 = ... = e_in,n = e*
    e_out,2 = ... = e_out,m = e*
    f* = Σ(k=1..n) f_in,k − Σ(k=2..m) f_out,k

1-junction, causal bond b* directed power-in:
    e* ≡ e_in,1 ; f* ≡ f_in,1
    f_in,2 = ... = f_in,n = f*
    f_out,1 = ... = f_out,m = f*
    e* = Σ(k=1..m) e_out,k − Σ(k=2..n) e_in,k

1-junction, causal bond b* directed power-out:
    e* ≡ e_out,1 ; f* ≡ f_out,1
    f_in,1 = ... = f_in,n = f*
    f_out,2 = ... = f_out,m = f*
    e* = Σ(k=1..n) e_in,k − Σ(k=2..m) e_out,k

• 0- and 1-junctions permit any number of bonds directed power-in (b_in,1 ... b_in,n) and any number directed power-out (b_out,1 ... b_out,m).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction); the rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction); the rest must be effort-in.


and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort.

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses, and I elements with proportionally common flow.

There is a systematic method of assigning computational causality to bond graph models so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40], [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality; consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.
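The walk-through above can be sketched in a few lines of plain Python. Causality per bond is recorded relative to the junction ("effort-in" or "effort-out"); the data structures and function are illustrative only, not the thesis library's implementation of SCAP:

```python
# Four one-port elements (Se, C, I, R) bonded to a single 1-junction.
bonds = {"Se": None, "C": None, "I": None, "R": None}

def propagate(bonds):
    # A 1-junction permits only one attached bond to be causal effort-out;
    # once that bond is assigned, all remaining bonds must be effort-in.
    if "effort-out" in bonds.values():
        for name, causality in bonds.items():
            if causality is None:
                bonds[name] = "effort-in"

# Stage 1: required causality for sources. The effort source outputs effort,
# which is therefore effort-in to the junction. No consequences propagate.
bonds["Se"] = "effort-in"
propagate(bonds)

# Stage 2: preferred (integral) causality for storage. The inertia wants
# effort-in at the element, i.e. effort-out from the junction; this forces
# the C and R bonds by propagation, so stage 3 is never reached.
bonds["I"] = "effort-out"
propagate(bonds)
```

The final assignment matches the text: every bond except the inertia's is causal effort-in to the junction.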

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store or sink power but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.


effort e1, flow f1 ----> MTF ----> effort e2, flow f2   (modulus m on an activated bond)

Figure 2.14: A power-conserving 2-port: the modulated transformer, MTF.

effort e1, flow f1 ----> MGY ----> effort e2, flow f2   (modulus m on an activated bond)

Figure 2.15: A power-conserving 2-port: the modulated gyrator, MGY.

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort-displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

effort e, flow f ----> CC

Figure 2.16: A non-linear capacitive storage element: the 1-port contact compliance, CC.

Figure 2.17: Mechanical contact compliance: an unfixed spring exerts no force when q exceeds the unsprung length r.

e = (q < r)(r − q)/C    (2.16)

dq/dt = f    (2.17)
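Equation 2.16 translates directly into a piecewise constitutive function. A small illustrative sketch in plain Python (the function name and values are hypothetical, not the thesis library's implementation):

```python
def cc_effort(q, r, c):
    """Effort output of the CC element (equation 2.16): proportional to the
    compression (r - q) while in contact (q < r), and zero otherwise."""
    return (r - q) / c if q < r else 0.0

in_contact = cc_effort(q=0.5, r=1.0, c=0.5)  # compressed by 0.5: e = 0.5 / 0.5
separated = cc_effort(q=1.5, r=1.0, c=0.5)   # past the threshold: no force
```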

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way complex models representing complex systems can be built in a modular fashion, by nesting combinations of simpler models representing simpler subsystems. At simulation time, a preprocessor may replace compound elements by their equivalent set of standard elements, or

48

minusrC

0

e

0 r

q

C

1

Figure 218 Effort-displacement plot for the CC element shown in figure216 and defined by equations 216 and 217

compound elements may be implemented such that they directly output a setof constitutive equations that is entirely equivalent to those of the standardelements being abstracted

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports: bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension.

[Bond graph: the "link_A_B" sub-model (I, Se, R and MTF elements about 1- and 0-junctions, including I:m, Se:mg, I:J, MTF:mxA, MTF:myA, MTF:mxB, MTF:myB, Se:FxB=0 and Se:FyB=0) and the "joint_at_A" sub-model (two 1-junctions, each carrying a C:0.01 and R:0.8 pair), joined by bonds 20 and 23.]

Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12.


[Diagram: compound elements "link_A_B" and "joint_at_A" joined by bonds 20 and 23.]

Figure 2.21: Model of a simple pendulum as in figures 5.12 and 2.20, rewritten using specialized compound elements.

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs ~x, state vector ~u, and outputs ~y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

d~u/dt = f(~x, ~u)    (2.18)

~y = g(~x, ~u)    (2.19)

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
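As an illustration of this back-substitution, consider a hypothetical series RC circuit with input voltage u, state (charge) q, and intermediate variables e_C = q/C and f_R = (u - e_C)/R. The sketch below (not the thesis implementation, which uses a full symbolic algebra library) expands definitions textually until only state and input variables remain on the right-hand sides:

```python
import re

def back_substitute(equations):
    """Repeatedly replace each variable appearing on a right-hand side
    with its own definition, until only names with no definition
    (state and input variables) remain on the right-hand sides."""
    changed = True
    while changed:
        changed = False
        for var in equations:
            expr = equations[var]
            for other, definition in equations.items():
                if other != var:
                    new = re.sub(r'\b%s\b' % other, '(%s)' % definition, expr)
                    if new != expr:
                        expr, changed = new, True
            equations[var] = expr
    return equations

# Constitutive equations in assignment form: one variable per left-hand side.
eqs = back_substitute({'e_C': 'q/C', 'f_R': '(u - e_C)/R', 'dq_dt': 'f_R'})
```

The state update equation comes out as dq/dt = (u - q/C)/R, which contains only the state q, the input u, and the parameters.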

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides and never on the left-hand side of any equation. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g. by a root finding algorithm) at each time step. In the second case the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or derivative causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitances. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would in effect make the lever plastic; a C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.


Chapter 3

Genetic programming

3.1 Models, programs, and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model.

The method popularized by Koza in [43], [44], [45], [46] and used here is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which includes also evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which includes also simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection" in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.1 [18] also used genetic operations to search for tree structured programs in a special purpose language. Here the application was in symbolic regression.

1. In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control 8, 1965.

A tree structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results [22], because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN key words is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31], [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human competitive artificial intelligence, it will be desirable to use a general purpose, high level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus Naur Form (BNF) language definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming, and genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem that are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the individual candidate solutions of the next generation are created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution for each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature; specifically, genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaa|aa   bbbb|bb

Children: aaaabb    bbbbaa

Figure 3.2: The crossover operation for bit string genotypes.
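A minimal Python sketch of single point crossover for bit (or character) strings; the split point is exposed as a parameter here purely for illustration, where a real run would always choose it randomly:

```python
import random

def single_point_crossover(parent_a, parent_b, point=None):
    """Split two equal-length strings at one position and swap the
    tails, yielding two complementary offspring."""
    assert len(parent_a) == len(parent_b)
    if point is None:
        point = random.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])
```

Calling `single_point_crossover("aaaaaa", "bbbbbb", point=4)` reproduces the offspring of figure 3.2.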

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa

Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes.
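The corresponding sketch for bit string mutation; again, the position parameter exists only so the behaviour can be shown deterministically:

```python
import random

def point_mutation(parent, position=None):
    """Replace one randomly selected bit with its complement."""
    if position is None:
        position = random.randrange(len(parent))
    flipped = '1' if parent[position] == '0' else '0'
    return parent[:position] + flipped + parent[position + 1:]
```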

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent: aaaaaa

Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes.

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material, and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator, to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals, but not in any particularly fit individuals, will tend to be less prevalent in subsequent generations.
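The roulette wheel described above amounts to a cumulative sum over the fitness scores. In this sketch the random number source is a parameter, a detail of the illustration (so the behaviour can be checked deterministically), not of the canonical algorithm:

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Fitness proportional selection: spin a wheel whose wedges are
    proportional to the fitness scores and return the individual in
    whose wedge the "ball" lands."""
    spin = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin <= running:
            return individual
    return population[-1]  # guard against floating point round-off
```

With fitness scores [1, 2, 3], individual 'c' occupies half the wheel and is selected for any spin above 3.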

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm, the entire population is replaced at once. A new population of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 - (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8-10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34], and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
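Steps 1 through 13 can be condensed into a short Python sketch. The fitness function used below is the toy "OneMax" problem (the count of 1 bits), chosen only to make the loop runnable; the particular parameter values, the early termination via a target fitness, and the choice to keep one offspring per crossover are details of this sketch, not prescriptions of the canonical algorithm:

```python
import random

def generational_ga(fitness, length=20, pop_size=40, generations=60,
                    p_crossover=0.7, p_mutation=0.1,
                    target_fitness=None, seed=0):
    """Canonical generational GA on fixed-length bit strings."""
    rng = random.Random(seed)

    def select(pop, fits):
        # fitness proportional (roulette wheel) selection
        spin, running = rng.uniform(0, sum(fits)), 0.0
        for ind, fit in zip(pop, fits):
            running += fit
            if spin <= running:
                return ind
        return pop[-1]

    pop = [''.join(rng.choice('01') for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        if target_fitness is not None and max(fits) >= target_fitness:
            break                                   # termination criterion met
        new_pop = []
        while len(new_pop) < pop_size:
            roll = rng.random()
            if roll < p_crossover:                  # crossover
                a, b = select(pop, fits), select(pop, fits)
                point = rng.randrange(1, length)
                new_pop.append(a[:point] + b[point:])
            elif roll < p_crossover + p_mutation:   # mutation
                a = select(pop, fits)
                i = rng.randrange(length)
                new_pop.append(a[:i] + ('1' if a[i] == '0' else '0') + a[i + 1:])
            else:                                   # replication
                new_pop.append(select(pop, fits))
        pop = new_pop
    return max(pop, key=fitness)

best = generational_ga(lambda s: s.count('1'), target_fitness=20)
```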

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive: they respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represents the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
⇓
1. Evaluate fitness
⇓
2. Check termination criteria
⇓
3. Take fitness-proportional sample and apply genetic operators
⇓
Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming.

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29], [34]. It says that good solutions to a problem (fit individuals) are assembled from good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first; of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search: it implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the summed square difference between the solution expression and a sampling of the "true" function, in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola, such as 1 + 2(x + 3)^2, it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test: every linear expression fits increasingly poorly toward at least one extremity.

A more formal complement to the building blocks hypothesis is schema theory [34], [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The number of positions between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 4 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
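The schema properties just defined are easy to compute. This sketch checks the 01*1* example; note that the defining length is counted here inclusively of both endpoints, matching the value 4 given in the text (Holland's classical delta, the difference of the outermost positions, is one less):

```python
from itertools import product

def schema_order(schema):
    """Number of exactly specified (non-wildcard) positions."""
    return sum(1 for c in schema if c != '*')

def defining_length(schema):
    """Span of positions from the first to the last exactly specified
    bit, counted inclusively (Holland's classical measure omits one)."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] + 1 if fixed else 0

def matches(schema, bits):
    """True if the bit string belongs to the set the schema describes."""
    return all(s in ('*', b) for s, b in zip(schema, bits))

# Enumerate the members of 01*1* over all 5-bit strings.
members = [''.join(bits) for bits in product('01', repeat=5)
           if matches('01*1*', ''.join(bits))]
```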

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically sparse that the algorithm would spend virtually all available time constructing, checking and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracket expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs, and when the time comes they can be evaluated as program code to test their fitness.
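The near-trivial parsing of s-expressions can be shown in a few lines of Python, with nested lists standing in for LISP lists; this is an illustrative sketch, not the thesis code:

```python
def parse(src):
    """Parse an s-expression string into a nested Python list
    (the program's concrete parse tree)."""
    tokens = src.replace('(', ' ( ').replace(')', ' ) ').split()

    def read(pos):
        if tokens[pos] == '(':
            node, pos = [], pos + 1
            while tokens[pos] != ')':
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree
```

For example, `parse("(+ 1 (* 2 3))")` yields the tree `['+', '1', ['*', '2', '3']]`.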

Genetic programming systems are now often written in C [63], [46] for performance reasons. For this thesis, Python [83] is used, but the essence of the process is still the creation, manipulation and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.2

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm: new generations are created from old by fitness proportional selection and the crossover, mutation and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

2. http://haskell.org/haskellwiki/Introduction#What_is_functional_programming%3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,3 tree structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree structured programs that have no simple analogy in nature, and are either not available or not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following five decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants, and zero argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

3. This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.

Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

3. Define the fitness function. Typically the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, PM:

   M: population size (number of individuals)
   G: maximum number of generations to run
   PR: probability of selecting the replication operator
   PC: probability of selecting the crossover operator
   PM: probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.
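For a symbolic regression language, the first two decisions might look as follows. The sets, the leaf probability, and all names here are illustrative assumptions, not the thesis grammar; the point is that closure holds because every node produces a number:

```python
import random

FUNCTIONS = {'+': 2, '-': 2, '*': 2}   # function set: operator -> arity
TERMINALS = ['x', '1.0', '2.0']        # terminal set: variables and constants

def random_tree(max_depth, rng=random):
    """Grow a random program tree. Closure holds because every node
    produces a number, so any subtree fits any argument slot."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return [op] + [random_tree(max_depth - 1, rng)
                   for _ in range(FUNCTIONS[op])]

def evaluate(tree, x):
    """Interpret a program tree at the input value x."""
    if isinstance(tree, str):
        return x if tree == 'x' else float(tree)
    args = [evaluate(child, x) for child in tree[1:]]
    return {'+': args[0] + args[1],
            '-': args[0] - args[1],
            '*': args[0] * args[1]}[tree[0]]
```

For example, the tree `['+', 'x', ['*', '2.0', 'x']]` represents x + 2x and evaluates to 9 at x = 3.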

The first two decisions (definition of the function and terminal sets) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved: it embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings or problems. However, genetic programming is so very computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation necessarily differ from those used in the canonical bit-string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree-structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit-strings discussed previously.
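The sub-tree swap can be sketched in a few lines of Python on nested-list program trees such as `['+', ['*', 'x', 'x'], '2']`. This is an illustrative sketch, not the thesis's Genetics module:

```python
import copy
import random

def subtrees(tree, path=()):
    # Yield (path, subtree) for every node; a path is a tuple of
    # child indices into the nested-list representation.
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replaceAt(tree, path, new):
    # Return a copy of tree with the sub-tree at path replaced by new.
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(new)
    return tree

def crossover(parent1, parent2, rng=random):
    # Pick a random node in each parent and swap the rooted sub-trees,
    # producing two offspring of possibly different size and shape.
    path1, sub1 = rng.choice(list(subtrees(parent1)))
    path2, sub2 = rng.choice(list(subtrees(parent2)))
    return replaceAt(parent1, path1, sub2), replaceAt(parent2, path2, sub1)
```

Because the swapped sub-trees may differ in size, the offspring need not match either parent in size or shape, exactly as described above.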

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree-structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single


Figure 3.7: The crossover operation for tree-structured genotypes

bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

The second mutation operator is called "sub-tree mutation". In this variation, a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.
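Both mutation variants can be sketched on the same nested-list tree representation. The terminal set and helper names here are illustrative only:

```python
import copy
import random

TERMINALS = ['x', '1', '2']  # illustrative terminal set

def paths(tree, path=()):
    # Yield the path of every node; a path is a tuple of child indices.
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))

def getAt(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def setAt(tree, path, new):
    # Return a copy of tree with the node at path replaced by new.
    if not path:
        return new
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new
    return tree

def leafMutation(tree, rng=random):
    # Smallest possible change: swap one leaf for a random terminal.
    leaves = [p for p in paths(tree) if not isinstance(getAt(tree, p), list)]
    return setAt(tree, rng.choice(leaves), rng.choice(TERMINALS))

def subtreeMutation(tree, newSubtree, rng=random):
    # Replace a random sub-tree; choosing the root path () replaces
    # the entire program, the extreme case noted above.
    return setAt(tree, rng.choice(list(paths(tree))), newSubtree)
```

In a real system `newSubtree` would itself be randomly generated; here it is passed in to keep the sketch deterministic.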

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit-string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent: a partial replication. The child will be smaller and differently shaped than the parent tree.
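On the nested-list tree representation used in the sketches above, hoist reduces to picking a sub-tree and copying it out as the child; an illustrative sketch, not the thesis's implementation:

```python
import copy
import random

def allSubtrees(tree):
    # Enumerate every sub-tree of a nested-list program tree.
    yield tree
    if isinstance(tree, list):
        for child in tree[1:]:
            yield from allSubtrees(child)

def hoist(parent, rng=random):
    # Asexual partial replication: a randomly chosen sub-tree of the
    # parent becomes the entire child program.
    return copy.deepcopy(rng.choice(list(allSubtrees(parent))))
```

The deep copy ensures the child is an independent individual rather than a shared reference into the parent's tree.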

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set, having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus, an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different and simpler structure, using a new function that was not available to the parent. Automatically defined functions allow a portion of a program, from anywhere within its tree, to be encapsulated as a unit unto itself and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic, or polynomial regression, search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static


functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator which returns the constant value 0.0 whenever its second operand is zero.
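A protected division of this kind is a one-liner; the function name here is illustrative, and the 0.0 return value follows the convention just described:

```python
def protectedDiv(a, b):
    # Closure-preserving division: return the constant 0.0 when the
    # divisor is zero, so every candidate program executes without error.
    if b == 0:
        return 0.0
    return a / b
```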

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x). That is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python


Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness. Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The source code excerpt in listing 3.1 shows how division by zero and occurrences of the contagious NaN value (see footnote 4) result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Another very common yet potentially problematic arithmetic operation

4. The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type. INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number". NaN arises from operations which have undefined value in common arithmetic. Examples include cos(INF) or sin(INF) and INF/INF. These special values, INF and NaN, are contagious in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN / 2.0 = NaN. Of these two special values NaN is the most "virulent", since in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.


is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex-valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low order approximation of the target function, for example. In this arrangement, a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal or even correct solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation to individuals that exceed the ceiling.

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.
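The two size-limiting approaches can be sketched side by side; the ceiling of 200 nodes and the pressure constant are illustrative, not the thesis's actual settings:

```python
def treeSize(tree):
    # Node count of a nested-list program tree.
    if not isinstance(tree, list):
        return 1
    return 1 + sum(treeSize(child) for child in tree[1:])

def ceilingFitness(rawFitness, tree, ceiling=200):
    # Ceiling approach (as used in this project): zero fitness for any
    # program exceeding a generous size limit.
    return 0.0 if treeSize(tree) > ceiling else rawFitness

def parsimonyFitness(rawFitness, tree, pressure=0.001):
    # Alternative "parsimony pressure" schedule: a mild penalty that
    # grows with program size, chosen so that size never overwhelms
    # the test-suite performance factor.
    return rawFitness / (1.0 + pressure * treeSize(tree))
```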


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. The strength of this approach arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources is, in general, perhaps more naturalistic. Biological organisms which have very extensive material needs will on occasion find those needs unmet, and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree-swapping crossover can produce offspring of differing size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme, in a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value, while tag values differ for structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short-circuiting. Short-circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short-circuiting. A LISP or Scheme implementation would also benefit, since modern optimizing LISP compilers perform very well with tail-recursion.
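The recursive equivalence test and the derived tag might be sketched as follows, on nodes represented as (type, value, children) tuples; this representation is illustrative, not the project's actual classes:

```python
def structurallyEquivalent(a, b):
    # Recursive, short-circuiting equivalence: same type, same number
    # of children, all corresponding children equivalent.  Parametric
    # differences (the value field) are deliberately ignored.
    typeA, _, childrenA = a
    typeB, _, childrenB = b
    return (typeA == typeB
            and len(childrenA) == len(childrenB)
            and all(structurallyEquivalent(x, y)
                    for x, y in zip(childrenA, childrenB)))

def structureTag(node):
    # A hashable tag, equal exactly for structurally equivalent trees;
    # population diversity is then the number of distinct tags.
    nodeType, _, children = node
    return (nodeType, tuple(structureTag(child) for child in children))
```

Because Python's `and` and `all` stop at the first false result, the equivalence test short-circuits exactly as described in the text.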


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation and related symbolic algebra code is divided into 4 modules:

Bondgraph.py: The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py: The Components module contains a collection of compound bond graph elements.

Algebra.py: The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions and recipes for


specialized actions, such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py: The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations; in particular, for reducing coupled sets of algebraic and first order differential equations to a set of differential equations involving a minimal set of state variables and a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into 4 modules as well:

Genetics.py: The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree-structured representations.

Evolution.py: The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py: The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py: The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualization of bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma separated value (CSV) text files for portability. For particularly long


or detailed (small time-step) simulations there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact non-text format might load quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages in use already, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end

lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in .bg format

title single mass spring damper chain

nodes

# Spring, damper and mass
begin node
    id spring
    type C
    parameter 1.96
end node

begin node
    id damper
    type R
    parameter 0.02
end node

begin node
    id mass
    type I
    parameter 0.002
end node

# Common flow point joining s, d, m
begin node
    id point1
    type cfj
end node

# Input force acting on spring and damper
begin node
    id force
    type Se
    parameter Y
end node

bonds

# Spring and damper joined to mass
begin bond
    tail point1
    head spring
end bond

begin bond
    tail point1
    head damper
end bond

begin bond
    tail point1
    head mass
end bond

begin bond
    tail force
    head point1
end bond

# EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62] and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo-code. Take, for example, the following two functions, which implement the universal and existential logical quantifiers respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], these operate on any iterable object


"sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".

Since this research addresses automated dynamic system identification, topics which are primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4-processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work SciPy is used for fast matrix operations and to access a numerical solver. Both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large, automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a


graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];

    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];

    edge [fontsize=10];
    11 -> C1 [arrowhead=half, arrowtail=tee, label="1"];
    11 -> R2 [arrowhead=half, arrowtail=tee, label="2"];
    11 -> I3 [arrowhead=teehalf, label="3"];
    Se4 -> 11 [arrowhead=teehalf, label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds. Activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.


4.3.2 Adding elements and bonds, traversing a graph

The graph class has an addNode and a removeNode method, a bondNodes and a removeBond method. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.
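The interface described above might be sketched as follows; this is a simplified illustration that omits the causality bookkeeping and the removeNode/removeBond methods of the real Bondgraph.py:

```python
class Element:
    def __init__(self, name):
        self.name = name
        self.graph = None   # set when added to a graph
        self.bonds = []     # attached bonds

class Bond:
    def __init__(self, tail, head):
        self.tail = tail
        self.head = head
        self.causality = None  # nil until causality assignment

class BondGraph:
    def __init__(self, title=''):
        self.title = title
        self.elements = []
        self.bonds = []

    def addNode(self, element):
        # Refuse nodes already in a graph or carrying bonds.
        if element.graph is not None or element.bonds:
            raise ValueError('element already in use')
        element.graph = self
        self.elements.append(element)
        return element

    def bondNodes(self, tail, head):
        # Both elements must belong to this graph; the new bond is
        # registered with the graph and with both endpoints, so the
        # graph can be walked from any element or bond.
        if tail.graph is not self or head.graph is not self:
            raise ValueError('element not in this graph')
        bond = Bond(tail, head)
        tail.bonds.append(bond)
        head.bonds.append(bond)
        self.bonds.append(bond)
        return bond
```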

433 Assigning causality

The sequential causality assignment procedure (SCAP) described in Sec-tion 232 is automated by the bond graph code This procedure is in-voked by calling the assignCausality method on a bond graph object AnunassignCausality method exists as well which strips any assigned causal-ity from all bonds in the graph

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC);

2. one-port elements with a preferred causality (storage elements C and I);

3. one-port elements with arbitrary causality (sink element R);

4. others (0, 1, GY, TF, signal blocks).

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality;

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality;

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality;

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality;

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.
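The junction rules above can be illustrated with a small sketch. The representation (dictionaries for bonds, "in"/"out" strings for effort causality relative to the junction) is an assumption made for illustration; only the method name extendCausality comes from the text.

```python
class Junction:
    """Sketch of causality propagation at a 0 or 1 junction."""
    def __init__(self, kind, n_bonds):
        self.kind = kind                  # "0" or "1"
        self.bonds = [{"causality": None} for _ in range(n_bonds)]

    def extendCausality(self):
        # A 0 junction admits exactly one effort-in bond; a 1 junction
        # admits exactly one effort-out bond (the "strong" bond).
        strong = "in" if self.kind == "0" else "out"
        weak = "out" if strong == "in" else "in"
        if any(b["causality"] == strong for b in self.bonds):
            # one strong assignment forces all remaining bonds weak
            for b in self.bonds:
                if b["causality"] is None:
                    b["causality"] = weak
        else:
            # the last-but-one weak assignment forces the last bond strong
            unset = [b for b in self.bonds if b["causality"] is None]
            if len(unset) == 1:
                unset[0]["causality"] = strong
```

A real implementation would also detect contradictions (two strong bonds on one junction) and raise the exception described above.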

After all required-causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has already been assigned a causality (preferred or otherwise) as a consequence of earlier required or preferred causality assignments is skipped over. The others are assigned their preferred causality, and consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined and any instances of differential causality are noted in a new list attached to the bond graph object.

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.
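The causality dependence can be made concrete with a hypothetical helper for a C element. The real library returns Equation objects built from the symbolic algebra library, not strings, and this function is an illustration of the idea only.

```python
def capacitor_equations(bond_number, causality, c):
    """Constitutive equations of a C element, as strings, for the two
    possible causality assignments of its single attached bond.
    Illustrative sketch only; names follow the e/f/bond-number scheme."""
    e = "e%d" % bond_number   # effort variable of the bond
    f = "f%d" % bond_number   # flow variable of the bond
    q = "q%d" % bond_number   # displacement (energy) variable
    if causality == "integral":
        # preferred: integrate flow to get displacement, output effort
        return ["%s_dot = %s" % (q, f), "%s = (%s / %s)" % (e, q, c)]
    else:
        # derivative causality: displacement follows the imposed effort
        return ["%s = (%s * %s)" % (q, e, c), "%s = %s_dot" % (f, q)]
```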

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since it is computationally inexpensive to apply them. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all the junction's neighbours, that cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of the procedure for which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with fewer than three bonds.

2. Merge adjacent like junctions.

3. Merge resistors bonded to a common junction.

4. Merge capacitors bonded to a common effort junction.

5. Merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5.

In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed, and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1 there are two cases, depending on the junction type. (Handling of non-linear elements is not automated by the library.) If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is r_eq, where

r_eq = 1 / (1/r_1 + 1/r_2 + ... + 1/r_n)        (4.1)


These are the familiar rules for electrical resistances in series and parallel.

The fourth and fifth manipulations handle the case shown in figure 2.12 and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply. Capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances. Rigidly joined inertias are effectively one mass.
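The parameter arithmetic behind reduction steps 3 to 5 amounts to the following (linear elements only, as in the library; the function names are illustrative, not the library's):

```python
def merged_resistance(junction_kind, rs):
    """Parameter of the single R element replacing resistors on a common
    junction: series sum on a 1 junction, equation 4.1 on a 0 junction."""
    if junction_kind == "1":
        return sum(rs)
    return 1.0 / sum(1.0 / r for r in rs)

def merged_capacitance(cs):
    """Step 4: capacitors on a common effort (0) junction add directly."""
    return sum(cs)

def merged_inertia(inertias):
    """Step 5: rigidly joined inertias on a common flow (1) junction
    are effectively one mass."""
    return sum(inertias)
```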

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than three bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8 and 4.9 show the application of the first three steps in the automated model reduction method. The fourth and fifth are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions, step 1: remove trivial junctions.


Figure 4.2: Model reductions, step 2: merge adjacent like junctions.


Figure 4.3: Model reductions, step 3: merge resistors bonded to a common junction.


Figure 4.4: Model reductions, step 4: merge capacitors bonded to a common effort junction.

Figure 4.5: Model reductions, step 5: merge inertias bonded to a common flow junction.


Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction.


Figure 4.7: The model from figure 4.6 with trivial junctions removed.


Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged.


Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged.


Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5.


Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6.


Listing 4.5: Equations from the model in figure 4.6

e1 = (e2 + (-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4 + (-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1 - 78.0)
f2 = f1
f3 = (f9 + f11 + f12 + f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2 + f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7.13788689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2 + (-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1 - 78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look for the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant and operator is an expression in itself), replace any one sub-expression with another, and perform other, more specialized, algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
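A minimal sketch of this inheritance structure, with evaluation as the example method. The class names follow figure 4.12; the method names and bodies here are illustrative assumptions, not the library's actual code.

```python
class Expression:
    """Base class; only leaf classes of the hierarchy are instantiated."""
    def subexpressions(self):
        return []
    def variables(self):
        vs = set()
        for s in self.subexpressions():
            vs |= s.variables()
        return vs
    def is_constant(self):
        return not self.variables()

class Constant(Expression):
    def __init__(self, value): self.value = value
    def evaluate(self, env): return self.value

class Variable(Expression):
    def __init__(self, name): self.name = name
    def variables(self): return {self.name}
    def evaluate(self, env): return env[self.name]   # look up in the table

class Operator(Expression):
    def __init__(self, *operands): self.operands = list(operands)
    def subexpressions(self): return self.operands

class Div(Operator):                      # two ordered sub-expressions
    def evaluate(self, env):
        a, b = self.operands
        return a.evaluate(env) / b.evaluate(env)

class CommutativeOperator(Operator):      # any number of sub-expressions
    pass

class Add(CommutativeOperator):
    def evaluate(self, env):
        return sum(o.evaluate(env) for o in self.operands)

class Mul(CommutativeOperator):
    def evaluate(self, env):
        r = 1.0
        for o in self.operands:
            r *= o.evaluate(env)
        return r
```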

Equation objects are composed of two expression objects (a left-hand side and a right) and methods to list all variables, expressions and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram (Expression is specialized by Variable, Constant and Operator; Operator by Div, Greater, Lesser, Sin, Cos and CommutativeOperator; CommutativeOperator by Add and Mul).

Equation sets are collections of equations that contain no duplicates. Equation set objects support methods similar to those of equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.), as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation (provided the original set is in a certain "declarative" form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation that is automated by the library, algebraic loops are discussed in section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called "t" is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single-operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator; it joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single-operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null, or "do nothing", operations: a single-operand Add is assumed to add zero to its operand, and a single-operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands to this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
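As a sketch, the same expansion can be written over a tiny tuple representation, ('mul', ...) and ('add', ...), which is an assumed stand-in for the library's operator objects:

```python
def expand_distributive(expr):
    """Rewrite mul(..., add(a, b, ...), ...) as
    add(mul(a, rest...), mul(b, rest...), ...) for the first Add found."""
    if not isinstance(expr, tuple) or expr[0] != "mul":
        return expr
    operands = list(expr[1:])
    for i, op in enumerate(operands):
        if isinstance(op, tuple) and op[0] == "add":
            rest = operands[:i] + operands[i + 1:]
            # one Mul per Add operand, each carrying the remaining operands
            return ("add", *[("mul", term, *rest) for term in op[1:]])
    return expr   # no Add sub-expression: nothing to expand
```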

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

"Reduce equation to polynomial form" invokes towardsPolynomial on the left and right-hand side expressions of the equation repeatedly, until isPolynomial gives a true result for each. "Collect like terms" examines the variables in the operands of the Add objects on either side of a polynomial form equation and collects them such that there is only one operand between those Add objects that contains any one variable. The result may not be in the polynomial form defined above, but can be driven there again by invoking the equation's towardsPolynomial method. Equation objects have a solveFor method that, given a variable appearing in the equation, attempts to put the equation in polynomial form, then collects like terms, isolates the term containing that variable on one side of the equation, and divides through by any constant factor. Of course, this works only for equations that can be put in the polynomial form defined above. The solveFor method attempts to put the equation in polynomial form by invoking towardsPolynomial repeatedly, stopping either when the isPolynomial method gives a true result or after a number of iterations proportional to the size of the equation (the number of Expression objects it contains). If the iteration limit is reached, then it is presumed not possible to put the equation in polynomial form, and the solveFor method fails.
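The end state of solveFor can be illustrated on an equation already in the linear polynomial form described above, with each side represented as a mapping from variable name to coefficient (None keying the constant term). This representation is an illustrative assumption; the library works on Equation objects.

```python
def solve_for(lhs, rhs, var):
    """Solve a linear polynomial-form equation for `var`.
    lhs and rhs map variable names (or None, the constant term) to
    coefficients, e.g. 2x + 3 = 7 is ({"x": 2.0, None: 3.0}, {None: 7.0})."""
    # move every term to one side: (lhs - rhs) = 0
    terms = dict(lhs)
    for k, c in rhs.items():
        terms[k] = terms.get(k, 0.0) - c
    coeff = terms.pop(var)          # isolate the term containing var
    # var = -(remaining terms) / coeff
    return {k: -c / coeff for k, c in terms.items()}
```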

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive backsubstitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here, the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2) it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graph. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back substitution into the declarative equation for the first loop variable.

2. Expanding distributive operators, aggregating constants and commutative operators.

3. Collecting like terms.

4. Isolating the first loop variable.

The result of the first step is a new declarative equation for the first loop variable which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C


and

D = E_1 + E_2 + · · · + E_n

where C is a constant and any of E_1 ... E_n may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the other side of the equation, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.
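The depth-first loop detection that starts this process can be sketched as follows, over a plain dependency mapping (an assumed representation; the library searches its own equation-set objects):

```python
def find_algebraic_loop(deps, start):
    """Return one dependency path from `start` back to itself, or None.
    `deps` maps each variable to the variables on the right-hand side of
    its declarative equation; differential dependencies would be omitted
    from this mapping, since such loops need not be resolved."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        for d in deps.get(node, ()):
            if d == start:
                return path          # closed a cycle back to the start node
            if d not in seen:
                seen.add(d)
                stack.append((d, path + [d]))
    return None
```

Run against the dependencies of the reduced model in figure 4.9, a search started from e6 would report the loop through e3, f3, f6, f8 and e8.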

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4 + f7 + f8)
e6 = (e2 + (-1.0*e3))

collapsing:
e6 = (e2 + (-1.0*((f4 + f7 + (e6/30.197788376))*32.0)))

expanding and aggregating:
iterations: 3 True True
e6 = (e2 + (f4*-32.0) + (f7*-32.0) + (e6*-1.05968025213))

collecting:
(e6 + (-1.0*(e6*-1.05968025213))) = (e2 + (f4*-32.0) + (f7*-32.0))

expanding and aggregating:
iterations: 2 True True
(e6 + (e6*1.05968025213)) = (e2 + (f4*-32.0) + (f7*-32.0))

isolating:
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1 - 78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved.

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not hold (for example, because the bond graph model has differential causality for some elements) then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input and other variables.

2. Back-substitute until only state and input variables appear on right-hand sides.

3. Split the state and readout equation sets.

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.

In the second step, the right-hand sides of the non-differential equations are substituted into the right-hand sides of all equations, wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops, then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.
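A toy version of this recursive back-substitution, operating on equations held as strings rather than the library's Equation objects (an illustrative assumption; it also presumes loops have already been resolved, since otherwise it would not terminate):

```python
import re

def back_substitute(algebraic, rhs):
    """Repeatedly replace each algebraic left-hand-side variable appearing
    in `rhs` with its (parenthesized) right-hand side, until none remain.
    `algebraic` maps variable names to right-hand-side strings."""
    changed = True
    while changed:
        changed = False
        for var, definition in algebraic.items():
            pattern = r"\b%s\b" % re.escape(var)   # whole-word match only
            if re.search(pattern, rhs):
                rhs = re.sub(pattern, "(%s)" % definition, rhs)
                changed = True
    return rhs
```

Note that the word-boundary match leaves names like p1_dot untouched when substituting p1, since the underscore is itself a word character.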

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can, of course, be solved for arbitrary time ranges. The differential equations from a state space equation set are solved by numerical integration, using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.

Inputs, if any, are read from a file, then interpolated on the fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth order (constant) and first order (linear) interpolators are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.
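This arrangement can be sketched with SciPy's odeint, which wraps the LSODA integrator; the state equation and all names below are illustrative, not taken from any listing in this work.

```python
import numpy as np
from scipy.integrate import odeint   # LSODA-based integrator

def zoh(times, values):
    """Zeroth-order (constant) hold interpolator for tabulated inputs:
    returns the most recent tabulated value at any query time."""
    def u(t):
        i = np.searchsorted(times, t, side="right") - 1
        return values[max(i, 0)]
    return u

# a unit pulse input, tabulated then held constant between samples
u = zoh(np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 0.0]))

def state_derivative(x, t):
    # illustrative single state equation: p_dot = u(t) - p/2
    return [u(t) - x[0] / 2.0]

t = np.linspace(0.0, 3.0, 301)
trajectory = odeint(state_derivative, [0.0], t)
```

With the trajectory in hand, the interpolated input and the state history would be substituted into the remaining algebraic equations to recover any non-state variables of interest.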

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations, across the same range of simulated time, for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression), a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations, or a complete bond graph).

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.
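A node type conforming to this interface might look as follows. The grammar, class names, and string-composing method are illustrative assumptions, not the thesis system's actual classes.

```python
class Node:
    """Base class: type metadata plus the partial-solution composer."""
    type = None
    min_children = 0
    max_children = 0
    child_types = []

    def __init__(self, children=()):
        self.children = list(children)

    def compose(self, kernel=None):
        # each concrete node type composes a partial solution from its
        # children's results; the kernel may be used, forwarded or ignored
        raise NotImplementedError

class AddNode(Node):
    type = "expr"
    min_children = 2
    max_children = 2
    child_types = ["expr", "expr"]

    def compose(self, kernel=None):
        a, b = (c.compose(kernel) for c in self.children)
        return "(%s + %s)" % (a, b)

class ConstNode(Node):
    type = "expr"          # a terminal: accepts no children

    def __init__(self, value):
        super().__init__()
        self.value = value

    def compose(self, kernel=None):
        return str(self.value)
```

Here the "composed partial solution" is just an expression string; in the real grammars it would be a set of equations or a bond graph fragment.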


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree-structured representations proceeds in three steps:

1. Choose a node at random from the parent.

2. Generate a new random tree whose root has the same type as the chosen node.

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree.

In the first step, a choice is made uniformly among all nodes; it is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied with replacement of the chosen subtree.

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4, this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; those few are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.
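The breadth-first, type-respecting tree construction just described might be sketched as follows. This is a simplified illustration, not the thesis implementation: the dict-based nodes, the tiny primitive set, and the parameter names are invented for the example.

```python
import random
from collections import deque

# (name, conforming type, child types); an empty child list marks a terminal
PRIMITIVES = [
    ("Add",   "Expression", ["Expression", "Expression"]),
    ("Sin",   "Expression", ["Expression"]),
    ("Const", "Expression", []),
]

def random_tree(root_type, max_size=10, max_depth=4, func_bias=0.8):
    """Grow a tree breadth first; prefer functions by a configurable ratio
    until the size or depth budget is exceeded, then fall back to terminals
    wherever one conforms to the required type."""
    root = {"name": None, "type": root_type, "children": []}
    queue = deque([(root, 0)])
    size = 0
    while queue:
        slot, depth = queue.popleft()
        conforming = [p for p in PRIMITIVES if p[1] == slot["type"]]
        terminals = [p for p in conforming if not p[2]]
        functions = [p for p in conforming if p[2]]
        over_budget = size >= max_size or depth >= max_depth
        if over_budget and terminals:
            pool = terminals           # over budget: terminal if one conforms
        elif functions and (not terminals or random.random() < func_bias):
            pool = functions
        else:
            pool = terminals
        name, _, child_types = random.choice(pool)
        slot["name"] = name
        size += 1
        for child_type in child_types:  # open one typed slot per child
            child = {"name": None, "type": child_type, "children": []}
            slot["children"].append(child)
            queue.append((child, depth + 1))
    return root
```

Because construction is breadth first, exceeding the budget adds at most roughly one extra layer of leaves, matching the behaviour described in the text.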

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case, the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree-structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents.

2. List all nodes from the first parent conforming to one of the common types.

3. Choose at random from the list.

4. List all nodes from the second parent conforming to the same type as the chosen node.

5. Choose at random from the list.

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second.

If the set of common types is empty, crossover cannot proceed and the genetic algorithm selects a new pair of parents.
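The six steps can be sketched as below. This is an illustration, not the thesis implementation; nodes are represented as plain dicts with a "type" and a "children" list, and the helper names are invented.

```python
import copy
import random

def nodes_of(tree):
    """Yield every node in a tree of {'type': ..., 'children': [...]} dicts."""
    yield tree
    for child in tree["children"]:
        yield from nodes_of(child)

def grafted_copy(tree, target, replacement):
    """Copy `tree`, substituting a copy of `replacement` for `target`."""
    if tree is target:
        return copy.deepcopy(replacement)
    return {"type": tree["type"],
            "children": [grafted_copy(c, target, replacement)
                         for c in tree["children"]]}

def crossover(parent_a, parent_b):
    # 1. intersect the node type sets of the parents
    common = ({n["type"] for n in nodes_of(parent_a)} &
              {n["type"] for n in nodes_of(parent_b)})
    if not common:
        return None  # signal the caller to select a new pair of parents
    # 2-3. choose a node of a common type from the first parent
    cut_a = random.choice([n for n in nodes_of(parent_a)
                           if n["type"] in common])
    # 4-5. choose a node of the same type from the second parent
    cut_b = random.choice([n for n in nodes_of(parent_b)
                           if n["type"] == cut_a["type"]])
    # 6. copy the first parent, grafting in a copy of the chosen subtree
    return grafted_copy(parent_a, cut_a, cut_b)
```

Restricting both cut points to a shared type is what guarantees the offspring is again a well-typed program.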

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).


This grammar is typeless, or "mono-typed". The result of symbolic regression is a tree-structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant-valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and perhaps obvious) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first-order differential equations; the differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative-form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem, it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire to not pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program. Simply making the new state available for later integration into the model is enough. Given a tree-structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sit above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.


3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equation are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, the duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number halfway between its original index and the next state equation index (with wrap-around at the interval boundaries). Overall, this design has some desirable properties, including:

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so crossover that simply adds a state is not destructive to any advantages the program already has.


• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small incremental changes (such as the introduction of a new state equation by copying, and later modifying in some small way, an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.
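The interval-index scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the choice of the unit interval, and the search direction are picked for the example, not taken from the thesis code.

```python
def resolve(ref_index, state_indices):
    """Resolve a reference index to a state-equation index by searching in
    one direction around the unit interval, with wraparound: the nearest
    state index at or 'above' the reference, modulo 1.0."""
    return min(state_indices, key=lambda s: (s - ref_index) % 1.0)

def split_duplicates(indices):
    """Scan in the opposite direction to the resolution search and replace
    each duplicate index by one halfway between its original value and the
    next state-equation index, wrapping at the interval boundaries."""
    indices = list(indices)
    # visit indices from the top of the interval downward (stable for ties)
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    seen = set()
    for i in order:
        value = indices[i]
        if value in seen:
            distinct = sorted(set(indices))
            higher = [d for d in distinct if d > value]
            next_index = higher[0] if higher else distinct[0] + 1.0  # wrap
            indices[i] = ((value + next_index) / 2.0) % 1.0
        else:
            seen.add(value)
    return indices
```

With this arrangement a copied state equation shares its original's index until the duplicate is split, so references divide between the two only once one of them changes, which is the non-disruptive behaviour the bullet points claim.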

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and on their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.


Node               Type               Child types
DynamicSystem      DynamicSystem      StateEquationSet, OutputEquationSet
StateEquationSet   StateEquationSet   StateEquation, StateEquationSet
StateEquation      StateEquation      StateVariableLHS, Expression
OutputEquationSet  OutputEquationSet  OutputEquation, OutputEquationSet
OutputEquation     OutputEquation     OutputVariable, Expression
StateVariableLHS   StateVariableLHS
StateVariableRHS   Expression
OutputVariable     OutputVariable
InputVariable      Expression
Constant           Expression
Add                Expression         Expression, Expression
Mul                Expression         Expression, Expression
Div                Expression         Expression, Expression
Cos                Expression         Expression
Sin                Expression         Expression
Greater            Expression         Expression, Expression
Lesser             Expression         Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems


4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements, or even connected subgraphs, to be replaced.

If a constant parameter of some element (for example, the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models) then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank, since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type        Child types
Kernel                      kernel
InsertCommonEffortJunction  bond        bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond        bond, bond, effort-in, effort-out
AttachInertia               effort-out  Expression, bond
AttachCapacitor             effort-in   Expression, bond
AttachResistorEffortIn      effort-in   Expression, bond
AttachResistorEffortOut     effort-out  Expression, bond
Constant                    Expression
Add                         Expression  Expression, Expression
Mul                         Expression  Expression, Expression
Div                         Expression  Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs


Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p, and q variables measure effort, flow, momentum, and displacement on the like-numbered bonds. The applied effort from the source element is marked E.


˙p3 = e3   (5.1)

˙q1 = f1   (5.2)

e1 = (q1/1.96)   (5.3)

e2 = (f2 * 0.02)   (5.4)

e3 = (e4 + (-1.0 * e1) + (-1.0 * e2))   (5.5)

e4 = E   (5.6)

f1 = f3   (5.7)

f2 = f3   (5.8)

f3 = (p3/0.002)   (5.9)

f4 = f3   (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in Section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that variable E does not appear on the left-hand side of any state or readout equation; it is an input to the system.

˙p3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))   (5.11)

˙q1 = (p3/0.002)   (5.12)

e1 = (q1/1.96)   (5.13)

e2 = ((p3/0.002) * 0.02)   (5.14)

e3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))   (5.15)

e4 = E   (5.16)

f1 = (p3/0.002)   (5.17)

f2 = (p3/0.002)   (5.18)

f3 = (p3/0.002)   (5.19)

f4 = (p3/0.002)   (5.20)


The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand sides of readout equations as needed.
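The procedure can be illustrated by re-simulating the two state equations (5.11 and 5.12) with a constant input in place of the interpolated data file. This is a minimal forward-Euler sketch under the parameters stated in the text, not the thesis simulation code:

```python
# Parameters from the text: C = 1.96, I = 0.002, R = 0.02
C, I, R = 1.96, 0.002, 0.02

def derivatives(p3, q1, E):
    p3_dot = E + (-1.0 * (q1 / C)) + (-1.0 * ((p3 / I) * R))  # eq. 5.11
    q1_dot = p3 / I                                           # eq. 5.12
    return p3_dot, q1_dot

def simulate(E=1.0, dt=1e-4, t_end=2.0):
    """Integrate the state equations with forward Euler from rest."""
    p3 = q1 = 0.0
    for _ in range(int(round(t_end / dt))):
        p3_dot, q1_dot = derivatives(p3, q1, E)
        p3 += p3_dot * dt
        q1 += q1_dot * dt
    return p3, q1

p3, q1 = simulate()
# at steady state the spring balances the applied force, so q1 tends to
# E * C = 1.96 and the momentum p3 tends to zero
```

The steady-state values agree with the behaviour described for figure 5.3: zero speed and a positive position offset where the spring and the applied force reach equilibrium.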

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass, and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in Section 5.3. A broad spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.


Figure 5.1: Schematic diagram of a simple spring-mass-damper system

Figure 5.2: Bond graph model of the system from figure 5.1 (elements C spring, R damper, I mass, and Se force attached to a 1-junction by bonds 1 to 4)


Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input (f1, q1, and E plotted against t)

Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit magnitude uniform random noise input (f1, q1, and E plotted against t)


5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring-dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state space form, the differential state equations are as given below (equations 5.21 to 5.26).

˙p2 = (E + (-1.0 * ((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02)
    + (q6/1.96))))   (5.21)

˙p8 = (((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96))
    + (-1.0 * ((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02)
    + (q12/1.96))))   (5.22)

˙p14 = (((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96))
    + (-1.0 * ((p14/0.002) * 0.02)) + (-1.0 * (q16/1.96)))   (5.23)

˙q6 = ((p2/0.002) + (-1.0 * (p8/0.002)))   (5.24)

˙q12 = ((p8/0.002) + (-1.0 * (p14/0.002)))   (5.25)

˙q16 = (p14/0.002)   (5.26)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12, and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter. The input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system

Figure 5.6: Bond graph model of the system from figure 5.5 (source Se E and elements I m1, C c1, R r1, I m2, C c2, R r2, I m3, C c3, R r3 connected by 1- and 0-junctions)


Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E (q6, q12, q16, and E plotted against t)

Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit magnitude uniform random noise in the applied force E (q6, q12, q16, and E plotted against t)


5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows: the inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant -9.81 times the inertia (force of gravity on the ball).

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.
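To make the switched behaviour concrete, here is a small re-creation of the model under stated assumptions: the CC element is taken to act as a linear spring only while the displacement q is below the threshold r, and a symplectic Euler step is used. This is an illustrative sketch, not the thesis code.

```python
# Parameters from the text: I = 0.002, C = 0.25, r = 1.0, R = 0.001
I, C, r, R = 0.002, 0.25, 1.0, 0.001
g = 9.81

def step(q, p, dt):
    # switched capacitance: the element stores energy (acts as a spring)
    # only while the ball is compressed against the surface (q < r)
    spring = (r - q) / C if q < r else 0.0
    drag = (p / I) * R
    p += (-g * I + spring - drag) * dt   # gravity + contact + viscous drag
    q += (p / I) * dt                    # symplectic Euler: uses updated p
    return q, p

q, p = 10.0, 0.0          # released from rest above the surface
q_min = q
for _ in range(200_000):  # 2 s of simulated time at dt = 1e-5
    q, p = step(q, p, 1e-5)
    q_min = min(q_min, q)
# the ball contacts the surface (q_min < r) but never passes through it
```

Because the spring force only switches on below r, the trajectory reproduces the qualitative behaviour of figure 5.11: free fall, a brief stiff contact, and a rebound damped by R.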

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface

Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface (Se (E), I (I), R (R), and CC (C, r) attached to a 1-junction by bonds 1 to 4)

Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface (q4 plotted against t)


5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum-body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical, and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom. The horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical, and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical, and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical, and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of rotational flow signal a3):

mxA = 1.0 * cos(θ)

myA = 1.0 * sin(θ)

mxB = -1.0 * cos(θ)

myB = -1.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state space form, the differential state equations are:

˙na1 = (p1/1.00)

˙na2 = (p2/1.00)

˙na3 = (p4/1.10)

˙p1 = (((-1.0 * (q21/0.01)) + (-1.0 * (((p1/1.00)
    + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p2 = (-9.81 + ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00)
    + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p4 = (((1.00 * cos(na3)) * ((-1.0 * (q21/0.01))
    + (-1.0 * (((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8))))
    + ((1.00 * sin(na3)) * ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00)
    + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8)))) + ((-1.00 * cos(na3)) * 0.0)
    + ((-1.00 * sin(na3)) * 0.0) + (-1.0 * ((p4/1.10) * 0.5)))

˙q21 = ((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10)))

˙q24 = ((p2/1.00) + ((1.00 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low frequency vibration in the spring that can be seen in the variable lower extremum of the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.

Figure 5.12: Bond graph model of the system from figure 2.19


Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position (int_a2 and q24 plotted against t)

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left and as an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state space form, there are 24 differential state equations, listed below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.


˙V_phi_b0 = ((0.0 + (((xj0/0.001) + ((0.0 + (-1.0 * (V_xC_b0 + (1.00
    * V_phi_b0 * sin(phi_b0)))))/0.001)) * 1.00 * sin(phi_b0)) + (-1.0
    * ((yj0/0.001) + ((0.0 + (-1.0 * (V_yC_b0 + (-1.0 * 1.00 * V_phi_b0
    * cos(phi_b0)))))/0.001)) * 1.00 * cos(phi_b0)) + (-1.0 * 0.0)
    + (((xj1/0.001) + (((V_xC_b0 + (-1.0 * 1.00 * V_phi_b0 * sin(phi_b0)))
    + (-1.0 * (V_xC_b1 + (1.00 * V_phi_b1 * sin(phi_b1)))))/0.001))
    * 1.00 * sin(phi_b0)) + (-1.0 * ((yj1/0.001) + (((V_yC_b0 + (1.00
    * V_phi_b0 * cos(phi_b0))) + (-1.0 * (V_yC_b1 + (-1.0 * 1.00
    * V_phi_b1 * cos(phi_b1)))))/0.001)) * 1.00 * cos(phi_b0)))/5.00)

˙V_phi_b1 = ((0.0 + (((xj1/0.001) + (((V_xC_b0 + (-1.0 * 1.00 * V_phi_b0
    * sin(phi_b0))) + (-1.0 * (V_xC_b1 + (1.00 * V_phi_b1
    * sin(phi_b1)))))/0.001)) * 1.00 * sin(phi_b1)) + (-1.0
    * ((yj1/0.001) + (((V_yC_b0 + (1.00 * V_phi_b0 * cos(phi_b0)))
    + (-1.0 * (V_yC_b1 + (-1.0 * 1.00 * V_phi_b1 * cos(phi_b1)))))
    /0.001)) * 1.00 * cos(phi_b1)) + (-1.0 * 0.0) + (((xj2/0.001)
    + (((V_xC_b1 + (-1.0 * 1.00 * V_phi_b1 * sin(phi_b1))) + (-1.0
    * (V_xC_b2 + (1.00 * V_phi_b2 * sin(phi_b2)))))/0.001)) * 1.00
    * sin(phi_b1)) + (-1.0 * ((yj2/0.001) + (((V_yC_b1 + (1.00
    * V_phi_b1 * cos(phi_b1))) + (-1.0 * (V_yC_b2 + (-1.0 * 1.00
    * V_phi_b2 * cos(phi_b2)))))/0.001)) * 1.00 * cos(phi_b1)))/5.00)

˙V_phi_b2 = ((0.0 + (((xj2/0.001) + (((V_xC_b1 + (-1.0 * 1.00 * V_phi_b1
    * sin(phi_b1))) + (-1.0 * (V_xC_b2 + (1.00 * V_phi_b2
    * sin(phi_b2)))))/0.001)) * 1.00 * sin(phi_b2)) + (-1.0
    * ((yj2/0.001) + (((V_yC_b1 + (1.00 * V_phi_b1 * cos(phi_b1)))
    + (-1.0 * (V_yC_b2 + (-1.0 * 1.00 * V_phi_b2 * cos(phi_b2)))))/0.001))
    * 1.00 * cos(phi_b2)) + (-1.0 * 0.0) + (0.0 * 1.00 * sin(phi_b2))
    + (-1.0 * 0.0 * 1.00 * cos(phi_b2)))/5.00)

˙V_xC_b0 = ((((xj0/0.001) + ((0.0 + (-1.0 * (V_xC_b0 + (1.00 * V_phi_b0
    * sin(phi_b0)))))/0.001)) + (-1.0 * ((xj1/0.001)
    + (((V_xC_b0 + (-1.0 * 1.00 * V_phi_b0 * sin(phi_b0)))
    + (-1.0 * (V_xC_b1 + (1.00 * V_phi_b1 * sin(phi_b1)))))
    /0.001))))/1.00)

˙V_xC_b1 = ((((xj1/0.001) + (((V_xC_b0 + (-1.0 * 1.00 * V_phi_b0
    * sin(phi_b0))) + (-1.0 * (V_xC_b1 + (1.00 * V_phi_b1
    * sin(phi_b1)))))/0.001)) + (-1.0 * ((xj2/0.001)
    + (((V_xC_b1 + (-1.0 * 1.00 * V_phi_b1 * sin(phi_b1)))
    + (-1.0 * (V_xC_b2 + (1.00 * V_phi_b2 * sin(phi_b2)))))
    /0.001))))/1.00)

˙V_xC_b2 = ((((xj2/0.001) + (((V_xC_b1 + (-1.0 * 1.00 * V_phi_b1 * sin(phi_b1)))
    + (-1.0 * (V_xC_b2 + (1.00 * V_phi_b2 * sin(phi_b2)))))/0.001))
    + (-1.0 * 0.0))/1.00)

˙V_yC_b0 = ((((yj0/0.001) + ((0.0 + (-1.0 * (V_yC_b0 + (-1.0 * 1.00 * V_phi_b0
    * cos(phi_b0)))))/0.001)) + (-1.0 * ((yj1/0.001) + (((V_yC_b0
    + (1.00 * V_phi_b0 * cos(phi_b0))) + (-1.0 * (V_yC_b1 + (-1.0 * 1.00
    * V_phi_b1 * cos(phi_b1)))))/0.001))) + (-1.0 * 1.00 * 9.81))/1.00)

˙V_yC_b1 = ((((yj1/0.001) + (((V_yC_b0 + (1.00 * V_phi_b0 * cos(phi_b0)))
    + (-1.0 * (V_yC_b1 + (-1.0 * 1.00 * V_phi_b1 * cos(phi_b1)))))/0.001))
    + (-1.0 * ((yj2/0.001) + (((V_yC_b1 + (1.00 * V_phi_b1 * cos(phi_b1)))
    + (-1.0 * (V_yC_b2 + (-1.0 * 1.00 * V_phi_b2 * cos(phi_b2)))))/0.001)))
    + (-1.0 * 1.00 * 9.81))/1.00)

˙V_yC_b2 = ((((yj2/0.001) + (((V_yC_b1 + (1.00 * V_phi_b1 * cos(phi_b1)))
    + (-1.0 * (V_yC_b2 + (-1.0 * 1.00 * V_phi_b2 * cos(phi_b2)))))/0.001))
    + (-1.0 * 0.0) + (-1.0 * 1.00 * 9.81))/1.00)

˙phi_b0 = V_phi_b0

˙phi_b1 = V_phi_b1

˙phi_b2 = V_phi_b2

˙xC_b0 = V_xC_b0

˙xC_b1 = V_xC_b1

˙xC_b2 = V_xC_b2

˙xj0 = (0.0 + (-1.0 * (V_xC_b0 + (1.00 * V_phi_b0 * sin(phi_b0)))))

˙xj1 = ((V_xC_b0 + (-1.0 * 1.00 * V_phi_b0 * sin(phi_b0)))
    + (-1.0 * (V_xC_b1 + (1.00 * V_phi_b1 * sin(phi_b1)))))

˙xj2 = ((V_xC_b1 + (-1.0 * 1.00 * V_phi_b1 * sin(phi_b1)))
    + (-1.0 * (V_xC_b2 + (1.00 * V_phi_b2 * sin(phi_b2)))))

˙yC_b0 = V_yC_b0

˙yC_b1 = V_yC_b1

˙yC_b2 = V_yC_b2

˙yj0 = (0.0 + (-1.0 * (V_yC_b0 + (-1.0 * 1.00 * V_phi_b0 * cos(phi_b0)))))

˙yj1 = ((V_yC_b0 + (1.00 * V_phi_b0 * cos(phi_b0)))
    + (-1.0 * (V_yC_b1 + (-1.0 * 1.00 * V_phi_b1 * cos(phi_b1)))))

˙yj2 = ((V_yC_b1 + (1.00 * V_phi_b1 * cos(phi_b1)))
    + (-1.0 * (V_yC_b2 + (-1.0 * 1.00 * V_phi_b2 * cos(phi_b2)))))

[Figure: plot of yC_b0, yC_b1, and yC_b2 versus t, for t from 0 to 30]

Figure 5.14: Response of the triple pendulum model (figure 5.17), released from an initial horizontal arrangement – vertical displacement of pendulum body centres.


[Figure: plot of xC_b0, xC_b1, and xC_b2 versus t, for t from 0 to 30]

Figure 5.15: Response of the triple pendulum model (figure 5.17), released from an initial horizontal arrangement – horizontal displacement of pendulum body centres.

[Figure: plot of phi_b0, phi_b1, and phi_b2 versus t, for t from 0 to 30]

Figure 5.16: Response of the triple pendulum model (figure 5.17), released from an initial horizontal position – angular position of pendulum bodies.


[Figure: schematic of three links (Link 1, Link 2, Link 3) and three joints (Joint 1, Joint 2, Joint 3), with each connection carrying paired effort–flow ports ef(x) and ef(y)]

Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum.


5.1.6 Linked elastic collisions

Figure 5.18 shows, in schematic form, a rigid bar with damped-elastic bumpers at each end. The system consists of: a rigid link element, as used above in section 5.1.4, representing the bar; two switched capacitor elements, CCa and CCb, representing elastic collision between the tips and a rigid horizontal surface; and unswitched vertical damping at each end and on rotation (not shown in the schematic), representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the response of the bar-with-bumpers model of figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface; int_a3 is the angular displacement of the bar from vertical; q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21, and q24 (the heights of the centre, left tip, and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2, before the bar tips strike the ground at height 1, equal to the unsprung length q of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact and then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end.


[Figure: bond graph with inertia elements I:m (via TINT1/a1 and TINT2/a2), a gravity source Se:mg, a rotational inertia I:J with R:0.5 (via INT1/a3), modulated transformers MTF:mxA, MTF:myA, MTF:mxB, MTF:myB, zero-effort sources Se:FxA = 0 and Se:FxB = 0, contact compliances C:0.01 and C:0.005 each with damping R:0.8, and 0- and 1-junctions joined by bonds numbered 1–25]

Figure 5.19: Bond graph model of the system from figure 5.18.

[Figure: plot of int_a2, int_a3, q21, and q24 versus t, for t from 0 to 10]

Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface.


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system's performance were measured:

1. Population fitness (summarized as the minimum, mean, and maximum program fitness in each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as the minimum, mean, and maximum number of nodes in a program)
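These four statistics are straightforward to log once per generation. The sketch below assumes each population member is a record carrying its fitness, age, node count, and a hashable structural signature; the record layout and names are illustrative, not taken from the thesis software.

```python
def population_stats(population):
    """Summarize one generation: min/mean/max fitness, size, age, and diversity.

    Each member of `population` is assumed (for illustration) to be a dict with
    keys "fitness", "size", "age", and "structure" (a hashable shape signature).
    """
    fitnesses = [p["fitness"] for p in population]
    sizes = [p["size"] for p in population]
    ages = [p["age"] for p in population]
    return {
        "fitness": (min(fitnesses), sum(fitnesses) / len(fitnesses), max(fitnesses)),
        "size": (min(sizes), sum(sizes) / len(sizes), max(sizes)),
        "age": (min(ages), sum(ages) / len(ages), max(ages)),
        # diversity = number of structurally unique programs this generation
        "diversity": len({p["structure"] for p in population}),
    }
```

Tracking the structural signature separately from fitness is what makes the diversity crises described below observable at all: two programs may differ only in leaf constants yet share one signature.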

The particular symbolic regression problem chosen consisted of 100 samples of the function x² + 2 at integer values of x between −50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples, and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2, and the number of generations limited to 500, but with a variety of population sizes.
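One mapping with exactly these properties (zero error scores 1, and growing error drives fitness toward 0 while staying in (0, 1]) is 1/(1 + E). The thesis does not state its exact normalization formula, so the mapping below is an assumption; the exceptional cases follow the description above.

```python
import math

def target(x):
    return x * x + 2  # the static target function x^2 + 2

def fitness(candidate, xs=range(-50, 51)):
    """Normalized fitness on (0, 1]; a perfect candidate scores exactly 1.

    The 1/(1 + E) mapping is an assumption; the thesis states only that total
    absolute error is normalized onto (0, 1] with zero error awarded 1.
    """
    try:
        total_error = sum(abs(candidate(x) - target(x)) for x in xs)
    except ZeroDivisionError:
        return 0.0  # candidates that divide by zero are awarded zero fitness
    if math.isnan(total_error):
        return 0.0  # candidates that produce NaN are awarded zero fitness
    return 1.0 / (1.0 + total_error)
```

For example, the candidate `lambda x: x * x` misses the target by exactly 2 at every sample point, so its total error is finite and its fitness is small but strictly positive, as required by the (0, 1] interval.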

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs, the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean, and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress: runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation, the beginning of a sharp decline in population diversity can be seen. From this, one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A", the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this, one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of the newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness), and largest (maximum size) individuals are of course not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time, the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300-generation "slump", the mean program age rises steadily with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in the neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases, it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that, although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation, but were barred from participating in crossover, mutation, or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

The non-parsimonious tendencies of typical genetic programming output are well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal; specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression, the exploitable structural equivalences are simply the rules of algebra. Even a very small set of alge-

Figure 5.21: Fitness in run A – progress is slow initially and while size limiting is in effect.

Figure 5.22: Diversity in run A – the early crisis is reflected in the size and age trends.

Figure 5.23: Program size in run A – mean size approaches the ceiling (100 nodes).

Figure 5.24: Program age in run A – the bold line extrapolates the birth of the oldest program.

Figure 5.25: Fitness in run B – progress stalled by population convergence.

Figure 5.26: Diversity in run B – early loss of diversity causing stalled progress.

Figure 5.27: Program size in run B – early domination by a smaller than average schema.

Figure 5.28: Program age in run B – low population turnover mid-run, between crises.

braic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation):

(- x x) replaced by (0)

(- x 0) replaced by (x)

(+ x 0) replaced by (x)

(+ 0 x) replaced by (x)

(/ x x) replaced by (1)

(/ 0 x) replaced by (0), for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant value node equal to zero or one; and second, are the children equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
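The listed rules and the constant-folding "trick" can be sketched together as one bottom-up pass over expression trees. The tuple representation `(op, left, right)` and the protected-division convention (x/0 treated as 1) are assumptions for illustration; the thesis's actual node classes are not shown here.

```python
def reduce_expr(expr):
    """One bottom-up pass of the algebraic reduction rules plus constant folding.

    Expressions are nested tuples like ('+', ('*', 'x', 'x'), 2.0), with strings
    for variables and floats for constants (an illustrative representation).
    """
    if not isinstance(expr, tuple):
        return expr  # variable or constant leaf: nothing to reduce
    op, a, b = expr[0], reduce_expr(expr[1]), reduce_expr(expr[2])
    # the listed single-node rules, checking only the two child properties
    if op == '-' and a == b:
        return 0.0          # (- x x) -> 0
    if op == '-' and b == 0.0:
        return a            # (- x 0) -> x
    if op == '+' and b == 0.0:
        return a            # (+ x 0) -> x
    if op == '+' and a == 0.0:
        return b            # (+ 0 x) -> x
    if op == '/' and a == b:
        return 1.0          # (/ x x) -> 1
    if op == '/' and a == 0.0:
        return 0.0          # (/ 0 x) -> 0 for all x
    # the "trick": a subtree with no variables collapses to its constant value
    if isinstance(a, float) and isinstance(b, float):
        if op == '+':
            return a + b
        if op == '-':
            return a - b
        if op == '*':
            return a * b
        if op == '/':
            return a / b if b != 0.0 else 1.0  # protected division (assumed convention)
    return (op, a, b)
```

Applied to an expression like `('+', ('*', 'x', 'x'), ('-', 5.0, 3.0))`, the constant right branch folds to 2.0 while the variable-bearing left branch is left intact, mirroring the figure 5.29 to figure 5.30 reduction described below.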

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x² + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables: it evaluates to very nearly 2.


Figure 5.29: A best approximation of x² + 2.0 after 41 generations.


Figure 5.30: The reduced expression.


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, as the model described in section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for the pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order in which they are given). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted, or replaced to transform one program into the other.
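A minimal implementation of this distance, again over tuple-based trees (an illustrative representation, not the thesis's node classes), follows the definition directly: walk both trees breadth-first in lockstep, and at the first mismatching node return the size of the larger of the two corresponding subtrees.

```python
from collections import deque

def size(t):
    """Number of nodes in a tree; leaves are strings or floats."""
    if not isinstance(t, tuple):
        return 1
    return 1 + sum(size(c) for c in t[1:])

def label(t):
    return t[0] if isinstance(t, tuple) else t

def children(t):
    return list(t[1:]) if isinstance(t, tuple) else []

def tree_distance(a, b):
    """Size of the larger subtree at the first breadth-first mismatch; 0 if identical."""
    queue = deque([(a, b)])
    while queue:
        s, t = queue.popleft()
        if label(s) != label(t) or len(children(s)) != len(children(t)):
            return max(size(s), size(t))
        queue.extend(zip(children(s), children(t)))
    return 0
```

As in the text: identical trees score 0, trees differing in one leaf score 1, and swapping a three-node subtree for a lone variable scores 3, regardless of argument order.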

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive". The heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in

Distance  Mean fitness  Count
 1.0      0.000469      2212
 3.0      0.000282       503
 5.0      0.000346        66
 7.0      0.000363        36
 9.0      0.000231        25
11.0      0.000164        16
13.0      0.000143        10
15.0      0.000071        10
17.0      0.000102         9
19.0      0.000221         1
23.0      0.0              2
25.0      0.0              2
27.0      0.0              1
29.0      0.000286         5
35.0      0.0              1
51.0      0.000715         1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2, and 5.7.


that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases, fitness-guided search such as genetic programming can do no better than a random search.

[Figure: plot of mean fitness (×10⁻⁴, from 0 to 5) versus distance (from 0 to 18)]

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at the first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction

[Figure: tree rooted at DynamicSystem, with children StateEquationSet and OutputEquation; each StateEquation pairs a StateVariableLHS with an expression built from Add, Mul, Div, Constant, StateVariableRHS, and InputVariable nodes; the OutputEquation pairs an OutputVariable with (Div StateVariableRHS Constant)]

Figure 5.32: The hand crafted "ideal" symbolic regression tree.


has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified, models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy-storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid-body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are signal-flow graphs and sets of differential equations, to give two examples, for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations, some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, non-linear, and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out small, simple, and linear, but grow large, complex, and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide, or discover, which aspects of the system are important to the exercise and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid-body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications, they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, and lower its efficiency, by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph, in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid-body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore much less widely applicable (for example, to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm

and ant colony algorithms, and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search-within-a-search algorithm. A simulated annealing and genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run, according to some schedule.
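Such a schedule can be as simple as linearly interpolating the operator probabilities over the run, in the manner of a cooling schedule. The shape and endpoint values below are illustrative assumptions, not parameters from the thesis.

```python
import random

def operator_probabilities(generation, max_generations,
                           p_mut_start=0.5, p_mut_end=0.05):
    """Annealing-style schedule: mutation probability cools linearly over the
    run while crossover absorbs the difference (endpoints are assumed values)."""
    frac = generation / max_generations
    p_mut = p_mut_start + (p_mut_end - p_mut_start) * frac
    return {"crossover": 1.0 - p_mut, "mutation": p_mut}

def choose_operator(generation, max_generations, rng=random):
    """Draw one operator name according to the current schedule."""
    probs = operator_probabilities(generation, max_generations)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

Early in the run this favours exploratory mutation; late in the run it favours crossover among the (by then fitter) survivors, which is the annealing-like behaviour described above.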

Genetic programming is easily parallelizable for distribution across a local or wide area network in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has, in effect, been stated already in this section, but it is worth repeating, while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations), but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated, at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer

no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid-body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems, with any number of inputs and outputs, is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification; to allow this information to be specified in a way that is natural to the task; and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program, modelling a particular dynamic system, is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.
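The proposed metric is straightforward to compute from run statistics; a minimal sketch in Python, where the run records are invented placeholders for illustration, not measured results:

```python
# Each run record: (population size, generations until a solution was found).
# These numbers are hypothetical, standing in for logged experiment data.
runs = [(500, 42), (500, 37), (500, 51)]

def mean_fitness_evaluations(runs):
    """Population size times generations, averaged over runs: the average
    number of fitness evaluations performed in the course of the search."""
    return sum(pop * gens for pop, gens in runs) / len(runs)

print(round(mean_fitness_evaluations(runs)))  # 21667
```

The same function would be applied to the logs of each representation's runs, and the two averages compared directly.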

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time for the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant valued expressions for element parameters, is expected to be faster than black box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it bond graph, symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression

2. Structural and parametric identification of a linear dynamic system using bond graphs

3. Parametric identification of a non-linear dynamic system using bond graphs

4. Structural and parametric identification of a non-linear dynamic system using bond graphs

At each stage the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressively stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is, the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).

If it were to find any more use as a modelling tool in its own right, outside of the system identification tasks that were the goal of this project, then the bond graph library would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20sim, or domain specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
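As an illustration of what that change implies, a small bond graph fragment could be serialized to GraphML roughly as follows; the `element_type` attribute key is an invented convention for this sketch, not an established schema for bond graphs:

```python
import xml.etree.ElementTree as ET

# A one-junction bond graph fragment (Se -- 1 -- I) expressed in GraphML.
# The "element_type" node attribute key is a hypothetical naming choice.
root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
ET.SubElement(root, "key", {"id": "d0", "for": "node",
                            "attr.name": "element_type", "attr.type": "string"})
graph = ET.SubElement(root, "graph", id="bg0", edgedefault="undirected")
for node_id, element_type in [("n0", "Se"), ("n1", "1"), ("n2", "I")]:
    node = ET.SubElement(graph, "node", id=node_id)
    ET.SubElement(node, "data", key="d0").text = element_type
ET.SubElement(graph, "edge", source="n0", target="n1")  # each edge is a bond
ET.SubElement(graph, "edge", source="n1", target="n2")
print(ET.tostring(root, encoding="unicode"))
```

Causality and bond orientation would need additional edge attribute keys, but the graph structure itself maps onto GraphML nodes and edges directly.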

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example, given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation solvers (DAE solvers) are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
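The finite least-recently-used table described above can be realized in a few lines of Python; in this sketch the `evaluate` callable and the key derivation are placeholders, since the real fitness test involves simulation and signal comparison:

```python
from collections import OrderedDict

class FitnessCache:
    """Finite memo table in least-recently-used order, mapping a short
    unique key for each genetic program to its cached fitness score."""

    def __init__(self, evaluate, max_entries=10000):
        self.evaluate = evaluate        # the expensive fitness test
        self.max_entries = max_entries  # bound on stored past results
        self.table = OrderedDict()      # key -> cached fitness score

    def fitness(self, program):
        key = str(program)              # placeholder: any short unique key
        if key in self.table:
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        score = self.evaluate(program)   # expensive: simulate and compare
        self.table[key] = score
        if len(self.table) > self.max_entries:
            self.table.popitem(last=False)  # evict least-recently-used entry
        return score
```

With such a cache in front of the fitness test, duplicate genetic programs in the population cost one table lookup instead of a full simulation.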

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as they do to the nodes below them. The standard Python interpreter (called cPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated but currently unused memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
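The rewrite in question can be illustrated with a variable count over an expression tree: the recursive form consumes one interpreter frame per tree level, while the iterative form keeps an explicit stack on the heap. The node classes here are stand-ins, not the thesis library's actual types:

```python
class Node:
    """A generic tree node; stands in for expression or program nodes."""
    def __init__(self, *children):
        self.children = list(children)

class Var(Node):
    """A leaf representing a variable occurrence."""

def count_vars_recursive(node):
    """Root invokes the like-named method on the nodes below it,
    aggregating results; each call uses a cPython context frame."""
    total = 1 if isinstance(node, Var) else 0
    return total + sum(count_vars_recursive(c) for c in node.children)

def count_vars_iterative(node):
    """Same traversal with an explicit stack on the heap: depth is
    limited by available memory rather than the interpreter stack."""
    total, stack = 0, [node]
    while stack:
        current = stack.pop()
        if isinstance(current, Var):
            total += 1
        stack.extend(current.children)
    return total
```

The iterative version handles trees far deeper than the interpreter's recursion limit, at some cost in clarity.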


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org.

[2] Dynasim AB. Dymola: dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - I: Algebraic theory. Journal of the Franklin Institute, (326):329–350, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - II: Duality. Journal of the Franklin Institute, (326):691–708, 1989.

[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - III: Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - IV: Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM '95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. InCINC '94, the first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin, Heidelberg, 2003.

[20] B. Danielson, J. Foster, and D. Frincke. Gabsys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz: graph visualization software. http://www.graphviz.org.

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE: a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. Galopps 3.2.4: the "genetic algorithm optimized for portability and parallelism system". http://garage.cse.msu.edu/software/galopps, August 25, 2002.

[32] Alan C. Hindmarsh. Odepack, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.

[36] John Hunter. Pylab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: A Unified Approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kauffman Publishers, Inc., 340 Pine Street, Sixth Floor, 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming GP '97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.




Acknowledgements

I'd like to thank my supervisor, Dr. Jan Huissoon, for his support.


Contents

1 Introduction 111 System identification 112 Black box and grey box problems 313 Static and dynamic models 614 Parametric and non-parametric models 815 Linear and nonlinear models 916 Optimization search machine learning 1117 Some model types and identification techniques 12

171 Locally linear models 13172 Nonparametric modelling 14173 Volterra function series models 15174 Two special models in terms of Volterra series 16175 Least squares identification of Volterra series models 18176 Nonparametric modelling as function approximation 21177 Neural networks 22178 Parametric modelling 23179 Differential and difference equations 231710 Information criteriandashheuristics for choosing model order 251711 Fuzzy relational models 261712 Graph-based models 28

18 Contributions of this thesis 30

2 Bond graphs 3121 Energy based lumped parameter models 3122 Standard bond graph elements 32

221 Bonds 32222 Storage elements 33223 Source elements 34

v

224 Sink elements 35225 Junctions 35

23 Augmented bond graphs 39231 Integral and derivative causality 41232 Sequential causality assignment procedure 44

24 Additional bond graph elements 46241 Activated bonds and signal blocks 46242 Modulated elements 46243 Complex elements 47244 Compound elements 48

25 Simulation 51251 State space models 51252 Mixed causality models 52

3 Genetic programming 5431 Models programs and machine learning 5432 History of genetic programming 5533 Genetic operators 57

331 The crossover operator 58332 The mutation operator 58333 The replication operator 58334 Fitness proportional selection 59

34 Generational genetic algorithms 5935 Building blocks and schemata 6136 Tree structured programs 63

361 The crossover operator for tree structures 67362 The mutation operator for tree structures 67363 Other operators on tree structures 68

37 Symbolic regression 6938 Closure or strong typing 7039 Genetic programming as indirect search 72310 Search tuning 73

3101 Controlling program size 733102 Maintaining population diversity 74

4 Implementation 7641 Program structure 76

411 Formats and representations 77

vi

42 External tools 81421 The Python programming language 81422 The SciPy numerical libraries 82423 Graph layout and visualization with Graphviz 82

4.3 A bond graph modelling library 83
4.3.1 Basic data structures 83
4.3.2 Adding elements and bonds, traversing a graph 84
4.3.3 Assigning causality 84
4.3.4 Extracting equations 86
4.3.5 Model reductions 86

4.4 An object-oriented symbolic algebra library 100
4.4.1 Basic data structures 100
4.4.2 Basic reductions and manipulations 102
4.4.3 Algebraic loop detection and resolution 104
4.4.4 Reducing equations to state-space form 107
4.4.5 Simulation 110

4.5 A typed genetic programming system 110
4.5.1 Strongly typed mutation 112
4.5.2 Strongly typed crossover 113
4.5.3 A grammar for symbolic regression on dynamic systems 113
4.5.4 A grammar for genetic programming bond graphs 118

5 Results and discussion 120
5.1 Bond graph modelling and simulation 120

5.1.1 Simple spring-mass-damper system 120
5.1.2 Multiple spring-mass-damper system 125
5.1.3 Elastic collision with a horizontal surface 128
5.1.4 Suspended planar pendulum 130
5.1.5 Triple planar pendulum 133
5.1.6 Linked elastic collisions 139

5.2 Symbolic regression of a static function 141
5.2.1 Population dynamics 141
5.2.2 Algebraic reductions 143

5.3 Exploring genetic neighbourhoods by perturbation of an ideal 151
5.4 Discussion 153

5.4.1 On bond graphs 153
5.4.2 On genetic programming 157


6 Summary and conclusions 160
6.1 Extensions and future work 161

6.1.1 Additional experiments 161
6.1.2 Software improvements 163


List of Figures

1.1 Using output prediction error to evaluate a model 4
1.2 The black box system identification problem 5
1.3 The grey box system identification problem 5
1.4 Block diagram of a Volterra function series model 17
1.5 The Hammerstein filter chain model structure 17
1.6 Block diagram of the linear squarer cuber model 18
1.7 Block diagram for a nonlinear moving average operation 21

2.1 A bond denotes power continuity 32
2.2 A capacitive storage element: the 1-port capacitor, C 34
2.3 An inductive storage element: the 1-port inertia, I 34
2.4 An ideal source element: the 1-port effort source, Se 35
2.5 An ideal source element: the 1-port flow source, Sf 35
2.6 A dissipative element: the 1-port resistor, R 36
2.7 A power-conserving 2-port element: the transformer, TF 37
2.8 A power-conserving 2-port element: the gyrator, GY 37
2.9 A power-conserving multi-port element: the 0-junction 38
2.10 A power-conserving multi-port element: the 1-junction 38
2.11 A fully augmented bond includes a causal stroke 40
2.12 Derivative causality: electrical capacitors in parallel 44
2.13 Derivative causality: rigidly joined masses 45
2.14 A power-conserving 2-port: the modulated transformer, MTF 47
2.15 A power-conserving 2-port: the modulated gyrator, MGY 47
2.16 A non-linear storage element: the contact compliance, CC 48
2.17 Mechanical contact compliance: an unfixed spring 48
2.18 Effort-displacement plot for the CC element 49
2.19 A suspended pendulum 50
2.20 Sub-models of the simple pendulum 50


2.21 Model of a simple pendulum using compound elements 51

3.1 A computer program is like a system model 54
3.2 The crossover operation for bit string genotypes 58
3.3 The mutation operation for bit string genotypes 58
3.4 The replication operation for bit string genotypes 59
3.5 Simplified GA flowchart 61
3.6 Genetic programming produces a program 66
3.7 The crossover operation for tree structured genotypes 68
3.8 Genetic programming as indirect search 73

4.1 Model reductions step 1: remove trivial junctions 88
4.2 Model reductions step 2: merge adjacent like junctions 89
4.3 Model reductions step 3: merge resistors 90
4.4 Model reductions step 4: merge capacitors 91
4.5 Model reductions step 5: merge inertias 91
4.6 A randomly generated model 92
4.7 A reduced model, step 1 93
4.8 A reduced model, step 2 94
4.9 A reduced model, step 3 95
4.10 Algebraic dependencies from the original model 96
4.11 Algebraic dependencies from the reduced model 97
4.12 Algebraic object inheritance diagram 101
4.13 Algebraic dependencies with loops resolved 108

5.1 Schematic diagram of a simple spring-mass-damper system 123
5.2 Bond graph model of the system from figure 5.1 123
5.3 Step response of a simple spring-mass-damper model 124
5.4 Noise response of a simple spring-mass-damper model 124
5.5 Schematic diagram of a multiple spring-mass-damper system 126
5.6 Bond graph model of the system from figure 5.5 126
5.7 Step response of a multiple spring-mass-damper model 127
5.8 Noise response of a multiple spring-mass-damper model 127
5.9 A bouncing ball 128
5.10 Bond graph model of a bouncing ball with switched C-element 129
5.11 Response of a bouncing ball model 129
5.12 Bond graph model of the system from figure 2.19 132
5.13 Response of a simple pendulum model 133


5.14 Vertical response of a triple pendulum model 136
5.15 Horizontal response of a triple pendulum model 137
5.16 Angular response of a triple pendulum model 137
5.17 A triple planar pendulum 138
5.18 Schematic diagram of a rigid bar with bumpers 139
5.19 Bond graph model of the system from figure 5.18 140
5.20 Response of a bar-with-bumpers model 140
5.21 Fitness in run A 144
5.22 Diversity in run A 144
5.23 Program size in run A 145
5.24 Program age in run A 145
5.25 Fitness of run B 146
5.26 Diversity of run B 146
5.27 Program size in run B 147
5.28 Program age in run B 147
5.29 A best approximation of x^2 + 2.0 after 41 generations 149
5.30 The reduced expression 150
5.31 Mean fitness versus tree-distance from the ideal 153
5.32 Symbolic regression "ideal" 154


List of Tables

1.1 Modelling tasks in order of increasing difficulty 13

2.1 Units of effort and flow in several physical domains 33
2.2 Causality options by element type: 1- and 2-port elements 42
2.3 Causality options by element type: junction elements 43

4.1 Grammar for symbolic regression on dynamic systems 117
4.2 Grammar for genetic programming bond graphs 119

5.1 Neighbourhood of a symbolic regression "ideal" 152


Listings

3.1 Protecting the fitness test code against division by zero 71
4.1 A simple bond graph model in .bg format 78
4.2 A simple bond graph model constructed in Python code 80
4.3 Two simple functions in Python 81
4.4 Graphviz DOT markup for the model defined in listing 4.4 83
4.5 Equations from the model in figure 4.6 98
4.6 Equations from the reduced model in figure 4.9 99
4.7 Resolving a loop in the reduced model 106
4.8 Equations from the reduced model with loops resolved 107


Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models, and because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system, a control system for example. In an iterative design process it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system, but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics), then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately it is perhaps best to think of an identified model as a second system that, for some purposes,


can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that as closely as possible reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

• Inspection or disassembly of a non-operational system


• Theoretical principles applicable to that class of systems

• Knowledge from the design and construction of the system

• Results of another system identification exercise

• Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant, and in no way on any past values of the input. Put another way, the input–output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic over all. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input–output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.
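The order-independence of a static model's input–output pairs, and its absence in dynamic models, can be illustrated with a small sketch (toy models invented here for illustration, not part of the thesis software):

```python
def static_model(u):
    # Output depends only on the current input value.
    return u ** 2

def dynamic_model(inputs):
    # Output depends on history: a running (cumulative) sum.
    outputs, state = [], 0.0
    for u in inputs:
        state += u          # state persists between samples
        outputs.append(state)
    return outputs

seq = [1.0, 2.0, 3.0]
rev = list(reversed(seq))

# A static model's set of input-output pairs is unchanged by reordering.
assert sorted(static_model(u) for u in seq) == sorted(static_model(u) for u in rev)

# A dynamic model's outputs change when the same inputs arrive in a different order.
print(dynamic_model(seq))  # [1.0, 3.0, 6.0]
print(dynamic_model(rev))  # [3.0, 5.0, 6.0]
```

Plotting the static model's data points gives the same curve regardless of collection order; the dynamic model's response cannot be captured that way.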

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables, but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables the model equations would be:

du_1/dt = f_1(x_1, ..., x_n, u_1, ..., u_m)
   ...
du_m/dt = f_m(x_1, ..., x_n, u_1, ..., u_m)

y_1 = g_1(x_1, ..., x_n, u_1, ..., u_m)
   ...
y_p = g_p(x_1, ..., x_n, u_1, ..., u_m)

where x_1, ..., x_n are the input variables, u_1, ..., u_m are the state variables, and y_1, ..., y_p are the output variables.
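A model in this state-space form can be simulated by numerically integrating the state equations; a minimal sketch using forward-Euler integration (the particular system functions below are illustrative assumptions, not from the thesis):

```python
def simulate(f, g, u0, x_of_t, t_end, dt=1e-3):
    """Forward-Euler simulation of  du/dt = f(x, u),  y = g(x, u).

    x_of_t returns the input vector at time t; u0 is the initial state.
    Returns the list of output vectors sampled every dt.
    """
    u, t, ys = list(u0), 0.0, []
    while t < t_end:
        x = x_of_t(t)
        du = f(x, u)
        u = [ui + dt * dui for ui, dui in zip(u, du)]  # Euler step
        ys.append(g(x, u))
        t += dt
    return ys

# Illustrative system: a first-order lag du/dt = x - u with output y = u,
# driven by a unit step input.
ys = simulate(f=lambda x, u: [x[0] - u[0]],
              g=lambda x, u: [u[0]],
              u0=[0.0],
              x_of_t=lambda t: [1.0],
              t_end=5.0)
print(ys[-1][0])  # approaches the step amplitude 1.0
```

In practice a stiff or accurate simulation would use an adaptive integrator rather than fixed-step Euler; the structure of the loop is the point here.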

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order of arrival of data within the input and output signals matters, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a u_i for which there is no y_j = g_j(x_1, ..., x_n, u_1, ..., u_m) = u_i. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty to experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically, and the corresponding parameters, at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function) and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

Y(s) = (b_m s^m + b_{m-1} s^{m-1} + ... + b_1 s + b_0) / (s^n + a_{n-1} s^{n-1} + ... + a_1 s + a_0) X(s)   (1.1)

Y(z) = (b_1 z^{-1} + b_2 z^{-2} + ... + b_m z^{-m}) / (1 + a_1 z^{-1} + a_2 z^{-2} + ... + a_n z^{-n}) X(z)   (1.2)

d^n y/dt^n + a_{n-1} d^{n-1} y/dt^{n-1} + ... + a_1 dy/dt + a_0 y = b_m d^m x/dt^m + ... + b_1 dx/dt + b_0 x   (1.3)

y_k + a_1 y_{k-1} + ... + a_n y_{k-n} = b_1 x_{k-1} + ... + b_m x_{k-m}   (1.4)
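A difference-equation model such as equation 1.4 maps directly onto a loop once the parameters are known; a minimal sketch with zero initial conditions (the coefficients in the example are illustrative):

```python
def simulate_difference_eq(a, b, x):
    """Simulate y[k] + a[0]*y[k-1] + ... + a[n-1]*y[k-n]
                = b[0]*x[k-1] + ... + b[m-1]*x[k-m]
    (the form of equation 1.4), assuming zero initial conditions."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, ai in enumerate(a, start=1):   # autoregressive terms
            if k - i >= 0:
                acc -= ai * y[k - i]
        for j, bj in enumerate(b, start=1):   # moving-average terms
            if k - j >= 0:
                acc += bj * x[k - j]
        y.append(acc)
    return y

# First-order example: y[k] = 0.5*y[k-1] + x[k-1], unit-impulse input.
print(simulate_difference_eq(a=[-0.5], b=[1.0], x=[1.0, 0.0, 0.0, 0.0, 0.0]))
# -> [0.0, 1.0, 0.5, 0.25, 0.125]
```

Here the experimenter's structural choice of m and n is simply the lengths of the coefficient lists b and a.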

Because they typically contain fewer identified parameters, and the range of possibilities is already reduced by the choice of a specific model structure,


parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

y(t) = integral from 0 to t of u(t - τ) x(τ) dτ   (1.5)

y(k) = sum over m = 0 to k of u(k - m) x(m)   (1.6)

Y(jω) = U(jω) X(jω)   (1.7)
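The discrete impulse-response model of equation 1.6 is a convolution sum, in which the sampled impulse response itself is the (non-parametric) model; a direct sketch with an illustrative decaying response:

```python
def impulse_response_output(u, x):
    """y[k] = sum_{m=0..k} u[k-m] * x[m]  (equation 1.6).

    u : sampled impulse response (the non-parametric model, finite length)
    x : input signal
    """
    return [sum(u[k - m] * x[m] for m in range(k + 1) if k - m < len(u))
            for k in range(len(x))]

# Example: exponentially decaying impulse response, unit step input.
u = [0.5 ** n for n in range(4)]      # 1, 0.5, 0.25, 0.125
x = [1.0] * 6                          # unit step
print(impulse_response_output(u, x))
# each output sample is a partial sum of u: [1.0, 1.5, 1.75, 1.875, 1.875, 1.875]
```

The large parameter count of such models is visible here: the model is the entire sampled sequence u, one parameter per sample.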

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more


complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified β_1, ..., β_n, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters β_i, while equation 1.9 is not. Both contain non-linear functions of the inputs (namely u_2^2 and sin(u_3)), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

y = β_1 u_1 + β_2 u_2^2 + β_3 sin(u_3)   (1.8)

y = β_1 β_2 u_1 + u_2^2 + sin(β_3 u_3)   (1.9)

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum, to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
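For a model of the form of equation 1.8, this one-step linear least squares estimate amounts to solving a linear system; a sketch using NumPy on synthetic data (the "true" parameter values and noise level here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
u1, u2, u3 = rng.uniform(-1.0, 1.0, (3, n))

# Synthetic measurements from a system of the form of equation 1.8,
# with assumed true parameters (2.0, -1.0, 0.5) and small added noise.
y = 2.0 * u1 - 1.0 * u2 ** 2 + 0.5 * np.sin(u3) + 0.01 * rng.standard_normal(n)

# Each regressor column is a (possibly non-linear) function of the inputs;
# the model nonetheless remains a linear sum of the unknown parameters.
Phi = np.column_stack([u1, u2 ** 2, np.sin(u3)])
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(beta)  # close to [2.0, -1.0, 0.5]
```

No such one-step solution exists for equation 1.9, where the parameters appear inside the non-linear functions.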

There is a fundamental trade-off between model size, or order, and the model's expressive power, or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to over-fitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best


model, using the linear least squares technique discussed below, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately this approach, or any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs and subtracting the normal operating outputs from the measured output to estimate the system response to the test signal alone.
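The failure of superposition for non-linear systems is easy to demonstrate numerically (the systems below are illustrative stand-ins, not from the thesis):

```python
def linear_system(x):
    return 3.0 * x            # superposition holds

def nonlinear_system(x):
    return x ** 2             # superposition fails

a, b = 2.0, 5.0

# Linear: the response to (a + b) equals the sum of the separate responses.
assert linear_system(a + b) == linear_system(a) + linear_system(b)

# Non-linear: (a + b)^2 = 49 but a^2 + b^2 = 29; separately measured
# responses cannot be superimposed to recover the combined response.
assert nonlinear_system(a + b) != nonlinear_system(a) + nonlinear_system(b)
print(nonlinear_system(a + b), nonlinear_system(a) + nonlinear_system(b))  # 49.0 29.0
```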

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally, but not globally, optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it


is a factor that trades off exploration of new territory with exploitation of local gains made in a neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space, while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results for unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in every machine learning textbook. They both incorporate a measure of parsimony for the identified model and weigh that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
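For a model identified by least squares, both criteria can be computed from the residual sum of squares; the common least-squares forms below are one convention among several (an assumption here, not a formula taken from this thesis):

```python
import math

def aic_bic(sse, n_samples, n_params):
    """Least-squares forms of the information criteria:
       AIC = n*ln(SSE/n) + 2k,   BIC = n*ln(SSE/n) + k*ln(n).
    Lower is better; both penalize extra parameters, BIC more heavily
    for large sample counts."""
    base = n_samples * math.log(sse / n_samples)
    return base + 2 * n_params, base + n_params * math.log(n_samples)

# A bigger model that barely reduces the error is penalized by both criteria.
aic_small, bic_small = aic_bic(sse=10.0, n_samples=100, n_params=3)
aic_big, bic_big = aic_bic(sse=9.9, n_samples=100, n_params=10)
print(aic_small < aic_big, bic_small < bic_big)  # True True
```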

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1) the problem is to fit a line, plane,


or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input–output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

            linear   non-linear
   static   (1)      (3)
   dynamic  (2)      (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem; (4) is the most general class of system identification problems.

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen attempting to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at a different operating point of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying


(LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or B-spline).

In step 2 it is important to use low amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three of the steps above.

Both of these techniques are able to identify the parameters, but not the structure, of a model. For EKF the model structure must be given in a state space form. For LPV the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.
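As a concrete illustration of the first LPV category, the interpolation in step 3 can be sketched as blending the two local models that bracket the current operating point. This is a minimal sketch, not drawn from [12] or [7]; the function names, the affine local model form y = Ax + b, and the piecewise-linear weighting are all illustrative assumptions.

```python
import numpy as np

def lpv_predict(x, p, base_points, local_models):
    """Blend local affine models y = A x + b at scheduling point p.

    base_points  -- sorted 1-D array of base operating points
    local_models -- list of (A, b) pairs, one per base point
    """
    base_points = np.asarray(base_points, dtype=float)
    # Clamp p into the covered operating range
    p = float(np.clip(p, base_points[0], base_points[-1]))
    # Find the two base operating points that bracket p
    i = int(np.searchsorted(base_points, p))
    i = max(1, min(i, len(base_points) - 1))
    p0, p1 = base_points[i - 1], base_points[i]
    t = 0.0 if p1 == p0 else (p - p0) / (p1 - p0)
    A0, b0 = local_models[i - 1]
    A1, b1 = local_models[i]
    # Linear interpolation between the two neighbouring local models
    A = (1.0 - t) * A0 + t * A1
    b = (1.0 - t) * b0 + t * b1
    return A @ x + b
```

Other interpolation schemes (polynomial, B-spline) replace only the weighting; the local identifications themselves are unchanged.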

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

$$y(t) = y_1(t) = \int_{-T_S}^{0} g(t-\tau) \cdot u(\tau)\, d\tau \qquad (1.10)$$

Its discrete-time version is

$$y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \cdot u_{k-j} \qquad (1.11)$$

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past; that is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions, the first of which is the linear model $y_1(t)$ above:

$$y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots \qquad (1.12)$$

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is

$$y_2(t) = \int_{-T_1}^{0}\int_{-T_2}^{0} g_2(t-\tau_1,\, t-\tau_2) \cdot u(\tau_1) \cdot u(\tau_2)\, d\tau_1\, d\tau_2$$

$$y_3(t) = \int_{-T_1}^{0}\int_{-T_2}^{0}\int_{-T_3}^{0} g_3(t-\tau_1,\, t-\tau_2,\, t-\tau_3) \cdot u(\tau_1) \cdot u(\tau_2) \cdot u(\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3$$

$$y_p(t) = \int_{-T_1}^{0}\int_{-T_2}^{0}\cdots\int_{-T_p}^{0} g_p(t-\tau_1,\, t-\tau_2,\, \cdots,\, t-\tau_p) \cdot u(\tau_1) \cdot u(\tau_2) \cdots u(\tau_p)\, d\tau_1\, d\tau_2 \cdots d\tau_p$$


where $g_p$ is called the $p$'th order continuous time kernel. The rest of this discussion will use the discrete-time representation:

$$y_2(t_n) = \sum_{i=0}^{M_2}\sum_{j=0}^{M_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j}$$

$$y_3(t_n) = \sum_{i=0}^{M_3}\sum_{j=0}^{M_3}\sum_{k=0}^{M_3} W^{(3)}_{ijk} \cdot u_{n-i} \cdot u_{n-j} \cdot u_{n-k}$$

$$y_p(t_n) = \sum_{i_1=0}^{M_p}\sum_{i_2=0}^{M_p}\cdots\sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \cdots i_p} \cdot u_{n-i_1} \cdot u_{n-i_2} \cdots u_{n-i_p}$$

$W^{(p)}$ is called the $p$'th order discrete time kernel. $W^{(2)}$ is an $M_2$ by $M_2$ square matrix, $W^{(3)}$ is an $M_3$ by $M_3$ by $M_3$ cubic matrix, and so on. It is not required that the memory lengths of each kernel ($M_2$, $M_3$, etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is $M$, then the $p$'th order kernel $W^{(p)}$ is a $p$ dimensional matrix with $M^p$ elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is $M_{(1)} + M_{(2)}^2 + M_{(3)}^3 + \cdots + M_{(p)}^p + \cdots$, or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in figure 1.4.

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.

Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in $u$, and given the linear dynamic part as an impulse response weighting sequence $W$ of length $N$:

$$v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q \qquad (1.13)$$

$$y(t_k) = \sum_{j=0}^{N} W_j \cdot v_{k-j} \qquad (1.14)$$

Combining these gives

$$y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u_{k-j}^i = \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u_{k-j}^i$$

$$= a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u_{k-j}^2 + \cdots + a_q \sum_{j=0}^{N} W_j u_{k-j}^q$$


Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher order kernels. Elements on the main diagonal of the higher order kernels can be interpreted as weights applied to powers of the input signal.
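The Hammerstein filter chain (equations 1.13 and 1.14) can be sketched directly: a static polynomial followed by an FIR filter. The coefficients used in the example are arbitrary illustrative values.

```python
import numpy as np

def hammerstein(u, a, W):
    """v[k] = sum_i a[i]*u[k]**i (static polynomial), then
    y[k] = sum_j W[j]*v[k-j] (FIR filter), with u taken as zero before k = 0.
    Equivalently, a diagonal Volterra model: the i'th order kernel has
    diagonal elements a[i]*W[j] and zero off-diagonal elements."""
    u = np.asarray(u, dtype=float)
    v = sum(ai * u**i for i, ai in enumerate(a))   # static nonlinearity
    y = np.zeros_like(u)
    for k in range(len(u)):
        for j, Wj in enumerate(W):
            if k - j >= 0:
                y[k] += Wj * v[k - j]              # LTI dynamic part
    return y
```

Swapping the order of the two blocks (filter first, polynomial second) would give the Wiener structure instead; the two are not equivalent in general.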

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms, and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in figure 1.6. Obviously it is less general than the system in figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals $u_1 = u$, $u_2 = u^2$, and $u_3 = u^3$ can be calculated, which reduces the identification of the dynamic parts to a multi-input, multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear-squarer-cuber model

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used:

$$y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n) \qquad (1.15)$$

$$y = \sum_{i=0}^{N_1} W^{(1)}_i \cdot u_{n-i} + \sum_{i=0}^{N_2}\sum_{j=0}^{N_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j} + e(t_n) \qquad (1.16)$$

$$= (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e \qquad (1.17)$$

where the vector and matrix quantities $u_1$, $u_2$, $W^{(1)}$, and $W^{(2)}$ are

$$u_1 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_1} \end{bmatrix} \quad u_2 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_2} \end{bmatrix} \quad W^{(1)} = \begin{bmatrix} W^{(1)}_1 \\ W^{(1)}_2 \\ \vdots \\ W^{(1)}_{N_1} \end{bmatrix} \quad W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & \ddots & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}$$

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector $U$ and the parameter vector $\beta$ are constructed as follows. The first $N_1$ elements in $U$ are the elements of $u_1$, and this is followed by elements having the value of the products of $u_2$ appearing in the second term of equation 1.16. The parameter vector $\beta$ is constructed from the elements of $W^{(1)}$ followed by the elements of $W^{(2)}$ as they appear in the second term of equation 1.16.

$$y = U\beta + e \qquad (1.18)$$

Given a sequence of $N$ measurements, $U$ becomes a rectangular matrix (one row per measurement) and $y$ and $e$ become $N$ by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of $U$:

$$\beta = (U^T U)^{-1} U^T y \qquad (1.19)$$

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large: $O(N^P)$ for a model using the first $P$ Volterra series terms and a memory length of $N$.


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form $W_{ij} u_{n-i} u_{n-j}$ and $W_{ji} u_{n-j} u_{n-i}$. However, since $u_{n-i} u_{n-j} = u_{n-j} u_{n-i}$, the parameters $W_{ij}$ and $W_{ji}$ are indistinguishable. Only $N(N+1)/2$ unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, $O(N^P)$.
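The least squares formulation of equations 1.18 and 1.19, keeping only the $N(N+1)/2$ unique bilinear products just described, can be sketched on synthetic data. All names here are ours; the data are noise-free, and with random excitation the regressor matrix is full rank, so `lstsq` recovers the true parameters in this demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3  # memory length for both kernels

def regressor_row(u, n):
    """One row of U: delayed inputs, then unique products with i <= j."""
    d = lambda i: u[n - i] if n - i >= 0 else 0.0   # input, zero before n = 0
    linear = [d(i) for i in range(N)]
    bilinear = [d(i) * d(j) for i in range(N) for j in range(i, N)]
    return linear + bilinear

# Simulate a "true" system, then recover its kernels by least squares
beta_true = rng.standard_normal(N + N * (N + 1) // 2)
u = rng.standard_normal(100)                        # random excitation
U = np.array([regressor_row(u, n) for n in range(len(u))])
y = U @ beta_true                                   # noise-free observations
beta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)    # ordinary least squares
```

Had both $W_{ij}$ and $W_{ji}$ been included, the corresponding columns of $U$ would be identical and the problem rank-deficient; restricting to $i \le j$ removes that redundancy.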

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters of the basis functions. The kernel parameters can then be read out by evaluating the basis functions at $N$ equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters of this formula are $\zeta$, a form factor affecting the shape of the sinus curve; $j$, which iterates over component sinus curves; and $i$, the read-out index. There are $m_r$ sinus curves in the weighted sum. The values of $\zeta$ and $m_r$ are design choices. The read-out index varies from 1 to $N$, the kernel memory length. In the new identification problem only the weights $w_j$ applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to $m_r$. For example, if $N$ is 40 and $m_r$ is 10, then there is a 4 times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with $N_1 = N_2 = 32$ gives the uncompressed model a total of $N_1 + N_2(N_2+1)/2 = 32 + 528 = 560$ parameters. In the wavelet domain identification, only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560:46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
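The idea can be sketched with the simplest wavelet, the Haar basis: transform a kernel, zero the small detail coefficients, and invert. The decaying exponential kernel and the threshold below are invented for illustration; [53] modelled a chemical process with other wavelets, and this sketch does not reproduce their parameter counts.

```python
import numpy as np

def haar_fwd(x):
    """Multi-level orthonormal Haar DWT; returns details (fine to coarse),
    then the final approximation coefficient. len(x) must be a power of 2."""
    x = np.asarray(x, dtype=float)
    out = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail
        out.append(d)
        x = a
    out.append(x)
    return out

def haar_inv(coeffs):
    """Inverse of haar_fwd."""
    *details, x = coeffs
    for d in reversed(details):                # coarse to fine
        y = np.empty(2 * len(x))
        y[0::2] = (x + d) / np.sqrt(2)
        y[1::2] = (x - d) / np.sqrt(2)
        x = y
    return x

kernel = np.exp(-np.arange(32) / 8.0)          # smooth decaying FIR kernel
coeffs = haar_fwd(kernel)
# Compression step: zero the small detail coefficients, keep the approximation
kept = [np.where(np.abs(c) > 0.05, c, 0.0) for c in coeffs[:-1]] + [coeffs[-1]]
approx = haar_inv(kept)
```

Because the transform is orthonormal, the reconstruction error is bounded by the norm of the discarded coefficients, which is small when the kernel is smooth.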


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input–output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where $P(\cdot)$ is a polynomial in $u$ and $M$ is the model memory length (number of $Z^{-1}$ delay operators). Therefore one way to look at nonlinear system identification is as the problem of finding the polynomial $P$. The arguments to $P$ ($uZ^{-1}, uZ^{-2}, \ldots, uZ^{-M}$) can be calculated; what is needed is a general and parsimonious function approximator. The Volterra function series is general, but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details, and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas the coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth, low order representations. However, since common choices such as distorted sinus functions are combined by weighted sums, and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator: since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd order nonlinear moving average with a neural network taking the place of $P$.

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or, sometimes, symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness guided search algorithms, such as genetic programming, "evolution strategie", or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space, ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23] [24] [78] [79]) is an equation discovery program that searches for equations which conform to a context free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations: grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation, and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.

PRET ([74] [75] [14]) is another equation discovery program; it incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied in practice; no references beyond the original author were found.

1.7.10 Information criteria: heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

$$AIC = -2\ln(L) + 2k \qquad (1.20)$$

where $L$ is the likelihood of the model and $k$ is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output $u$, given the model $M$ and model parameter values $U$:

$$L = P(u \mid M, U) \qquad (1.21)$$

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be overfit and have spurious parameters. The Bayes information criterion is

$$BIC = -2\ln(L) + k\ln(n) \qquad (1.22)$$

where $n$ is the number of observations (i.e. sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

$$AIC \cong -2n\ln\left(\frac{RMSE}{n}\right) + 2k \qquad (1.23)$$

Cinar [17] reports building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions

2. a fuzzy relation mapping inputs to outputs

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow", whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are no longer appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.
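The four steps can be sketched for a single input and output variable with triangular membership functions and max-min composition. The membership shapes, the relation matrix, and the centroid-style defuzzification below are illustrative choices, not taken from [15].

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

in_sets = [(-5.0, 0.0, 5.0), (0.0, 5.0, 10.0), (5.0, 10.0, 15.0)]  # "slow"/"medium"/"fast"
out_centres = np.array([0.0, 0.5, 1.0])   # crisp centres of the output sets
R = np.array([[1.0, 0.2, 0.0],            # fuzzy relation: input sets -> output sets
              [0.2, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

def predict(x):
    mu_in = np.array([tri(x, *s) for s in in_sets])          # step 1: fuzzify input
    mu_out = np.max(np.minimum(mu_in[:, None], R), axis=0)   # step 2: max-min composition
    # steps 3-4: activate the output sets and aggregate to a crisp value
    return float(np.sum(mu_out * out_centres) / np.sum(mu_out))
```

Note that step 2 has the shape of a matrix product, with minimum in place of multiplication and maximum in place of addition, which is the point made in the following paragraph.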

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied, and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, integrated across the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large and the system may oscillate and not converge; too small and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model, therefore, is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams, and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66] [39] [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into two categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable ($e$) and a flow variable ($f$), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae; arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphical transcription of differential equations.

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10 degree of freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and in the interconnections between them (the arrangement of the edges, or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible since, whereas the genetic algorithm uses a fixed length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane, and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if


a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19] [40] [67] on the subject. Linear graph theory is a similar graphical, energy based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8] [9] [10] [11] explain bond graphs and linear graph theory in


terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".

Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling


language is used in domains where physical power does not apply (as in economics) or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

Domain              Effort (intensive)   Flow (extensive)           Power
Generalized terms   Effort               Flow                       Power
Linear mechanics    Force (N)            Velocity (m/s)             (W)
Angular mechanics   Torque (N·m)         Angular velocity (rad/s)   (W)
Electrical          Voltage (V)          Current (A)                (W)
Hydraulics          Pressure (N/m²)      Volume flow (m³/s)         (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫₀ᵀ f dt (2.1)

p = ∫₀ᵀ e dt (2.2)
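As an illustrative sketch (not part of the thesis's library), the generalized variables in equations 2.1 and 2.2 can be approximated numerically by accumulating a sampled flow or effort signal over time:

```python
# Generalized displacement q is the time integral of flow f (eq. 2.1), and
# generalized momentum p is the time integral of effort e (eq. 2.2).
# A simple trapezoidal accumulation over uniformly sampled values:

def integrate(signal, dt):
    """Trapezoidal time-integral of a sampled signal, starting from zero."""
    total = 0.0
    for a, b in zip(signal, signal[1:]):
        total += 0.5 * (a + b) * dt
    return total

# A constant flow of 2.0 units held for 1 second yields q = 2.0
flows = [2.0] * 11          # sampled at dt = 0.1 s
q = integrate(flows, 0.1)   # -> 2.0
```

The same routine applied to a sampled effort signal yields the generalized momentum p.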

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement, proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce (2.3)


An inertia or I element stores kinetic energy in the form of generalized momentum, proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics it is a mass-flow effect; in electrical circuits it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If (2.4)

Figure 2.2: A capacitive storage element: the 1-port capacitor C.

Figure 2.3: An inductive storage element: the 1-port inertia I.

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite-length simulations, and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t) (2.5)

f = F(t) (2.6)


Figure 2.4: An ideal source element: the 1-port effort source Se.

Figure 2.5: An ideal source element: the 1-port flow source Sf.

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant-valued effort source representing a uniform force field, such as gravity near the earth's surface, and a zero-valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that together they expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf (2.7)
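The linear one-port constitutive laws introduced above (q = Ce, p = If, e = Rf) can be sketched directly in code. The class and method names below are illustrative only, not the API of the thesis's modelling library:

```python
# Minimal sketch of the linear 1-port constitutive relations. Each element
# exposes the output variable it computes under integral causality:
# the capacitor reports effort from its displacement, the inertia reports
# flow from its momentum, and the resistor maps flow to effort.

class Capacitor:
    def __init__(self, C):
        self.C = C
    def effort(self, q):       # e = q / C  (from q = Ce)
        return q / self.C

class Inertia:
    def __init__(self, I):
        self.I = I
    def flow(self, p):         # f = p / I  (from p = If)
        return p / self.I

class Resistor:
    def __init__(self, R):
        self.R = R
    def effort(self, f):       # e = R f
        return self.R * f

# e.g. a 0.5 F capacitor holding displacement q = 1.0 presents effort 2.0
e = Capacitor(0.5).effort(1.0)   # -> 2.0
```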

2.2.5 Junctions

Figure 2.6: A dissipative element: the 1-port resistor R.

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1·f1 = m·e2·f2/m = e2·f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = me2 (2.8)

f1 = f2/m (2.9)

The graphical notation for a gyrator or GY-element is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1·f1 = m·f2·e2/m = f2·e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast rotating bodies in mechanics, and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = mf2 (2.10)

f1 = e2/m (2.11)

Figure 2.7: A power-conserving 2-port element: the transformer TF.

Figure 2.8: A power-conserving 2-port element: the gyrator GY.

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = · · · = en (2.12)

Σ f_in − Σ f_out = 0 (2.13)

f1 = f2 = · · · = fn (2.14)

Σ e_in − Σ e_out = 0 (2.15)
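A small sketch, with illustrative names only, of the 0-junction constraints in equations 2.12 and 2.13: all attached bonds share one effort, and the signed flows sum to zero, so power entering equals power leaving:

```python
# Residuals of the 0-junction constraints. Both values are zero for a
# consistent junction: the efforts on all attached bonds are equal
# (eq. 2.12), and power-in flows minus power-out flows sum to zero (eq. 2.13).

def zero_junction_residual(efforts_in, flows_in, efforts_out, flows_out):
    """Return (effort spread, net flow) across one 0-junction."""
    efforts = efforts_in + efforts_out
    spread = max(efforts) - min(efforts)      # should be 0: common effort
    net_flow = sum(flows_in) - sum(flows_out) # should be 0: flow balance
    return spread, net_flow

# One bond in, two bonds out, all at effort 5.0; flow 3.0 in, 1.0 + 2.0 out
spread, net = zero_junction_residual([5.0], [3.0], [5.0, 5.0], [1.0, 2.0])
# spread == 0.0 and net == 0.0, so power in (15.0) equals power out (15.0)
```

The dual check for a 1-junction would swap the roles of effort and flow, per equations 2.14 and 2.15.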

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the


Figure 2.9: A power-conserving multi-port element: the 0-junction.

Figure 2.10: A power-conserving multi-port element: the 1-junction.


attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power direction half arrow. In both parts of figure 2.11, the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11, the direction of computational causality is reversed. So although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that


Figure 2.11: The causal stroke and power direction half-arrow on a fully augmented bond are independent.

is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but not exactly bond graphs, do not employ any concept of causality [2] [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.¹ Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving those in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

¹Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options and the resulting equations for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power-conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements), and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inertias. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inertia), the variable in the differential expression can be evaluated by integration over time. For this reason these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly (that is, without another type of storage element or a resistor between them) on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = eC). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type: 1- and 2-port elements

Element            Causality       Equations
Effort source Se   (required)      e = E(t)
Flow source Sf     (required)      f = F(t)
Capacitor C        effort-in       q = Ce;  f = dq/dt
Capacitor C        effort-out      e = q/C;  dq/dt = f
Inertia I          effort-in       f = p/I;  dp/dt = e
Inertia I          effort-out      p = If;  e = dp/dt
Resistor R         effort-in       f = e/R
Resistor R         effort-out      e = Rf
Transformer TF     (first option)  e1 = me2;  f2 = mf1
Transformer TF     (second option) f1 = f2/m;  e2 = e1/m
Gyrator GY         (first option)  e1 = rf2;  e2 = rf1
Gyrator GY         (second option) f1 = e2/r;  f2 = e1/r


Table 2.3: Causality options by element type: junction elements

0-Junction (b* directed power-in):
    e* ≡ e_in1, f* ≡ f_in1
    e_in2 = · · · = e_inn = e*
    e_out1 = · · · = e_outm = e*
    f* = Σ(1..m) f_out − Σ(2..n) f_in

0-Junction (b* directed power-out):
    e* ≡ e_out1, f* ≡ f_out1
    e_in1 = · · · = e_inn = e*
    e_out2 = · · · = e_outm = e*
    f* = Σ(1..n) f_in − Σ(2..m) f_out

1-Junction (b* directed power-in):
    e* ≡ e_in1, f* ≡ f_in1
    f_in2 = · · · = f_inn = f*
    f_out1 = · · · = f_outm = f*
    e* = Σ(1..m) e_out − Σ(2..n) e_in

1-Junction (b* directed power-out):
    e* ≡ e_out1, f* ≡ f_out1
    f_in1 = · · · = f_inn = f*
    f_out2 = · · · = f_outm = f*
    e* = Σ(1..n) e_in − Σ(2..m) e_out

- 0- and 1-junctions permit any number of bonds directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm).

- A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction); the rest must be effort-out.

- A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction); the rest must be effort-in.


and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort.

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses, and I elements with proportionally common flow.

There is a systematic method of assigning computational causality to bond graph models so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).
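The propagation step used between SCAP stages can be sketched for a single 0-junction. This is an illustrative fragment only; the data structures are hypothetical, not those of the thesis's library:

```python
# Propagate causal consequences at one 0-junction: once one attached bond is
# assigned effort-in (causal stroke adjacent to the junction), every other
# attached bond is forced to effort-out. Two effort-in bonds is a conflict.

def propagate_zero_junction(assignments, bonds):
    """assignments maps bond name -> 'effort-in', 'effort-out', or None.
    Fill in forced assignments for one 0-junction and return the dict."""
    effort_in = [b for b in bonds if assignments.get(b) == "effort-in"]
    if len(effort_in) > 1:
        raise ValueError("causal conflict: a 0-junction allows one effort-in bond")
    if len(effort_in) == 1:
        for b in bonds:
            if assignments.get(b) is None:
                assignments[b] = "effort-out"   # forced by the junction rule
    return assignments

bonds = ["b1", "b2", "b3"]
result = propagate_zero_junction({"b1": "effort-in", "b2": None, "b3": None}, bonds)
# result: b2 and b3 are both forced to "effort-out"
```

A full SCAP implementation would repeat such propagation across all junctions after each of the three assignment stages; the 1-junction rule is the dual, with a single effort-out bond.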

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen


before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal, but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.


Figure 2.14: A power-conserving 2-port: the modulated transformer MTF.

Figure 2.15: A power-conserving 2-port: the modulated gyrator MGY.

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical


interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort-displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

Figure 2.16: A non-linear capacitive storage element: the 1-port contact compliance CC.

Figure 2.17: Mechanical contact compliance: an unfixed spring exerts no force when q exceeds the unsprung length r.

e = (q < r)(r − q)/C (2.16)

dq/dt = f (2.17)
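The discontinuous constitutive law of equation 2.16 translates directly into code. This is an illustrative sketch, with a hypothetical function name, rather than the thesis library's implementation:

```python
# Contact-compliance effort (eq. 2.16): the effort is proportional to the
# spring compression (r - q) while in contact (q < r), and zero otherwise.

def cc_effort(q, r, C):
    """Effort of the CC element: (q < r) * (r - q) / C."""
    return (r - q) / C if q < r else 0.0

# In contact: q = 0.2 < r = 0.5 with C = 0.1 gives e = (0.5 - 0.2) / 0.1 = 3.0
# Out of contact: q = 0.8 > r = 0.5 gives e = 0.0
```

The displacement q would be updated separately by integrating the flow f, per equation 2.17.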

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way complex models representing complex systems can be built in a modular fashion, by nesting combinations of simpler models representing simpler subsystems. At simulation time, a preprocessor may replace compound elements by their equivalent set of standard elements, or


Figure 2.18: Effort-displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17.

compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin-jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports: bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension.

Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12.


Figure 2.21: Model of a simple pendulum, as in figures 5.12 and 2.20, rewritten using specialized compound elements.

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs ~x, state vector ~u, and outputs ~y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

d~u/dt = f(~x, ~u) (2.18)

~y = g(~x, ~u) (2.19)

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from tables 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
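The back-substitution step can be sketched in a few lines of Python. Here the constitutive equations are held as strings in assignment form; the variable names and the tiny example equation set (resembling a series arrangement of R, C, and I elements driven by an input effort u, with states p and q) are hypothetical, not drawn from the thesis library:

```python
import re

# Constitutive equations in assignment form: each variable defined exactly once.
defs = {
    "e1": "u - e2 - e3",   # 1-junction: efforts sum to zero
    "e2": "R * f1",        # R-element
    "e3": "q / C",         # C-element readout
    "f1": "p / I",         # I-element readout
}
# State variables p, q and input u are the only symbols allowed to remain.

def back_substitute(expr, defs):
    """Repeatedly replace each defined variable with its definition until
    only state and input variables appear on the right-hand side."""
    changed = True
    while changed:
        changed = False
        for var, rhs in defs.items():
            pattern = r"\b%s\b" % re.escape(var)
            if re.search(pattern, expr):
                expr = re.sub(pattern, "(%s)" % rhs, expr)
                changed = True
    return expr

dp_dt = back_substitute("e1", defs)   # state update equation for momentum p
```

After the fixpoint is reached, `dp_dt` mentions only the states, the input, and element parameters, which is exactly the property required of a state update equation.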

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides, and never on the left-hand side, of any equations. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g. by a root finding algorithm) at each time step. In the second case the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or differential causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitance. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13) the addition of an R-element would, in effect, make the lever plastic. A C-element could make the lever elastic. Either way the inertias would be freed to move independently and a state-space formulation would then be possible.


Chapter 3

Genetic programming

3.1 Models, programs, and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model

The method popularized by Koza in [43] [44] [45] [46] and used here is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which includes also evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which includes also simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back-propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection" in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.¹ [18] also used genetic operations to search for tree structured programs in a special purpose language. Here the application was in symbolic regression.

¹In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control 8, 1965.

A tree structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results ([22]) because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN keywords is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human competitive artificial intelligence it will be desirable to use a general purpose high level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus Naur Form (BNF) language definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming and genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem that are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution of each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature, specifically genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaa|aa   bbbb|bb

Children: aaaabb    bbbbaa

Figure 3.2: The crossover operation for bit string genotypes
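The operation is only a few lines of code. A minimal sketch of single-point crossover on character-string genotypes (an illustration, not the thesis implementation):

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Split both equal-length strings at one random point and swap the
    tails, producing two complementary children."""
    cut = random.randrange(1, len(parent_a))   # never cut at the very ends
    child1 = parent_a[:cut] + parent_b[cut:]
    child2 = parent_b[:cut] + parent_a[cut:]
    return child1, child2
```

For example, `single_point_crossover("aaaaaa", "bbbbbb")` might return `("aaaabb", "bbbbaa")`, as in figure 3.2, depending on where the random cut falls.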

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa

Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent: aaaaaa

Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel, on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals but not in any particularly fit individuals will tend to be less prevalent in subsequent generations.
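The roulette-wheel analogy translates directly into code: sum the fitness scores, draw a uniform sample from that range, and walk the cumulative totals until the "ball" lands in some individual's wedge. A sketch (function name is mine, not from the thesis system):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportional selection: each individual's chance of being
    chosen equals its share of the population's total fitness."""
    spin = random.uniform(0.0, sum(fitnesses))   # drop the ball on the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if spin <= cumulative:
            return individual
    return population[-1]   # guard against floating-point round-off
```

Over many draws, an individual holding three quarters of the total fitness is selected about three quarters of the time, while no individual is ever entirely ruled out.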

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm the entire population is replaced at once. A new population of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 − (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8–10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34] and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
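As a deliberately tiny illustration, the thirteen steps above can be sketched in Python on the classic "OneMax" toy problem, where fitness is simply the count of 1-bits in the genome. All names and parameter values here are illustrative, not taken from the thesis system:

```python
import random

# Steps 1-4: representation (bit lists of length N), operator probabilities,
# population size, and termination criteria.
N, M, G = 20, 30, 60            # genome length, population size, max generations
PC, PM = 0.7, 0.1               # crossover and mutation probabilities (PR = 0.2)

def fitness(ind):
    return sum(ind)             # "OneMax": count of 1-bits

def select(pop):
    # Step 9: fitness-proportional (roulette wheel) selection.
    return random.choices(pop, weights=[fitness(i) or 1e-9 for i in pop], k=1)[0]

def run():
    # Step 5: random initial population.
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(M)]
    for _ in range(G):                               # steps 6-13
        if any(fitness(i) == N for i in pop):        # step 7: termination
            break
        new = []
        while len(new) < M:                          # steps 8-11
            r = random.random()
            if r < PC:                               # crossover
                a, b = select(pop), select(pop)
                cut = random.randrange(1, N)
                new += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            elif r < PC + PM:                        # mutation
                a = select(pop)[:]
                a[random.randrange(N)] ^= 1
                new.append(a)
            else:                                    # replication
                new.append(select(pop)[:])
        pop = new[:M]                                # step 12: replace population
    return max(pop, key=fitness)

best = run()
```

Even with these small settings the best-of-run individual ends up far fitter than a random genome, which on average holds only N/2 ones.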

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive. They respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represent the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
      ⇓
1. Evaluate fitness
      ⇓
2. Check termination criteria
      ⇓
3. Take fitness-proportional sample and apply genetic operators
      ⇓
   Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first: of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search. It implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural, encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the mean square difference between the solution expression and a sampling of the "true" function, in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola, such as 1 + 2(x + 3)², it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test. All linear expressions are exponentially worse fitting near at least one extremity.
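Such a fitness measure is easy to sketch. The code below uses 1/(1 + MSE), a common variant of the reciprocal-error fitness that avoids division by zero for a perfect fit; the candidate expressions are hypothetical stand-ins for evolved programs, chosen to show the building-block effect on the parabola example:

```python
def fitness(candidate, target, samples):
    """Reciprocal of the mean squared error against a sampling of the
    'true' function; higher is fitter, 1.0 is a perfect fit."""
    mse = sum((candidate(x) - target(x)) ** 2 for x in samples) / len(samples)
    return 1.0 / (1.0 + mse)

target = lambda x: 1 + 2 * (x + 3) ** 2    # the unknown "true" function
has_block = lambda x: 2 * (x + 3) ** 2     # right building block, wrong offset
best_line = lambda x: 12 * x + 90          # a good straight-line fit
samples = [x / 2 for x in range(-20, 21)]  # a sufficiently wide sample range
```

Over this range the candidate containing the quadratic building block misses the target by a constant 1 everywhere, so its fitness is 0.5, while even a well-chosen linear candidate scores orders of magnitude lower: the building block adds value despite the flawed offset.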

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The number of positions between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 4 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
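The schema membership test and the order computation can be stated directly in code; brute-force enumeration then recovers the example set for 01*1*. A small sketch (function names are mine):

```python
from itertools import product

def matches(schema, bits):
    """A bit string belongs to a schema's set if it agrees with the schema
    at every exactly-specified (non-wildcard) position."""
    return all(s in ("*", b) for s, b in zip(schema, bits))

def order(schema):
    """Number of exactly specified (non-wildcard) positions."""
    return sum(c != "*" for c in schema)

schema = "01*1*"
members = sorted("".join(p) for p in product("01", repeat=len(schema))
                 if matches(schema, "".join(p)))
```

Enumerating all 32 length-5 bit strings and filtering with `matches` yields exactly the four members listed above.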

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically low that the algorithm will spend virtually all available time constructing, checking, and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracket expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs and, when the time comes, they can be evaluated as program code to test their fitness.
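The same code-as-data convenience is easy to approximate in other languages: a program can be stored as a nested-tuple parse tree and run by a small recursive interpreter. A minimal Python sketch (the tuple format and operator set here are illustrative, not the actual encoding used by the thesis system):

```python
# A parse tree for the expression (x + 3) * x, stored as nested tuples:
# internal nodes are (operator, left, right); leaves are variables or constants.
tree = ("*", ("+", "x", 3), "x")

def evaluate(node, env):
    """Recursively evaluate a parse tree in a variable environment."""
    if isinstance(node, tuple):
        op, left, right = node
        ops = {"+": lambda a, b: a + b,
               "-": lambda a, b: a - b,
               "*": lambda a, b: a * b}
        return ops[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]          # variable lookup
    return node                   # numeric constant

evaluate(tree, {"x": 2})          # (2 + 3) * 2 = 10
```

Because the program is just a data structure, the genetic operators can build and rearrange trees freely, and fitness testing simply calls the interpreter on each candidate.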

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis, Python [83] is used, but the essence of the process is still the creation, manipulation, and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands which are executed strictly one after the other [...] A functional program is a single expression, which is executed by evaluating the expression.²

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm: new generations are created from old by fitness proportional selection and the crossover, mutation and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is being searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,³ tree structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree structured programs that have no simple analogy in nature, and are either not available or are not commonly used in bit string genetic algorithms.

²http://haskell.org/haskellwiki/Introduction#What_is_functional_programming.3F

Preparation for a run of genetic programming requires the following 5 decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants, and zero argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

3. Define the fitness function. Typically, the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, PM:

M population size (number of individuals)

G maximum number of generations to run

PR probability of selecting the replication operator

PC probability of selecting the crossover operator

PM probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.

³This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.

Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

The first two decisions (definition of the function and terminal set) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved. It embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings, or problems. However, genetic programming is so very computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation differ necessarily from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit-strings discussed previously.
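Sub-tree crossover can be sketched on programs stored as nested lists of the form [operator, child, ...], with leaves as variable names or constants. The helper names and representation here are illustrative, not the thesis implementation:

```python
import copy
import random

def paths(tree, path=()):
    """Yield the path (sequence of child indices) of every node in the tree."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    if not path:                       # swapping at the root
        return subtree
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def subtree_crossover(a, b):
    """Swap randomly chosen sub-trees between two parents. Closure of the
    function set guarantees both children remain syntactically valid."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)     # leave parents untouched
    pa = random.choice(list(paths(a)))
    pb = random.choice(list(paths(b)))
    sa, sb = copy.deepcopy(get(a, pa)), copy.deepcopy(get(b, pb))
    return replace(a, pa, sb), replace(b, pb, sa)
```

Note that the total number of nodes is conserved across the pair of children, but each child individually may grow or shrink relative to its parent, which is exactly the variable size and shape property discussed above.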

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

Figure 3.7: The crossover operation for tree structured genotypes

The second mutation operator is called "sub-tree mutation". In this variation, a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.
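Sub-tree mutation can be sketched in the same nested-list program representation; the random-growth routine and its tiny function and terminal sets below are illustrative, not taken from the thesis system:

```python
import copy
import random

def subtree_mutation(tree, grow):
    """Replace the sub-tree rooted at one randomly selected node with a
    freshly grown random sub-tree. Selecting the root replaces the whole
    program; selecting a leaf is the smallest possible change."""
    def paths(t, p=()):
        yield p
        if isinstance(t, list):
            for i, c in enumerate(t[1:], start=1):
                yield from paths(c, p + (i,))
    tree = copy.deepcopy(tree)            # leave the parent untouched
    path = random.choice(list(paths(tree)))
    if not path:                          # root selected: replace everything
        return grow()
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = grow()
    return tree

def grow(depth=2):
    """A hypothetical random sub-tree generator over a tiny function set."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(["x", 0, 1])
    return [random.choice(["+", "*"]), grow(depth - 1), grow(depth - 1)]
```

A call such as `subtree_mutation(["+", "x", ["*", "x", "x"]], grow)` returns a child whose size and shape may differ arbitrarily from the parent's, depending on which node was hit and what was grown in its place.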

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit-string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent (a partial replication). The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different and simpler structure using a new function that was not available to the parent. Automatically defined functions allow a portion of a program, from anywhere within its tree, to be encapsulated as a unit unto itself, and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic, or polynomial regression, search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static


functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed, or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator which returns the constant value 0.0 whenever its second operand is zero.
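Such an operator can be sketched as follows, using the return-0.0 convention described above:

```python
import math

def protected_div(numerator, denominator):
    """Protected division: a closure-preserving replacement for '/'.

    Returns the constant 0.0 when the second operand is zero, so every
    candidate expression evaluates without error.
    """
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Ordinary division is unaffected, and x/0 evaluates to 0.0, which is
# exactly what makes the cos(x/0) exploit possible: cos(0.0) == 1.0.
```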

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x). That is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python


Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness. Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The source code excerpt in listing 3.1 shows how division by zero and occurrences of the contagious NaN value (see footnote 4) result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Another very common yet potentially problematic arithmetic operation

Footnote 4: The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type. INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number". NaN arises from operations which have undefined value in common arithmetic; examples include cos(INF), sin(INF), and INF/INF. These special values, INF and NaN, are contagious in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN * 2.0 = NaN. Of these two special values NaN is the most "virulent" since, in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.


is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low order approximation of the target function, for example. In this arrangement, a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.
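As an illustration only (all names here are hypothetical), an evolved kernel-modifying program might be scored on the model it returns, not on its own structure:

```python
def fitness(program, kernel, test_points, target):
    """Judge a kernel-modifying program by the quality of its output model."""
    model = program(kernel)                        # run the evolved program
    error = sum(abs(model(x) - target(x)) for x in test_points)
    return 1.0 / (1.0 + error)

# A rough prior model (the kernel) and the true system, for illustration.
def kernel(x):
    return x                                       # low-order approximation

def target(x):
    return x + 0.5 * x * x

def evolved_program(model):
    """Hypothetical evolved program: wraps the kernel with a correction term."""
    return lambda x: model(x) + 0.5 * x * x
```

A program that leaves the kernel untouched scores only as well as the prior model, while one that improves it scores higher, which is exactly the "grey-box" arrangement described above.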


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation for individuals that exceed the ceiling.
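The ceiling-by-zero-fitness approach can be sketched as follows; the ceiling value and the (label, children) tree encoding are illustrative, not those of the project code:

```python
MAX_NODES = 5   # deliberately tiny ceiling, for illustration only

def tree_size(tree):
    """Count nodes in a tree encoded as (label, [children]) pairs."""
    _label, children = tree
    return 1 + sum(tree_size(child) for child in children)

def evaluate(tree, raw_fitness):
    """Assign zero fitness to any individual exceeding the size ceiling."""
    if tree_size(tree) > MAX_NODES:
        return 0.0
    return raw_fitness(tree)
```

Because oversized individuals receive zero fitness, selection simply never reproduces them, without any special handling in the crossover operator.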

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.
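A gentle multiplicative schedule might look like the sketch below; the coefficient alpha is an arbitrary guess, which is exactly the kind of estimate the preceding paragraph warns is difficult to make well.

```python
def parsimonious_fitness(raw_fitness, program_size, alpha=0.001):
    """Discount fitness by program size; alpha sets the pressure schedule.

    With alpha small, test-suite performance dominates and size only
    breaks ties; with alpha large, the schedule becomes the steep,
    diversity-destroying extreme described in the text.
    """
    return raw_fitness / (1.0 + alpha * program_size)
```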


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. The strength of this approach arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computation resources is, in general, perhaps more naturalistic. Biological organisms which have very extensive material needs will on occasion find those needs unmet, and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree swapping crossover can produce offspring of differing size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme, with a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value, while tag values differ for structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).
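The recursive test just described might be sketched like this; the Node class is a stand-in, not the project's implementation:

```python
class Node:
    """Stand-in program-tree node with a type and children."""
    def __init__(self, node_type, children=()):
        self.type = node_type
        self.children = list(children)

def structurally_equivalent(a, b):
    """Recursive equivalence on type and child count only.

    Python's 'and' and all() short-circuit, so recursion stops at the
    first pair of non-equivalent corresponding nodes.
    """
    return (a.type == b.type
            and len(a.children) == len(b.children)
            and all(structurally_equivalent(x, y)
                    for x, y in zip(a.children, b.children)))

def tag(node):
    """A hashable tag: tags are equal exactly when trees are equivalent."""
    return (node.type, tuple(tag(child) for child in node.children))
```

Counting distinct tags in a generation then gives the diversity measure directly, without pairwise comparisons.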

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short circuiting. Short circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short circuiting. A LISP or Scheme implementation would benefit since modern optimizing LISP compilers perform very well with tail-recursion.


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation code and the related symbolic algebra code are divided into four modules:

Bondgraph.py: The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py: The Components module contains a collection of compound bond graph elements.

Algebra.py: The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions, and recipes for


specialized actions, such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py: The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations; in particular, for reducing coupled sets of algebraic and first order differential equations to a set of differential equations involving a minimal set of state variables, and a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into four modules as well:

Genetics.py: The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree-structured representations.

Evolution.py: The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py: The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py: The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the SymbolicRegression module to find constant valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualization of bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun, but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma separated value (CSV) text files for portability. For particularly long


or detailed (small time-step) simulations, there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact, non-text format might run quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages in use already, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # character is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end

lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in bg format

title single mass spring damper chain

# nodes

# Spring, damper, and mass
begin node
    id spring
    type C
    parameter 1.96
end node

begin node
    id damper
    type R
    parameter 0.02
end node

begin node
    id mass
    type I
    parameter 0.002
end node

# Common flow point joining s, d, m
begin node
    id point1
    type cfj
end node

# Input force acting on spring and damper
begin node
    id force
    type Se
    parameter Y
end node

# bonds

# Spring and damper joined to mass
begin bond
    tail point1
    head spring
end bond

begin bond
    tail point1
    head damper
end bond

begin bond
    tail point1
    head mass
end bond

begin bond
    tail force
    head point1
end bond

# EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62], and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo code. Take for example the following two functions, which implement the universal and existential logical quantifiers respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], these operate on any iterable object


"sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".
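A brief usage sketch illustrates this polymorphism; the two functions from listing 4.3 are restated so that the fragment is self-contained:

```python
def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

# A list, a dictionary (iterated over its keys), and a generator all work,
# as does any callable condition.
all_even = forAll([2, 4, 6], lambda n: n % 2 == 0)                  # True
has_b    = thereExists({'a': 1, 'b': 2}, lambda key: key == 'b')    # True
negative = thereExists((x * x for x in range(5)), lambda n: n < 0)  # False
```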

Since this research addresses automated dynamic system identification, topics which are primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4 processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work, SciPy is used for fast matrix operations and to access a numerical solver. Both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.
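A minimal sketch of the scipy.integrate.odeint interface follows; the first-order decay model here is purely illustrative, not one of the thesis examples:

```python
import numpy as np
from scipy.integrate import odeint

def dxdt(x, t):
    """Right-hand side of the ODE dx/dt = -x (first-order decay)."""
    return -x

t = np.linspace(0.0, 1.0, 101)     # time points at which to report the state
x = odeint(dxdt, 1.0, t)           # x(0) = 1.0; LSODA runs behind the scenes
# x is an array of shape (len(t), n_states); x[-1, 0] approximates exp(-1).
```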

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large, automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a


graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];

    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];

    edge [fontsize=10];
    11 -> C1 [arrowhead=half arrowtail=tee label="1"];
    11 -> R2 [arrowhead=half arrowtail=tee label="2"];
    11 -> I3 [arrowhead=teehalf label="3"];
    Se4 -> 11 [arrowhead=teehalf label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds, and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a sub-class of bonds. Activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.


4.3.2 Adding elements and bonds, traversing a graph

The graph class has addNode and removeNode methods, and bondNodes and removeBond methods. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph, starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP), described in Section 2.3.2, is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC);

2. one-port elements with a preferred causality (storage elements C and I);

3. one-port elements with arbitrary causality (sink element R);

4. others (0, 1, GY, TF, signal blocks).

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

- a bond attached to a TF or GY element is assigned either causality;
- a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality;
- the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality;
- a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality;
- the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.
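In outline, the propagation pattern for a 1-junction might look like the following; this is an illustrative fragment (real bonds connect two elements, and the project's classes differ), not the thesis implementation:

```python
class Bond:
    def __init__(self):
        self.causality = None   # 'effort-in' / 'effort-out', w.r.t. the junction
        self.element = None

class OneJunction:
    """Illustrative 1-junction: exactly one attached bond may carry
    effort-out (from the junction) causality."""
    def __init__(self, bonds):
        self.bonds = bonds
        for bond in bonds:
            bond.element = self

    def extendCausality(self, bond):
        if bond.causality == 'effort-out':
            # One effort-out bond fixes all the others as effort-in.
            for other in self.bonds:
                if other is not bond and other.causality is None:
                    other.causality = 'effort-in'
        else:
            # The last-but-one effort-in assignment implies the remaining bond.
            unassigned = [b for b in self.bonds if b.causality is None]
            if len(unassigned) == 1:
                unassigned[0].causality = 'effort-out'

def assign(bond, causality):
    """Assign causality, then propagate consequences through the element."""
    bond.causality = causality
    bond.element.extendCausality(bond)
```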

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required-causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has already been assigned a causality, preferred or otherwise, as a consequence of earlier required or preferred causality assignments is skipped over. Others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and the consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined, and any instances of differential causality are noted in a new list attached to the bond graph object.

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since the reductions are computationally inexpensive to apply. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all of that junction's neighbours, the cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with fewer than 3 bonds.

2. Merge adjacent like junctions.

3. Merge resistors bonded to a common junction.

4. Merge capacitors bonded to a common effort junction.

5. Merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5.

In the first manipulation, any 0 or 1 junction having only one bond is removed from the model along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond is put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed, and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1, there are two cases, depending on the junction type. (Handling of non-linear elements is not automated by the library.) If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is r_eq, where

r_eq = 1 / (1/r_1 + 1/r_2 + ... + 1/r_n)        (4.1)
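
The series and parallel merging rules above can be captured in two small helper functions; a minimal sketch (function names are mine, not the library's):

```python
def r_series(resistances):
    # Resistors bonded to a common 1-junction (flow junction):
    # the parameters simply add.
    return sum(resistances)

def r_parallel(resistances):
    # Resistors bonded to a common 0-junction (effort junction),
    # equation (4.1): reciprocal of the sum of reciprocals.
    return 1.0 / sum(1.0 / r for r in resistances)
```

For example, merging the three resistor parameters 45.0, 93.0, and 7137.88689084 with r_parallel gives approximately 30.1978, the equivalent resistance that appears in the reduced model of figure 4.9.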

These are the familiar rules for electrical resistances in series and parallel.

The fourth and fifth manipulations handle the case shown in figure 2.12, and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply: capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances, and rigidly joined inertias are effectively one mass.

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than 3 bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8, and 4.9 show the application of the first 3 steps in the automated model reduction method. The 4th and 5th steps are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions, step 1: remove trivial junctions.

Figure 4.2: Model reductions, step 2: merge adjacent like junctions.

Figure 4.3: Model reductions, step 3: merge resistors bonded to a common junction.

Figure 4.4: Model reductions, step 4: merge capacitors bonded to a common effort junction.

Figure 4.5: Model reductions, step 5: merge inertias bonded to a common flow junction.

Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction.

Figure 4.7: The model from figure 4.6 with trivial junctions removed.

Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged.

Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged.

Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5.

Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6.


Listing 4.5: Equations from the model in figure 4.6

e1 = (e2 + (-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4 + (-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1/-78.0)
f2 = f1
f3 = (f9 + f11 + f12 + f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2 + f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7137.88689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2 + (-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way, depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look up the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant, and operator is an expression in itself), replace any one sub-expression with another, and perform other, more specialized algebraic manipulations. Base expression objects are never used themselves; only leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater, and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
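
A minimal sketch of this hierarchy follows. The methods and constructors here are simplified stand-ins for the library's richer interface (substitution, constant detection, and so on are omitted):

```python
class Expression:
    def variables(self):
        return set()  # base case: no variables

class Constant(Expression):
    def __init__(self, value):
        self.value = value
    def evaluate(self, env):
        return self.value

class Variable(Expression):
    def __init__(self, name):
        self.name = name
    def variables(self):
        return {self.name}
    def evaluate(self, env):
        return env[self.name]  # env is the lookup table

class Operator(Expression):
    def __init__(self, *operands):
        self.operands = list(operands)
    def variables(self):
        return set().union(*(o.variables() for o in self.operands))

class Div(Operator):  # two ordered sub-expressions
    def evaluate(self, env):
        a, b = self.operands
        return a.evaluate(env) / b.evaluate(env)

class CommutativeOperator(Operator):
    pass  # any number of sub-expressions, order immaterial

class Add(CommutativeOperator):
    def evaluate(self, env):
        return sum(o.evaluate(env) for o in self.operands)

class Mul(CommutativeOperator):
    def evaluate(self, env):
        result = 1.0
        for o in self.operands:
            result *= o.evaluate(env)
        return result
```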

Equation objects are composed of two expression objects (a left-hand side and a right) and methods to list all variables, expressions, and sub-expressions in the equation, and to replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram.

Equation sets are collections of equations that contain no duplicates. Equation set objects support methods similar to those of equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.), as well as a number of methods that are specific to collections of equations. Those include recursive back-substitution of the equations from the set into any other equation (provided the original set is in a certain "declarative" form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation that is automated by the library, algebraic loops are discussed in Section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called "t" is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single-operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator, and joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single-operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces it with its one sub-expression. CommutativeOperator objects with only one sub-expression are null, or "do nothing", operations: a single-operand Add is assumed to add zero to its operand, and a single-operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands of this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
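
Sketched on plain nested tuples rather than the library's objects (an assumption made here for brevity), the expansion looks like:

```python
def expand_distributive(expr):
    # A node is a tuple ("add", ...) or ("mul", ...); anything else
    # is a terminal (constant or variable).
    if not isinstance(expr, tuple) or expr[0] != "mul":
        return expr
    operands = list(expr[1:])
    for i, op in enumerate(operands):
        if isinstance(op, tuple) and op[0] == "add":
            rest = operands[:i] + operands[i + 1:]
            # (a + b) * rest  ->  (a * rest) + (b * rest)
            return ("add",) + tuple(
                ("mul", term) + tuple(rest) for term in op[1:]
            )
    return expr  # no Add operand: nothing to expand
```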

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable, or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

"Reduce equation to polynomial form" invokes towardsPolynomial on the left and right-hand side expressions of the equation repeatedly, until isPolynomial gives a true result for each. "Collect like terms" examines the variables in the operands of the Add objects on either side of a polynomial form equation and collects them, such that there is only one operand between those Add objects that contains any one variable. The result may not be in the polynomial form defined above, but can be driven there again by invoking the equation's towardsPolynomial method. Equation objects have a solveFor method that, given a variable appearing in the equation, attempts to put the equation in polynomial form, then collects like terms, isolates the term containing that variable on one side of the equation, and divides through by any constant factor. Of course, this works only for equations that can be put in the polynomial form defined above. The solveFor method attempts to put the equation in polynomial form by invoking towardsPolynomial repeatedly, stopping either when the isPolynomial method gives a true result or after a number of iterations proportional to the size of the equation (the number of Expression objects it contains). If the iteration limit is reached, then it is presumed not possible to put the equation in polynomial form, and the solveFor method fails.

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive back-substitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process: these differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13); they are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2) it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graph. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back-substitution into the declarative equation for the first loop variable.

2. Expanding distributive operators, aggregating constants and commutative operators.

3. Collecting like terms.

4. Isolating the first loop variable.

The result of the first step is a new declarative equation for the first loop variable which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C

and

D = E1 + E2 + ... + En

where C is a constant and any of E1 ... En may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the other side of the equation, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.
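
The depth-first loop detection can be sketched with the dependency graph as a plain dict; the representation is mine, and the variable names in the example are those of the loop in listing 4.7:

```python
def find_algebraic_loop(deps, start):
    # deps maps each declared variable to the variables on the
    # right-hand side of its declarative equation; variables with no
    # equation (inputs) simply do not appear as keys.
    path = []
    def visit(var):
        if var == start and path:
            return list(path)          # came back around: a loop
        if var in path or var not in deps:
            return None                # already on the path, or an input
        path.append(var)
        for dep in deps[var]:
            loop = visit(dep)
            if loop is not None:
                return loop
        path.pop()
        return None
    return visit(start)
```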

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4 + f7 + f8)
e6 = (e2 + (-1.0*e3))

collapsing:
e6 = (e2 + (-1.0*((f4 + f7 + (e6/30.197788376))*32.0)))

expanding and aggregating:
iterations: 3 True True
e6 = (e2 + (f4*-32.0) + (f7*-32.0) + (e6*-1.05968025213))

collecting:
(e6 + (-1.0*(e6*-1.05968025213))) = (e2 + (f4*-32.0) + (f7*-32.0))

expanding and aggregating:
iterations: 2 True True
(e6 + (e6*1.05968025213)) = (e2 + (f4*-32.0) + (f7*-32.0))

isolating:
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0) + (f4*-32.0) + (f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not hold (for example, because the bond graph model has differential causality for some elements), then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved.

1. Identify state, input, and other variables.

2. Back-substitute until only state and input variables appear on right-hand sides.

3. Split state and readout equation sets.

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.

In the second step, the right-hand side of each non-differential equation is substituted into the right-hand sides of all equations wherever the left-hand side variable of that non-differential equation appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops, then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can, of course, be solved for arbitrary time ranges. The differential equations from a state space equation set are solved by numerical integration, using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.

Inputs, if any, are read from a file, then interpolated on the fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. A zeroth order (constant) and a first order (linear) interpolator are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.
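
The two interpolators might be sketched as follows (a sketch under assumed names; the library's actual implementation may differ):

```python
import bisect

def zeroth_order_hold(times, values, t):
    # Return the most recent sample at or before t (constant hold).
    i = bisect.bisect_right(times, t) - 1
    return values[max(i, 0)]

def first_order_hold(times, values, t):
    # Linear interpolation between the two bracketing samples.
    i = bisect.bisect_right(times, t) - 1
    if i < 0:
        return values[0]
    if i >= len(times) - 1:
        return values[-1]
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + frac * (values[i + 1] - values[i])
```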

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations, across the same range of simulated time, for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression), a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.
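
In Python, the required node interface might look like the following. The class and attribute names are illustrative assumptions; the optional kernel is threaded through compose as the text describes:

```python
class GPNode:
    # The four attributes every node type must expose to the generic
    # crossover/mutation machinery (values here are placeholders).
    type = None          # unique type identifier
    min_children = 0
    max_children = 0
    child_types = []     # at least max_children entries

    def __init__(self, children=()):
        self.children = list(children)

    def compose(self, kernel=None):
        # Combine the children's partial solutions; the root node's
        # result is taken as the solution to the whole problem.
        raise NotImplementedError

class ConstNode(GPNode):
    type = "expr"
    def __init__(self, value):
        super().__init__()
        self.value = value
    def compose(self, kernel=None):
        return self.value

class AddNode(GPNode):
    type = "expr"
    min_children = 2
    max_children = 2
    child_types = ["expr", "expr"]
    def compose(self, kernel=None):
        a, b = self.children
        return a.compose(kernel) + b.compose(kernel)
```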


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree structured representations proceeds in three steps:

1. Choose a node at random from the parent.

2. Generate a new random tree whose root has the same type as the chosen node.

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree.

In the first step, a choice is made uniformly among all nodes; it is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied, with the chosen subtree replaced by the newly generated one.

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth-first. Each time a new node is added, the required type is first checked, and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or the depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4, this most often results in not more than one additional layer of leaf nodes being added to the tree. This works out well because the tree is constructed breadth-first and, in these type systems, there are very few function types that accept only other function types as children; those few are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.
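
The three steps can be sketched on trees represented as nested tuples, with copy-on-replace giving the required non-destructive copy of the parent. The representation, helper names, and the user-supplied grow and type_of functions are assumptions for illustration:

```python
import random

def nodes(tree, path=()):
    # Enumerate (path, subtree) pairs; a node is a tuple whose first
    # item names the function, or a bare terminal otherwise.
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def graft(tree, path, new):
    # Rebuild the tree with the subtree at `path` replaced by `new`;
    # the original tree is left untouched.
    if not path:
        return new
    i = path[0]
    return tree[:i] + (graft(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, grow, type_of, rng=random):
    # Step 1: uniform choice among all nodes.
    path, chosen = rng.choice(list(nodes(tree)))
    # Steps 2-3: grow a random tree of the same type, splice it in.
    return graft(tree, path, grow(type_of(chosen)))
```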

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case, the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree-structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents.

2. List all nodes from the first parent conforming to one of the common types.

3. Choose at random from the list.

4. List all nodes from the second parent conforming to the same type as the chosen node.

5. Choose at random from the list.

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second.

If the set of common types is empty, crossover cannot proceed and the genetic algorithm selects a new pair of parents.
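The six steps can be sketched as follows, again with a hypothetical `Node` class standing in for the implemented representation:

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    type: str
    children: list = field(default_factory=list)

def nodes_of(tree):
    yield tree
    for c in tree.children:
        yield from nodes_of(c)

def crossover(p1, p2):
    """Strongly typed crossover; returns None if no common type exists."""
    common = {n.type for n in nodes_of(p1)} & {n.type for n in nodes_of(p2)}  # 1
    if not common:
        return None                    # caller selects a new pair of parents
    candidates1 = [n for n in nodes_of(p1) if n.type in common]               # 2
    a = random.choice(candidates1)                                            # 3
    candidates2 = [n for n in nodes_of(p2) if n.type == a.type]               # 4
    b = random.choice(candidates2)                                            # 5
    def deep_copy(n):
        return Node(n.name, n.type, [deep_copy(c) for c in n.children])
    def copy_replacing(n):                                                    # 6
        if n is a:
            return deep_copy(b)
        return Node(n.name, n.type, [copy_replacing(c) for c in n.children])
    return copy_replacing(p1)
```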

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).

This grammar is typeless, or "mono-typed". The result of symbolic regression is a tree-structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant-valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output, and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and perhaps obvious) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations; the differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire not to pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program; simply making the new state available for later integration into the model is enough. Given a tree-structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sit above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.

3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable; rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equations are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number half way between its original index and the next state equation index (with wrap-around at the interval boundaries). Overall, this design has some desirable properties, including:
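The reference resolution and duplicate-index separation just described can be sketched as follows (an illustrative reimplementation on the unit interval, not the thesis code):

```python
def resolve(reference_indices, state_indices):
    """Map each reference index to the first state-variable index found by
    searching upward from it, modulo the unit interval (wrap-around)."""
    return {r: min(state_indices, key=lambda s: (s - r) % 1.0)
            for r in reference_indices}

def separate_duplicates(indices):
    """Give duplicated state-variable indices distinct values: moving in the
    direction opposite to the resolution search, each duplicate is replaced
    by a value half way to the next index (with wrap-around)."""
    out = sorted(indices)
    n = len(out)
    for i in range(n - 1, -1, -1):
        if out.count(out[i]) > 1:
            gap = (out[(i + 1) % n] - out[i]) % 1.0 or 1.0
            out[i] = (out[i] + gap / 2.0) % 1.0
    return out
```

Note that a copied equation keeps its original index until `separate_duplicates` runs, so roughly half of the wrap-around search range that formerly resolved to the original now resolves to the copy.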

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so crossover that simply adds a state is not destructive to any advantages the program already has.

• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small incremental changes (such as the introduction of a new state equation by copying and later modifying in some small way an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and on their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.
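The recursive evaluation can be sketched with the equation objects reduced to plain strings; the class names follow the table, but the bodies are illustrative only:

```python
class EquationSet:
    """StateEquationSet / OutputEquationSet: first child is one equation,
    optional second child is the rest of the set (a cons-list of equations)."""
    def __init__(self, head, rest=None):
        self.head, self.rest = head, rest
    def evaluate(self):
        return [self.head] + (self.rest.evaluate() if self.rest else [])

class DynamicSystem:
    """Root node: aggregates state and readout equations into one set.
    Equations are plain strings here; the thesis uses symbolic algebra objects."""
    def __init__(self, state_set, output_set):
        self.state_set, self.output_set = state_set, output_set
    def evaluate(self):
        # (state-variable reference resolution by index would happen here)
        return self.state_set.evaluate() + self.output_set.evaluate()
```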

Node               Type               Child types
DynamicSystem      DynamicSystem      StateEquationSet, OutputEquationSet
StateEquationSet   StateEquationSet   StateEquation, StateEquationSet
StateEquation      StateEquation      StateVariableLHS, Expression
OutputEquationSet  OutputEquationSet  OutputEquation, OutputEquationSet
OutputEquation     OutputEquation     OutputVariable, Expression
StateVariableLHS   StateVariableLHS
StateVariableRHS   Expression
OutputVariable     OutputVariable
InputVariable      Expression
Constant           Expression
Add                Expression         Expression, Expression
Mul                Expression         Expression, Expression
Div                Expression         Expression, Expression
Cos                Expression         Expression
Sin                Expression         Expression
Greater            Expression         Expression, Expression
Lesser             Expression         Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems

4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements, or even connected subgraphs, to be replaced.

If a constant parameter of some element (for example the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.
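The effect of a junction-inserting node can be sketched against a hypothetical adjacency-list bond graph representation (the thesis library's actual data structures are not shown here):

```python
def insert_common_flow_junction(graph, bond):
    """Replace `bond` with two bonds joined by a new 1-junction, opening
    positions for further attachments. `graph` is a hypothetical structure:
    a dict with a list of bonds (element-name pairs) and a junction list."""
    a, b = bond
    graph["bonds"].remove(bond)
    junction = "1_%d" % len(graph["junctions"])   # invented naming scheme
    graph["junctions"].append(junction)
    graph["bonds"] += [(a, junction), (junction, b)]   # two new bond openings
    return junction   # further elements may attach here
```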

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that, if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models), then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank, since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type        Child types
Kernel                      kernel
InsertCommonEffortJunction  bond        bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond        bond, bond, effort-in, effort-out
AttachInertia               effort-out  Expression, bond
AttachCapacitor             effort-in   Expression, bond
AttachResistorEffortIn      effort-in   Expression, bond
AttachResistorEffortOut     effort-out  Expression, bond
Constant                    Expression
Add                         Expression  Expression, Expression
Mul                         Expression  Expression, Expression
Div                         Expression  Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs

Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs, or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p, and q variables measure effort, flow, momentum, and displacement on the like-numbered bonds. The applied effort from the source element is marked E.

ṗ3 = e3    (5.1)

q̇1 = f1    (5.2)

e1 = (q1/1.96)    (5.3)

e2 = (f2 * 0.02)    (5.4)

e3 = (e4 + (-1.0 * e1) + (-1.0 * e2))    (5.5)

e4 = E    (5.6)

f1 = f3    (5.7)

f2 = f3    (5.8)

f3 = (p3/0.002)    (5.9)

f4 = f3    (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in section 4.4.4. There are two state variables, p3 and q1; these are the energy variables of the two storage elements (C and I). Note that variable E does not appear on the left-hand side of any state or readout equation: it is an input to the system.

ṗ3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.11)

q̇1 = (p3/0.002)    (5.12)

e1 = (q1/1.96)    (5.13)

e2 = ((p3/0.002) * 0.02)    (5.14)

e3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.15)

e4 = E    (5.16)

f1 = (p3/0.002)    (5.17)

f2 = (p3/0.002)    (5.18)

f3 = (p3/0.002)    (5.19)

f4 = (p3/0.002)    (5.20)

The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand side of readout equations as needed.
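As a concrete illustration, equations 5.11 and 5.12 can be integrated with a simple forward-Euler scheme (assuming, for simplicity, a constant unit step input E rather than an interpolated data file):

```python
def simulate(t_end=2.0, dt=1e-4, E=1.0):
    """Forward-Euler integration of state equations 5.11 and 5.12
    (C = 1.96, I = 0.002, R = 0.02), starting from rest."""
    p3, q1, t = 0.0, 0.0, 0.0
    while t < t_end:
        dp3 = E - q1 / 1.96 - (p3 / 0.002) * 0.02   # eq. 5.11
        dq1 = p3 / 0.002                            # eq. 5.12
        p3 += dp3 * dt
        q1 += dq1 * dt
        t += dt
    return p3, q1
```

At steady state the spring balances the applied force, so q1 approaches E * 1.96 and the momentum p3 decays to zero, matching the new steady state visible in figure 5.3.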

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass, and the damper. Predictably, the speed rises immediately as the force is applied; it crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in section 5.3. A broad spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.

Figure 5.1: Schematic diagram of a simple spring-mass-damper system

Figure 5.2: Bond graph model of the system from figure 5.1

Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input

Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit magnitude uniform random noise input

5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring–dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state space form, the differential state equations are as given below.

ṗ2 = (E + (-1.0 * ((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96))))

ṗ8 = (((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96)) + (-1.0 * ((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96))))

ṗ14 = (((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96)) + (-1.0 * ((p14/0.002) * 0.02)) + (-1.0 * (q16/1.96)))

q̇6 = ((p2/0.002) + (-1.0 * (p8/0.002)))

q̇12 = ((p8/0.002) + (-1.0 * (p14/0.002)))

q̇16 = (p14/0.002)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12, and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter: the input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system

Figure 5.6: Bond graph model of the system from figure 5.5

Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E

Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit magnitude uniform random noise in the applied force E

5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant -9.81 times the inertia (force of gravity on the ball).

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last, as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.
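The switched-capacitance behaviour can be emulated in a small scalar simulation: the contact spring acts only while the ball's centre is below the threshold r. This forward-Euler sketch uses the stated parameters but is an illustration, not the thesis's simulation code:

```python
def simulate_ball(t_end=10.0, dt=1e-4):
    """Forward-Euler emulation of the bouncing ball: I = 0.002, C = 0.25,
    r = 1.0, R = 0.001, gravity -9.81 * I; released at rest from height 10."""
    I, C, r, R, g = 0.002, 0.25, 1.0, 0.001, 9.81
    q, p, t = 10.0, 0.0, 0.0      # q: centre height, p: momentum
    heights = []
    while t < t_end:
        v = p / I
        contact = (r - q) / C if q < r else 0.0   # switched C element engages
        p += (-g * I - R * v + contact) * dt      # gravity, drag, contact spring
        q += v * dt
        t += dt
        heights.append(q)
    return heights
```

Each rebound apex falls short of the release height, as the drag element R bleeds energy from the system on every flight and contact phase.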

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface

Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface

Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface

5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum-body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical, and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom: the horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical, and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical, and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical, and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of rotational flow signal a3):

mxA = 10 * cos(θ)

myA = 10 * sin(θ)

mxB = -10 * cos(θ)

myB = -10 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state space form, the differential state equations are:

ṅa1 = (p1/1.00)

ṅa2 = (p2/1.00)

ṅa3 = (p4/1.10)

ṗ1 = (((-1.0 * (q21/0.01)) + (-1.0 * (((p1/1.00) + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

ṗ2 = (-9.81 + ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00) + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

ṗ4 = (((10.0 * cos(na3)) * ((-1.0 * (q21/0.01)) + (-1.0 * (((p1/1.00) + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8)))) + ((10.0 * sin(na3)) * ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00) + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8)))) + ((-10.0 * cos(na3)) * 0.0) + ((-10.0 * sin(na3)) * 0.0) + (-1.0 * ((p4/1.10) * 0.5)))

q̇21 = ((p1/1.00) + ((10.0 * cos(na3)) * (p4/1.10)))

q̇24 = ((p2/1.00) + ((10.0 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low frequency vibration in the spring, visible in the variable lower extremum of the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.

Figure 5.12: Bond graph model of the system from figure 2.19

Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left and as an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state space form, there are 24 differential state equations, as given below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.

d(Vphi_b0)/dt = ((0.0 + (((xj0/0.001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))/0.001)) * 10.0 * sin(phi_b0)) + (-1.0 * ((yj0/0.001) + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))/0.001)) * 10.0 * cos(phi_b0)) + (-1.0 * 0.0) + (((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001)) * 10.0 * sin(phi_b0)) + (-1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001)) * 10.0 * cos(phi_b0))) / 50.0)

d(Vphi_b1)/dt = ((0.0 + (((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001)) * 10.0 * sin(phi_b1)) + (-1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001)) * 10.0 * cos(phi_b1)) + (-1.0 * 0.0) + (((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001)) * 10.0 * sin(phi_b1)) + (-1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)) * 10.0 * cos(phi_b1))) / 50.0)

d(Vphi_b2)/dt = ((0.0 + (((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001)) * 10.0 * sin(phi_b2)) + (-1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)) * 10.0 * cos(phi_b2)) + (-1.0 * 0.0) + (0.0 * 10.0 * sin(phi_b2)) + (-1.0 * 0.0 * 10.0 * cos(phi_b2))) / 50.0)

d(VxC_b0)/dt = ((((xj0/0.001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))/0.001)) + (-1.0 * ((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001)))) / 10.0)

d(VxC_b1)/dt = ((((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001)) + (-1.0 * ((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001)))) / 10.0)

d(VxC_b2)/dt = ((((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001)) + (-1.0 * 0.0)) / 10.0)

d(VyC_b0)/dt = ((((yj0/0.001) + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))/0.001)) + (-1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001))) + (-1.0 * 10.0 * 9.81)) / 10.0)

d(VyC_b1)/dt = ((((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001)) + (-1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001))) + (-1.0 * 10.0 * 9.81)) / 10.0)

d(VyC_b2)/dt = ((((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)) + (-1.0 * 0.0) + (-1.0 * 10.0 * 9.81)) / 10.0)

d(phi_b0)/dt = Vphi_b0

d(phi_b1)/dt = Vphi_b1

d(phi_b2)/dt = Vphi_b2

d(xC_b0)/dt = VxC_b0

d(xC_b1)/dt = VxC_b1

d(xC_b2)/dt = VxC_b2

d(xj0)/dt = (0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))

d(xj1)/dt = ((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))

d(xj2)/dt = ((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))

d(yC_b0)/dt = VyC_b0

d(yC_b1)/dt = VyC_b1

d(yC_b2)/dt = VyC_b2

d(yj0)/dt = (0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))

d(yj1)/dt = ((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))

d(yj2)/dt = ((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))

Figure 5.14: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – vertical displacement of pendulum body centres


[Plot omitted: t = 0 to 30 on the horizontal axis, horizontal displacement from -40 to 40, series xC_b0, xC_b1, xC_b2.]

Figure 5.15: Response of the triple pendulum model (figure 5.17) released from an initial horizontal arrangement – horizontal displacement of pendulum body centres.

[Plot omitted: t = 0 to 30 on the horizontal axis, angular position from -4 to 0, series phi_b0, phi_b1, phi_b2.]

Figure 5.16: Response of the triple pendulum model (figure 5.17) released from an initial horizontal position – angular position of pendulum bodies.


[Diagram omitted: Link 1, Joint 1, Link 2, Joint 2, Link 3, Joint 3, with effort-flow signal pairs ef(x), ef(y) at the connection points.]

Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum.


5.1.6 Linked elastic collisions

Figure 5.18 shows, in schematic form, a rigid bar with damped-elastic bumpers at each end. The system consists of a rigid link element, as used above in section 5.1.4, representing the bar; two switched capacitor elements (CCa, CCb) representing elastic collision between the tips and a rigid horizontal surface; unswitched vertical damping at each end; and damping on rotation (not shown in the schematic) representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface, int_a3 is the angular displacement of the bar from vertical, and q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21 and q24 (height of the centre, left tip and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2 before the bar tips strike the ground at height 1, equal to the unsprung length q of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact, then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end.


[Bond graph omitted: element labels include I:m (twice) with integrators TINT1/a1 and TINT2/a2, Se:mg, I:J with R:0.5 and integrator INT1/a3, modulated transformers MTF:mxA, MTF:myA, MTF:mxB, MTF:myB, sources Se:FxA=0 and Se:FxB=0, and at each tip a 1-junction carrying a contact compliance and damper (CC:0.01 with R:0.8, and CC:0.005 with R:0.8); bonds numbered 1 to 25.]

Figure 5.19: Bond graph model of the system from figure 5.18.

[Plot omitted: t = 0 to 10 on the horizontal axis, displacement from 0 to 5, series int_a2, int_a3, q21, q24.]

Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface.


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean and maximum program fitness each generation).

2. Population diversity (the number of structurally unique programs in each generation).

3. Program age (the number of generations that each program had been present in the population).

4. Program size (summarized as minimum, mean and maximum number of nodes in a program).
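These four statistics amount to a few lines of per-generation bookkeeping. The sketch below is illustrative only; the Node, Program, structure_key and generation_stats names are hypothetical, not taken from the thesis software. Structural uniqueness is judged by serializing each tree's operator names and shape.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    children: tuple = ()

@dataclass
class Program:
    root: Node
    fitness: float = 0.0
    age: int = 0                      # generations this program has survived

    def size(self):
        def count(n):
            return 1 + sum(count(c) for c in n.children)
        return count(self.root)

def structure_key(node):
    # Two programs are "structurally unique" iff these keys differ.
    return (node.name, tuple(structure_key(c) for c in node.children))

def generation_stats(population):
    fit = sorted(p.fitness for p in population)
    sizes = [p.size() for p in population]
    return {
        "fitness": (fit[0], sum(fit) / len(fit), fit[-1]),     # min, mean, max
        "diversity": len({structure_key(p.root) for p in population}),
        "age": (min(p.age for p in population), max(p.age for p in population)),
        "size": (min(sizes), sum(sizes) / len(sizes), max(sizes)),
    }
```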

The particular symbolic regression problem chosen consisted of 100 samples of the function x^2 + 2 at integer values of x between -50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples, and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2 and the number of generations limited to 500, but with a variety of population sizes.
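A sketch of that fitness test follows. The 1/(1 + error) normalization onto (0, 1] is an assumed choice, since the exact formula is not given in the text, and the inclusive integer range -50 to +50 actually contains 101 points, so the precise sample set is also an assumption.

```python
import math

def target(x):
    return x * x + 2              # the target function being regressed

def fitness(candidate, xs=range(-50, 51)):
    """Total absolute error against the target, normalized onto (0, 1]."""
    try:
        total_error = sum(abs(candidate(x) - target(x)) for x in xs)
    except ZeroDivisionError:
        return 0.0                # candidates that divide by zero score zero
    if math.isnan(total_error):
        return 0.0                # as do candidates that produce NaN
    return 1.0 / (1.0 + total_error)   # zero error maps to fitness exactly 1
```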

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the vast number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress: runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation, the beginning of a sharp decline in population diversity can be seen. From this one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of these newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness) and largest (maximum size) individuals are, of course, not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300-generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in the neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation, but were barred from participating in crossover, mutation or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

The non-parsimonious tendency of typical genetic programming output is well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal; specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression the exploitable structural equivalences are simply the rules of algebra.

Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect.

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends.

Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes).

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program.

Figure 5.25: Fitness of run B – progress stalled by population convergence.

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress.

Figure 5.27: Program size in run B – early domination by smaller-than-average schema.

Figure 5.28: Program age in run B – low population turnover mid-run between crises.

Even a very small set of algebraic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation):

(- x x) replaced by (0)

(- x 0) replaced by (x)

(+ x 0) replaced by (x)

(+ 0 x) replaced by (x)

(/ x x) replaced by (1)

(/ 0 x) replaced by (0) for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant value node equal to zero or one; and second, are the children equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
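The rules and the constant-subtree simplification can be sketched together as a single bottom-up pass over s-expressions encoded as nested Python tuples. The encoding, the protected-division convention of returning 1 on a zero divisor, and the reduce_node name are illustrative assumptions, and the sketch handles binary operators only.

```python
def reduce_node(expr):
    """Apply the node-local algebraic reductions bottom-up, folding
    variable-free subtrees to a single constant value along the way."""
    if not isinstance(expr, tuple):
        return expr                       # variable name or constant leaf
    op = expr[0]
    a, b = (reduce_node(c) for c in expr[1:])
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Constant folding: replace a constant-only subtree by its value.
        # Protected division returning 1 on a zero divisor is an assumption.
        return {'+': a + b, '-': a - b, '*': a * b,
                '/': a / b if b != 0 else 1.0}[op]
    if op == '-' and a == b: return 0     # (- x x) -> 0
    if op == '-' and b == 0: return a     # (- x 0) -> x
    if op == '+' and b == 0: return a     # (+ x 0) -> x
    if op == '+' and a == 0: return b     # (+ 0 x) -> x
    if op == '/' and a == b: return 1     # (/ x x) -> 1
    if op == '/' and a == 0: return 0     # (/ 0 x) -> 0
    return (op, a, b)
```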

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x^2 + 2. The raw expression is simply too large, and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables: it evaluates to very nearly 2.


Figure 5.29: A best approximation of x^2 + 2.0 after 41 generations.


Figure 5.30: The reduced expression.


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, of the model described in section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order in which they are given). Under this metric identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted or replaced to transform one program into the other.
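Under the stated definition, the metric can be sketched as follows. Trees are encoded as nested tuples of the form (name, child, ...), leaves are plain values, and the function names are hypothetical.

```python
from collections import deque

def size(tree):
    """Number of nodes in a tree encoded as (name, child, ...)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(c) for c in tree[1:])

def tree_distance(t1, t2):
    """Size of the subtree rooted at the first mismatching node found in a
    breadth-first traversal of both trees; 0 for identical trees."""
    queue = deque([(t1, t2)])
    while queue:
        a, b = queue.popleft()
        if a == b:
            continue                       # identical subtrees: nothing to count
        a_tup, b_tup = isinstance(a, tuple), isinstance(b, tuple)
        if a_tup and b_tup and a[0] == b[0] and len(a) == len(b):
            queue.extend(zip(a[1:], b[1:]))   # roots match: descend a level
        else:
            # First mismatch: the corresponding subtrees differ, so the
            # size of the larger one is the distance.
            return max(size(a), size(b))
    return 0
```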

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at each distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive". The heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in


Distance   Mean fitness   Count
1          0.000469       2212
3          0.000282       503
5          0.000346       66
7          0.000363       36
9          0.000231       25
11         0.000164       16
13         0.000143       10
15         0.000071       10
17         0.000102       9
19         0.000221       1
23         0.0            2
25         0.0            2
27         0.0            1
29         0.000286       5
35         0.0            1
51         0.000715       1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2 and 5.7.


that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases fitness-guided search, such as genetic programming, can do no better than a random search.

[Plot omitted: tree-distance 0 to 18 on the horizontal axis, mean fitness 0 to 5 (x 1e-4) on the vertical axis.]

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction


DynamicSystem

StateEquationSet OutputEquation

StateEquation StateEquationSet

StateVariableLHS Div

StateVariableRHS Constant

StateEquation

StateVariableLHS Add

Add Mul

InputVariable Mul

Constant Div

StateVariableRHS Constant

Constant Mul

Div Constant

StateVariableRHS Constant

OutputVariable Div

StateVariableRHS Constant

Figure 5.32: The hand crafted "ideal" symbolic regression tree.


has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified, models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy-storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid-body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are signal-flow graphs and sets of differential equations, to give two examples, for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, non-linear and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out


small, simple and linear, but grow large, complex and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide, or discover, which aspects of the system are important to the exercise and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid-body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, and lower its efficiency, by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system


as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph, in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid-body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid-body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example, to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm and ant colony algorithms and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search-within-a-search algorithm. A simulated annealing and genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run according to some schedule.
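Such a schedule might look like the following sketch, in which the mutation probability decays linearly over the run (more exploration early, more exploitation late). The constants, the linear decay, and the function names are illustrative assumptions only.

```python
import random

def operator_probabilities(generation, total_generations,
                           p_mut_start=0.5, p_mut_end=0.05):
    """Anneal the mutation probability linearly over the run; whatever
    probability mass remains goes to crossover."""
    frac = generation / float(total_generations)
    p_mut = p_mut_start + (p_mut_end - p_mut_start) * frac
    return {"mutation": p_mut, "crossover": 1.0 - p_mut}

def choose_operator(generation, total_generations, rng=random):
    """Draw an operator name according to the current schedule."""
    probs = operator_probabilities(generation, total_generations)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names])[0]
```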

Genetic programming is easily parallelizable for distribution across a local or wide area network, in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has, in effect, been stated already in this section, but it is worth repeating, while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations), but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated, at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid-body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems, with any number of inputs and outputs, is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification, to allow this information to be specified in a way that is natural to the task, and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and


symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program, modelling a particular dynamic system, is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time for the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much

161

larger (and better randomized) sample size than could reasonably be createdby hand

Parametric identification with bond graphs, by symbolic regression of constant-valued expressions for element parameters, is expected to be faster than black box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it bond graph, symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression.

2. Structural and parametric identification of a linear dynamic system using bond graphs.

3. Parametric identification of a non-linear dynamic system using bond graphs.

4. Structural and parametric identification of a non-linear dynamic system using bond graphs.

At each stage the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressive stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).
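As a concrete illustration of the latter case, a progressive-stiffening spring curve of the kind the algorithm would need to discover might take a cubic form. This is only a hypothetical sketch: the function names and coefficient values below are assumptions, not part of the thesis software.

```python
# Hypothetical progressive-stiffening spring law for a non-linear C element:
# effort (force) as a cubic function of displacement q. The coefficients
# k1 and k3 are illustrative values, not identified parameters.
def spring_effort(q, k1=100.0, k3=5000.0):
    """Effort-displacement relation; the spring stiffens as |q| grows."""
    return k1 * q + k3 * q ** 3

def tangent_stiffness(q, k1=100.0, k3=5000.0):
    """Local (tangent) stiffness d(effort)/dq, which grows with displacement."""
    return k1 + 3.0 * k3 * q ** 2
```

In a grey-box run of this kind, only the body of `spring_effort` would be subject to symbolic regression; the surrounding bond graph structure would remain fixed.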

If it were to find any further use as a modelling tool in its own right, outside of the system identification tasks that were the goal of this project, then the bond graph library would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20sim, or domain-specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
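A minimal sketch of how such an export might look, using only the standard library: the graph is emitted in the GraphML namespace, with bond graph elements as nodes and bonds as directed edges. The function name and the flat node/edge representation are assumptions for illustration, not the library's actual API, and element parameters and causality are omitted.

```python
# Sketch: serialize a bond graph's topology as GraphML using the
# standard-library ElementTree. Nodes are element names; bonds are edges.
import xml.etree.ElementTree as ET

def bond_graph_to_graphml(elements, bonds):
    """elements: list of node names; bonds: list of (tail, head) pairs."""
    NS = "http://graphml.graphdrawing.org/xmlns"
    graphml = ET.Element("{%s}graphml" % NS)
    graph = ET.SubElement(graphml, "{%s}graph" % NS,
                          id="bg", edgedefault="directed")
    for name in elements:
        ET.SubElement(graph, "{%s}node" % NS, id=name)
    for tail, head in bonds:
        ET.SubElement(graph, "{%s}edge" % NS, source=tail, target=head)
    return ET.tostring(graphml, encoding="unicode")

# A source driving a 1-junction with an inertia and a resistor attached:
markup = bond_graph_to_graphml(["Se", "1", "I", "R"],
                               [("Se", "1"), ("1", "I"), ("1", "R")])
```

A real exporter would additionally carry element parameters and bond causality as GraphML `<data>` attributes keyed by `<key>` declarations.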

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression: for example, given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island-model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation solvers (DAE solvers) are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
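The finite least-recently-used table described above can be sketched in a few lines with the standard library. This is an illustrative sketch, not code from the thesis software; `raw_fitness` stands in for the expensive simulate-and-compare step, and using `str(program)` as the key assumes program string forms are unique.

```python
# Sketch: memoized fitness evaluation with a bounded LRU cache.
from collections import OrderedDict

def make_memoized_fitness(raw_fitness, max_entries=1000):
    cache = OrderedDict()  # program key -> cached fitness, oldest first

    def fitness(program):
        key = str(program)  # some short unique key derived from the program
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            return cache[key]
        value = raw_fitness(program)  # expensive: simulate and compare
        cache[key] = value
        if len(cache) > max_entries:
            cache.popitem(last=False)  # evict the least-recently-used entry
        return value

    return fitness

# Demonstration with a cheap stand-in for the simulation-based fitness test:
calls = []
def expensive(prog):
    calls.append(prog)
    return len(prog)

fit = make_memoized_fitness(expensive, max_entries=2)
fit("a+b")
fit("a+b")  # cache hit: expensive() is not called again
```

For an unbounded cache, `functools.lru_cache` provides the same behaviour directly when programs are hashable.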

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as those nodes do to the nodes below them. The standard Python interpreter (called CPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated but currently unused memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
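The recursive-to-iterative rewrite can be illustrated on one such tree method. The `Node` class and the variable-listing example below are a minimal stand-in for the software's expression-tree classes, not the actual implementation; the iterative version replaces interpreter call-stack frames with an explicit Python list, so arbitrarily deep trees cannot overflow the C stack.

```python
# Sketch: the same tree method written recursively and with an explicit stack.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def variables_recursive(node):
    """Collect leaf names; each node invokes the method on its children."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= variables_recursive(child)
    return names

def variables_iterative(node):
    """Same result, using an explicit stack instead of the call stack."""
    names, stack = set(), [node]
    while stack:
        n = stack.pop()
        if not n.children:
            names.add(n.name)
        else:
            stack.extend(n.children)
    return names

tree = Node("+", [Node("x"), Node("*", [Node("y"), Node("x")])])
```

On a tree deeper than CPython's default recursion limit (about 1000 frames), `variables_recursive` raises `RecursionError` while `variables_iterative` still succeeds.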


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org.

[2] Dynasim AB. Dymola: dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs—I. Algebraic theory. Journal of the Franklin Institute, (326):329–350, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs—II. Duality. Journal of the Franklin Institute, (326):691–708, 1989.

[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs—III. Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs—IV. Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. In CINC'94 – The First International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin Heidelberg, 2003.

[20] B. Danielson, J. Foster, and D. Frincke. GABSys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz – graph visualization software. http://www.graphviz.org.

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE – a darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4 – the "genetic algorithm optimized for portability and parallelism system". http://garage.cse.msu.edu/software/galopps, August 25, 2002.

[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983. MIT Press.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2, pages 359–366, 1989.

[36] John Hunter. Pylab (matplotlib): a python 2d plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System dynamics: a unified approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: on the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kauffman Publishers, Inc., 340 Pine Street, Sixth Floor, 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless python: a python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.


Acknowledgements

I'd like to thank my supervisor, Dr. Jan Huissoon, for his support.


Contents

1 Introduction 1
 1.1 System identification 1
 1.2 Black box and grey box problems 3
 1.3 Static and dynamic models 6
 1.4 Parametric and non-parametric models 8
 1.5 Linear and nonlinear models 9
 1.6 Optimization, search, machine learning 11
 1.7 Some model types and identification techniques 12
  1.7.1 Locally linear models 13
  1.7.2 Nonparametric modelling 14
  1.7.3 Volterra function series models 15
  1.7.4 Two special models in terms of Volterra series 16
  1.7.5 Least squares identification of Volterra series models 18
  1.7.6 Nonparametric modelling as function approximation 21
  1.7.7 Neural networks 22
  1.7.8 Parametric modelling 23
  1.7.9 Differential and difference equations 23
  1.7.10 Information criteria – heuristics for choosing model order 25
  1.7.11 Fuzzy relational models 26
  1.7.12 Graph-based models 28
 1.8 Contributions of this thesis 30

2 Bond graphs 31
 2.1 Energy based lumped parameter models 31
 2.2 Standard bond graph elements 32
  2.2.1 Bonds 32
  2.2.2 Storage elements 33
  2.2.3 Source elements 34
  2.2.4 Sink elements 35
  2.2.5 Junctions 35
 2.3 Augmented bond graphs 39
  2.3.1 Integral and derivative causality 41
  2.3.2 Sequential causality assignment procedure 44
 2.4 Additional bond graph elements 46
  2.4.1 Activated bonds and signal blocks 46
  2.4.2 Modulated elements 46
  2.4.3 Complex elements 47
  2.4.4 Compound elements 48
 2.5 Simulation 51
  2.5.1 State space models 51
  2.5.2 Mixed causality models 52

3 Genetic programming 54
 3.1 Models, programs, and machine learning 54
 3.2 History of genetic programming 55
 3.3 Genetic operators 57
  3.3.1 The crossover operator 58
  3.3.2 The mutation operator 58
  3.3.3 The replication operator 58
  3.3.4 Fitness proportional selection 59
 3.4 Generational genetic algorithms 59
 3.5 Building blocks and schemata 61
 3.6 Tree structured programs 63
  3.6.1 The crossover operator for tree structures 67
  3.6.2 The mutation operator for tree structures 67
  3.6.3 Other operators on tree structures 68
 3.7 Symbolic regression 69
 3.8 Closure or strong typing 70
 3.9 Genetic programming as indirect search 72
 3.10 Search tuning 73
  3.10.1 Controlling program size 73
  3.10.2 Maintaining population diversity 74

4 Implementation 76
 4.1 Program structure 76
  4.1.1 Formats and representations 77
 4.2 External tools 81
  4.2.1 The Python programming language 81
  4.2.2 The SciPy numerical libraries 82
  4.2.3 Graph layout and visualization with Graphviz 82
 4.3 A bond graph modelling library 83
  4.3.1 Basic data structures 83
  4.3.2 Adding elements and bonds, traversing a graph 84
  4.3.3 Assigning causality 84
  4.3.4 Extracting equations 86
  4.3.5 Model reductions 86
 4.4 An object-oriented symbolic algebra library 100
  4.4.1 Basic data structures 100
  4.4.2 Basic reductions and manipulations 102
  4.4.3 Algebraic loop detection and resolution 104
  4.4.4 Reducing equations to state-space form 107
  4.4.5 Simulation 110
 4.5 A typed genetic programming system 110
  4.5.1 Strongly typed mutation 112
  4.5.2 Strongly typed crossover 113
  4.5.3 A grammar for symbolic regression on dynamic systems 113
  4.5.4 A grammar for genetic programming bond graphs 118

5 Results and discussion 120
 5.1 Bond graph modelling and simulation 120
  5.1.1 Simple spring-mass-damper system 120
  5.1.2 Multiple spring-mass-damper system 125
  5.1.3 Elastic collision with a horizontal surface 128
  5.1.4 Suspended planar pendulum 130
  5.1.5 Triple planar pendulum 133
  5.1.6 Linked elastic collisions 139
 5.2 Symbolic regression of a static function 141
  5.2.1 Population dynamics 141
  5.2.2 Algebraic reductions 143
 5.3 Exploring genetic neighbourhoods by perturbation of an ideal 151
 5.4 Discussion 153
  5.4.1 On bond graphs 153
  5.4.2 On genetic programming 157

6 Summary and conclusions 160
 6.1 Extensions and future work 161
  6.1.1 Additional experiments 161
  6.1.2 Software improvements 163

List of Figures

11 Using output prediction error to evaluate a model 412 The black box system identification problem 513 The grey box system identification problem 514 Block diagram of a Volterra function series model 1715 The Hammerstein filter chain model structure 1716 Block diagram of the linear squarer cuber model 1817 Block diagram for a nonlinear moving average operation 21

21 A bond denotes power continuity 3222 A capacitive storage element the 1-port capacitor C 3423 An inductive storage element the 1-port inertia I 3424 An ideal source element the 1-port effort source Se 3525 An ideal source element the 1-port flow source Sf 3526 A dissipative element the 1-port resistor R 3627 A power-conserving 2-port element the transformer TF 3728 A power-conserving 2-port element the gyrator GY 3729 A power-conserving multi-port element the 0-junction 38210 A power-conserving multi-port element the 1-junction 38211 A fully augmented bond includes a causal stroke 40212 Derivative causality electrical capacitors in parallel 44213 Derivative causality rigidly joined masses 45214 A power-conserving 2-port the modulated transformer MTF 47215 A power-conserving 2-port the modulated gyrator MGY 47216 A non-linear storage element the contact compliance CC 48217 Mechanical contact compliance an unfixed spring 48218 Effort-displacement plot for the CC element 49219 A suspended pendulum 50220 Sub-models of the simple pendulum 50

ix

221 Model of a simple pendulum using compound elements 51

31 A computer program is like a system model 5432 The crossover operation for bit string genotypes 5833 The mutation operation for bit string genotypes 5834 The replication operation for bit string genotypes 5935 Simplified GA flowchart 6136 Genetic programming produces a program 6637 The crossover operation for tree structured genotypes 6838 Genetic programming as indirect search 73

41 Model reductions step 1 remove trivial junctions 8842 Model reductions step 2 merge adjacent like junctions 8943 Model reductions step 3 merge resistors 9044 Model reductions step 4 merge capacitors 9145 Model reductions step 5 merge inertias 9146 A randomly generated model 9247 A reduced model step 1 9348 A reduced model step 2 9449 A reduced model step 3 95410 Algebraic dependencies from the original model 96411 Algebraic dependencies from the reduced model 97412 Algebraic object inheritance diagram 101413 Algebraic dependencies with loops resolved 108

5.1 Schematic diagram of a simple spring-mass-damper system
5.2 Bond graph model of the system from figure 5.1
5.3 Step response of a simple spring-mass-damper model
5.4 Noise response of a simple spring-mass-damper model
5.5 Schematic diagram of a multiple spring-mass-damper system
5.6 Bond graph model of the system from figure 5.5
5.7 Step response of a multiple spring-mass-damper model
5.8 Noise response of a multiple spring-mass-damper model
5.9 A bouncing ball
5.10 Bond graph model of a bouncing ball with switched C-element
5.11 Response of a bouncing ball model
5.12 Bond graph model of the system from figure 2.19
5.13 Response of a simple pendulum model


5.14 Vertical response of a triple pendulum model
5.15 Horizontal response of a triple pendulum model
5.16 Angular response of a triple pendulum model
5.17 A triple planar pendulum
5.18 Schematic diagram of a rigid bar with bumpers
5.19 Bond graph model of the system from figure 5.18
5.20 Response of a bar-with-bumpers model
5.21 Fitness in run A
5.22 Diversity in run A
5.23 Program size in run A
5.24 Program age in run A
5.25 Fitness of run B
5.26 Diversity of run B
5.27 Program size in run B
5.28 Program age in run B
5.29 A best approximation of x^2 + 20 after 41 generations
5.30 The reduced expression
5.31 Mean fitness versus tree-distance from the ideal
5.32 Symbolic regression "ideal"


List of Tables

1.1 Modelling tasks in order of increasing difficulty

2.1 Units of effort and flow in several physical domains
2.2 Causality options by element type: 1- and 2-port elements
2.3 Causality options by element type: junction elements

4.1 Grammar for symbolic regression on dynamic systems
4.2 Grammar for genetic programming bond graphs

5.1 Neighbourhood of a symbolic regression "ideal"


Listings

3.1 Protecting the fitness test code against division by zero
4.1 A simple bond graph model in bg format
4.2 A simple bond graph model constructed in Python code
4.3 Two simple functions in Python
4.4 Graphviz DOT markup for the model defined in listing 4.4
4.5 Equations from the model in figure 4.6
4.6 Equations from the reduced model in figure 4.9
4.7 Resolving a loop in the reduced model
4.8 Equations from the reduced model with loops resolved


Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models, and because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system: a control system, for example. In an iterative design process it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics) then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately it is perhaps best to think of an identified model as a second system that, for some purposes, can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that, as closely as possible, reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two-part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

- Inspection or disassembly of a non-operational system

- Theoretical principles applicable to that class of systems

- Knowledge from the design and construction of the system

- Results of another system identification exercise

- Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero-mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant and in no way on any past values of the input. Put another way, the input–output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic overall. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input–output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables the model equations would be:

\dot{u}_1 = f_1(x_1, \ldots, x_n, u_1, \ldots, u_m)

\vdots

\dot{u}_m = f_m(x_1, \ldots, x_n, u_1, \ldots, u_m)

y_1 = g_1(x_1, \ldots, x_n, u_1, \ldots, u_m)

\vdots

y_p = g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)

where x_1, \ldots, x_n are the input variables, u_1, \ldots, u_m are the state variables, and y_1, \ldots, y_p are the output variables.
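A minimal sketch of simulating such a state-space model with forward-Euler integration, following the text's convention of inputs x, states u, and outputs y (the function names and the example first-order system are illustrative assumptions, not from the thesis):

```python
def simulate_state_space(f, g, u0, x_of_t, dt, steps):
    """Forward-Euler simulation of  du/dt = f(x, u),  y = g(x, u).

    x are the free input variables (determined outside the model),
    u the states, y the outputs; f returns du/dt, g returns y.
    """
    u = list(u0)
    ys = []
    for k in range(steps):
        x = x_of_t(k * dt)                              # inputs set outside the model
        dudt = f(x, u)
        u = [ui + dt * di for ui, di in zip(u, dudt)]   # integrate the states
        ys.append(g(x, u))                              # outputs are algebraic in x and u
    return ys

# Example: first-order lag  du/dt = (x - u)/tau,  y = u, driven by a unit step
tau = 0.5
ys = simulate_state_space(
    f=lambda x, u: [(x[0] - u[0]) / tau],
    g=lambda x, u: [u[0]],
    u0=[0.0],
    x_of_t=lambda t: [1.0],
    dt=0.01,
    steps=500,
)
print(round(ys[-1][0], 3))  # settles near 1.0
```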

The main difficulties in identifying dynamic models (over static ones) are, first, that the order of arrival of data within the input and output signals matters, so typically much more data must be collected, and, second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a u_i for which there is no y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty for experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters, all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically and the corresponding parameters at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function), and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} X(s)   (1.1)

Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} X(z)   (1.2)

\frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x   (1.3)

y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m}   (1.4)
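Rearranged for y_k, the difference-equation structure of equation 1.4 can be stepped forward directly in time; a minimal sketch with hypothetical coefficients (not an example from the thesis):

```python
def arx_step(y_hist, x_hist, a, b):
    """One step of  y_k = -a1*y_{k-1} - ... - an*y_{k-n}
                          + b1*x_{k-1} + ... + bm*x_{k-m},
    i.e. equation 1.4 solved for y_k.  Histories are newest first."""
    yk = 0.0
    for i, ai in enumerate(a):
        yk -= ai * y_hist[i]
    for j, bj in enumerate(b):
        yk += bj * x_hist[j]
    return yk

# Example: y_k = 0.5*y_{k-1} + 1.0*x_{k-1}  (a1 = -0.5, b1 = 1.0)
a, b = [-0.5], [1.0]
y, x = [0.0], [1.0]          # histories, newest first; constant unit input
for _ in range(50):
    y.insert(0, arx_step(y, x, a, b))
print(round(y[0], 3))        # steady state b1/(1 + a1) = 2.0
```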

Because they typically contain fewer identified parameters, and the range of possibilities is already reduced by the choice of a specific model structure, parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

y(t) = \int_0^t u(t - \tau) x(\tau) \, d\tau   (1.5)

y(k) = \sum_{m=0}^{k} u(k - m) x(m)   (1.6)

Y(j\omega) = U(j\omega) X(j\omega)   (1.7)
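The discrete impulse response model of equation 1.6 can be evaluated directly as a convolution sum; a minimal sketch with a hypothetical decaying kernel (illustrative only):

```python
def impulse_response_output(u, x, k):
    """y(k) = sum_{m=0}^{k} u(k-m) * x(m): the discrete impulse response
    model of equation 1.6, with finite-memory kernel u and input sequence x."""
    return sum(u[k - m] * x[m] for m in range(k + 1) if k - m < len(u))

# Nonparametric model: one weight per lag (kernel of a decaying system)
u = [0.5 ** n for n in range(10)]
x = [1.0] * 20                    # step input
y = [impulse_response_output(u, x, k) for k in range(20)]
print(round(y[-1], 3))            # approaches sum(u) = 2*(1 - 0.5**10)
```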

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified \beta_1, \ldots, \beta_n, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters \beta_i, while equation 1.9 is not. Both contain non-linear functions of the inputs (namely u_2^2 and \sin(u_3)), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3)   (1.8)

y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3)   (1.9)

For models that are linear-in-the-parameters the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
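The one-step solution can be sketched in pure Python by forming the normal equations (A^T A)\beta = A^T y for the regressors of equation 1.8 and solving them by Gaussian elimination; the function name and synthetic data are illustrative assumptions, not the thesis's code:

```python
import math
import random

def lstsq_3(rows, ys):
    """Solve the 3-parameter normal equations (A^T A) beta = A^T y:
    the direct, one-step linear least squares estimate for a model
    that is linear in beta_1..beta_3."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * yk for r, yk in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting, then back-substitution
    M = [ata[i] + [aty[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    beta = [0.0] * 3
    for r in (2, 1, 0):
        beta[r] = (M[r][3] - sum(M[r][j] * beta[j] for j in range(r + 1, 3))) / M[r][r]
    return beta

# Recover the parameters of equation 1.8, y = b1*u1 + b2*u2^2 + b3*sin(u3)
random.seed(0)
data = [(random.uniform(-2, 2), random.uniform(-2, 2), random.uniform(-2, 2))
        for _ in range(100)]
rows = [(u1, u2 ** 2, math.sin(u3)) for u1, u2, u3 in data]   # regressors
ys = [2.0 * r[0] + 0.5 * r[1] - 1.0 * r[2] for r in rows]     # true betas: 2, 0.5, -1
beta = lstsq_3(rows, ys)
print([round(b, 3) for b in beta])
```

With noiseless data the true parameters are recovered to machine precision; with added noise the same calculation gives the minimum squared-error estimate.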

There is a fundamental trade-off between model size or order and the model's expressive power or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to over-fitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best model using the linear least squares technique discussed above, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately, this approach, or any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs and subtracting the normal operating outputs from the measured output to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization, and in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally but not globally optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it is a factor that trades off exploration of new territory with exploitation of gains made in a local neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results on unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in every machine learning textbook. They both incorporate a measure of parsimony for the identified model and weigh that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
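For a least-squares fit with n samples, residual sum of squares RSS, and k identified parameters, common forms of these criteria are AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n); a generic sketch of the trade-off they encode (illustrative, not code from the thesis):

```python
import math

def aic(n, rss, k):
    """Akaike information criterion, least-squares form (lower is better):
    error term plus a penalty of 2 per identified parameter."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """Bayesian information criterion; penalizes parameters more heavily
    than AIC once ln(n) > 2."""
    return n * math.log(rss / n) + k * math.log(n)

# A larger model must reduce the error enough to pay for its extra parameters
n = 100
print(aic(n, rss=50.0, k=3) < aic(n, rss=49.5, k=6))  # small gain, big penalty
```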

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1) the problem is to fit a line, plane, or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input–output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

          linear   non-linear
static    (1)      (3)
dynamic   (2)      (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem; (4) is the most general class of system identification problems.

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen attempting to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category several linear model approximations are identified, each at a different operating point of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying (LPV) model. In the second category a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or b-spline).

In step 2 it is important to use low amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three steps above.
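A sketch of the interpolation step (step 3, linear scheme), simplified to scalar operating points and static local gains; the function name and the reduction of the local models to single gains are illustrative assumptions, not the method of any cited work:

```python
def lpv_output(models, op, x):
    """Evaluate an LPV-style model by linearly interpolating between local
    linear gains identified at fixed base operating points.

    models: sorted list of (operating_point, gain) pairs from step 2;
    op: current operating point; x: input."""
    pts = [p for p, _ in models]
    if op <= pts[0]:
        return models[0][1] * x          # clamp below the first base point
    if op >= pts[-1]:
        return models[-1][1] * x         # clamp above the last base point
    for (p0, g0), (p1, g1) in zip(models, models[1:]):
        if p0 <= op <= p1:
            w = (op - p0) / (p1 - p0)    # linear interpolation weight
            return ((1 - w) * g0 + w * g1) * x

# Local gains identified at three base operating points
models = [(0.0, 1.0), (1.0, 2.0), (2.0, 5.0)]
print(lpv_output(models, 0.5, x=1.0))    # 1.5, halfway between the first two gains
```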

Both of these techniques are able to identify the parameters but not the structure of a model. For EKF the model structure must be given in a state space form. For LPV the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online for) mildly nonlinear systems that simply drift over time.

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

y(t) = y_1(t) = \int_{-T_S}^{0} g(t - \tau) \cdot u(\tau) \, d\tau   (1.10)

Its discrete-time version is

y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \cdot u_{k-j}   (1.11)

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past; that is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions. The first of these functions is the linear model y_1(t) above.

y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots   (1.12)

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is

y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t - \tau_1, t - \tau_2) \cdot u(\tau_1) \cdot u(\tau_2) \, d\tau_1 \, d\tau_2

y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t - \tau_1, t - \tau_2, t - \tau_3) \cdot u(\tau_1) \cdot u(\tau_2) \cdot u(\tau_3) \, d\tau_1 \, d\tau_2 \, d\tau_3

y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t - \tau_1, t - \tau_2, \ldots, t - \tau_p) \cdot u(\tau_1) \cdot u(\tau_2) \cdots u(\tau_p) \, d\tau_1 \, d\tau_2 \cdots d\tau_p


where g_p is called the p'th order continuous-time kernel. The rest of this discussion will use the discrete-time representation:

y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j}

y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \cdot u_{n-i} \cdot u_{n-j} \cdot u_{n-k}

y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \ldots i_p} \cdot u_{n-i_1} \cdot u_{n-i_2} \cdots u_{n-i_p}

W^{(p)} is called the p'th order discrete-time kernel. W^{(2)} is an M_2 by M_2 square matrix, W^{(3)} is an M_3 by M_3 by M_3 cubic matrix, and so on. It is not required that the memory lengths of each kernel (M_2, M_3, etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is M, then the p'th order kernel W^{(p)} is a p-dimensional matrix with M^p elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is M_{(1)} + M^2_{(2)} + M^3_{(3)} + \cdots + M^p_{(p)} + \cdots, or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis, no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in figure 1.4.
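The truncated series is straightforward to evaluate directly from its kernels. The sketch below (the function name and dimensions are illustrative, not from the thesis) computes y(t_n) = y_1(t_n) + y_2(t_n) for a model truncated after the bilinear term:

```python
import numpy as np

def volterra_truncated(u, W1, W2):
    """Evaluate a Volterra series truncated after the bilinear term.

    u  : input sequence
    W1 : linear kernel, shape (M,)     (illustrative dimensions)
    W2 : bilinear kernel, shape (M, M)
    """
    M = len(W1)
    y = np.zeros(len(u))
    for n in range(len(u)):
        # delayed inputs u[n], u[n-1], ..., u[n-M+1]; zero before the record starts
        d = np.array([u[n - i] if n - i >= 0 else 0.0 for i in range(M)])
        y[n] = W1 @ d + d @ W2 @ d     # y1 (linear) + y2 (bilinear) terms
    return y
```

With W2 set to zero this reduces to the linear FIR model of equation 1.11; the number of kernel elements grows as M^p with the term order p, which is the parameter explosion discussed above.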

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series, and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q    (1.13)

y(t_k) = \sum_{j=0}^{N} W_j \cdot v_{k-j}    (1.14)

Combining these gives

y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u^i_{k-j}

= \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u^i_{k-j}

= a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u^2_{k-j} + \cdots + a_q \sum_{j=0}^{N} W_j u^q_{k-j}


Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher order kernels. Elements on the main diagonal of the higher order kernels can be interpreted as weights applied to powers of the input signal.
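A minimal sketch of the Hammerstein chain of equations 1.13 and 1.14 (polynomial coefficients and FIR weights here are illustrative, not from the thesis):

```python
import numpy as np

def hammerstein(u, a, W):
    """Simulate a Hammerstein chain: static polynomial, then FIR filter.

    a : polynomial coefficients (a[0] + a[1]*u + a[2]*u**2 + ...), eq. 1.13
    W : impulse response weights of the linear dynamic part, eq. 1.14
    """
    v = np.polyval(list(a)[::-1], u)       # static nonlinearity (polyval wants
                                           # highest-order coefficient first)
    y = np.zeros(len(u))
    for k in range(len(u)):
        for j in range(len(W)):
            if k - j >= 0:
                y[k] += W[j] * v[k - j]    # FIR convolution of the inner signal
    return y
```

Because the products only ever involve one delayed sample at a time, this chain can never reproduce an off-diagonal kernel term such as W_{12} u_{n-1} u_{n-2}, which is exactly the restriction noted above.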

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms, and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in figure 1.6. Obviously it is less general than the system in figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals u_1 = u, u_2 = u^2 and u_3 = u^3 can be calculated, which reduces the identification of the dynamic parts to a multi-input, multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear-squarer-cuber model
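The reduction to a MIMO linear problem can be sketched as follows: the branch signals u, u² and u³ are formed directly from the input, stacked into one regressor matrix, and all three FIR branches are identified in a single least-squares solve. All signal lengths, memory lengths, and the "true" filters below are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 400, 5                          # samples and FIR memory length (assumed)

# Invented "true" system: u, u^2 and u^3 branches through three FIR filters
g1, g2, g3 = rng.normal(size=M), rng.normal(size=M), rng.normal(size=M)
u = rng.normal(size=N)
u1, u2, u3 = u, u ** 2, u ** 3         # branch signals computed directly

def delay_matrix(x, M):
    """Regressor matrix whose row n holds x[n], x[n-1], ..., x[n-M+1]."""
    X = np.zeros((len(x), M))
    for i in range(M):
        X[i:, i] = x[:len(x) - i]
    return X

Phi = np.hstack([delay_matrix(u1, M), delay_matrix(u2, M), delay_matrix(u3, M)])
y = Phi @ np.concatenate([g1, g2, g3])

# Identification of all three dynamic parts is one linear least-squares problem
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

With noise-free data the estimate recovers the three filters exactly; with measurement noise the same solve gives the usual least-squares estimate.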

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used:

y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n)    (1.15)

y(t_n) = \sum_{i=0}^{N_1} W^{(1)}_i \cdot u_{n-i} + \sum_{i=0}^{N_2} \sum_{j=0}^{N_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j} + e(t_n)    (1.16)

= (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e    (1.17)

where the vector and matrix quantities u_1, u_2, W^{(1)} and W^{(2)} are

u_1 = [u_{n-1}, u_{n-2}, \ldots, u_{n-N_1}]^T    u_2 = [u_{n-1}, u_{n-2}, \ldots, u_{n-N_2}]^T

W^{(1)} = [W^{(1)}_1, W^{(1)}_2, \ldots, W^{(1)}_{N_1}]^T

W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & \ddots & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector U and the parameter vector β are constructed as follows. The first N_1 elements in U are the elements of u_1, and this is followed by elements having the value of the products of u_2 appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of W^{(1)} followed by the elements of W^{(2)} as they appear in the second term of equation 1.16.

y = Uβ + e    (1.18)

Given a sequence of N measurements, U becomes a rectangular matrix, and y and e become N by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

β = (U^T U)^{-1} U^T y    (1.19)

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large: O(N^P) for a model using the first P Volterra series terms and a memory length of N.


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form W_{ij}(u_{n-i})(u_{n-j}) and W_{ji}(u_{n-j})(u_{n-i}). However, since (u_{n-i})(u_{n-j}) = (u_{n-j})(u_{n-i}), the parameters W_{ij} and W_{ji} are indistinguishable. Only N(N+1)/2 unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, O(N^P).
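A sketch of the resulting least-squares problem, following the indexing of equation 1.16 and keeping only the products with i <= j in the bilinear part (all sizes below are illustrative):

```python
import numpy as np

def volterra_regressor(u, N1, N2):
    """One row of eq. 1.18 per sample: delayed inputs, then unique products.

    Exploiting W_ij = W_ji, only products with i <= j are included, so the
    bilinear part contributes N2*(N2+1)/2 columns instead of N2**2.
    """
    rows = []
    for n in range(len(u)):
        d1 = [u[n - i] if n - i >= 0 else 0.0 for i in range(N1)]
        d2 = [u[n - i] if n - i >= 0 else 0.0 for i in range(N2)]
        cross = [d2[i] * d2[j] for i in range(N2) for j in range(i, N2)]
        rows.append(d1 + cross)
    return np.array(rows)

rng = np.random.default_rng(1)
u = rng.normal(size=200)
U = volterra_regressor(u, N1=4, N2=3)        # 4 + 3*4/2 = 10 columns
beta_true = rng.normal(size=U.shape[1])      # invented "true" parameters
y = U @ beta_true
beta, *_ = np.linalg.lstsq(U, y, rcond=None) # eq. 1.19 via a stable solver
```

`lstsq` computes the same pseudo-inverse solution as equation 1.19 without forming (U^T U)^{-1} explicitly.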

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters of the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters of this formula are ζ, a form factor affecting the shape of the sinus curve; j, which iterates over component sinus curves; and i, the read-out index. There are m_r sinus curves in the weighted sum. The values of ζ and m_r are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem, only the weights w_j applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to m_r. For example, if N is 40 and m_r is 10, then there is a 4 times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with N_1 = N_2 = 32 gives the uncompressed model a total of N_1 + N_2(N_2+1)/2 = 32 + 528 = 560 parameters. In the wavelet domain identification, only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560:46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
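Nikolaou and Mantha's method is not reproduced here, but the idea of wavelet kernel compression can be sketched with a hand-rolled orthonormal Haar transform: transform the kernel, zero the small coefficients, and invert. The example kernel and the number of retained coefficients below are invented:

```python
import numpy as np

def haar_fwd(x):
    """Full orthonormal Haar DWT of a length 2**k vector."""
    x = np.asarray(x, float).copy()
    details = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
        details.append(d)
        x = a
    # layout: [final approx, coarsest detail, ..., finest detail]
    return np.concatenate([x] + details[::-1])

def haar_inv(c):
    """Inverse of haar_fwd."""
    c = np.asarray(c, float)
    x, pos = c[:1], 1
    while pos < len(c):
        d = c[pos:pos + len(x)]
        nxt = np.empty(2 * len(x))
        nxt[0::2] = (x + d) / np.sqrt(2)
        nxt[1::2] = (x - d) / np.sqrt(2)
        x, pos = nxt, pos + len(d)
    return x

# Compress a smooth 32-point kernel: keep only the 8 largest coefficients
g = np.exp(-np.arange(32) / 6.0)               # invented smooth linear kernel
c = haar_fwd(g)
c[np.argsort(np.abs(c))[:-8]] = 0.0            # zero the 24 smallest
g_hat = haar_inv(c)                            # 4:1 compressed reconstruction
```

Because a smooth kernel concentrates its energy in a few coarse-scale coefficients, the 4:1 compressed reconstruction stays close to the original, which is the effect exploited in the identification above.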


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input/output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where P(·) is a polynomial in u and M is the model memory length (number of Z^{-1} delay operators). Therefore one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P (uZ^{-1}, uZ^{-2}, ..., uZ^{-M}) can be calculated. What is needed is a general and parsimonious function approximator. The Volterra function series is general, but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details, and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas, coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth, low order representations. However, since common choices such as distorted sinus functions are combined by weighted sums, and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator. Since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo, there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd order nonlinear moving average with a neural network taking the place of P.
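A toy version of that arrangement, with an invented target system and a small hand-trained network standing in for P (nothing here is from the Waterloo project):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: predict y[k] from the 3 most recent inputs (3rd order moving average)
u = rng.uniform(-1.0, 1.0, size=500)
X = np.stack([u[2:-1], u[1:-2], u[0:-3]], axis=1)   # three delayed inputs
y = np.tanh(X[:, 0] - 0.5 * X[:, 1] * X[:, 2])      # assumed toy system

# One-hidden-layer network standing in for the operator P of figure 1.7
H = 16
W1 = rng.normal(scale=0.5, size=(3, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=H);      b2 = 0.0

def forward(X):
    hid = np.tanh(X @ W1 + b1)
    return hid @ W2 + b2, hid

mse0 = np.mean((forward(X)[0] - y) ** 2)            # error before training

lr = 0.05
for _ in range(3000):                               # batch gradient descent (BPL)
    pred, hid = forward(X)
    err = pred - y
    gW2 = hid.T @ err / len(y)
    gb2 = err.mean()
    ghid = np.outer(err, W2) * (1.0 - hid ** 2)     # back-propagated error
    gW1 = X.T @ ghid / len(y)
    gb1 = ghid.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

mse = np.mean((forward(X)[0] - y) ** 2)             # error after training
```

The training loop is plain back-propagation learning written out by hand; the mean squared error after training is well below its initial value on this toy problem.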

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions, and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals, and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or sometimes symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general, and are the standard representation for models not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness guided search algorithms such as genetic programming, "evolution strategies", or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23], [24], [78], [79]) is an equation discovery program that searches for equations which conform to a context free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations: grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation, and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.

PRET ([74], [75], [14]) is another equation discovery program that incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.
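The flavour of such a search can be sketched in a few lines: expression trees built from a fixed function and terminal set, a subtree-replacement mutation, and fitness guided hill climbing (a deliberately stripped-down stand-in for genetic programming; the target function is invented):

```python
import random
import operator

random.seed(0)

# Externally provided constraints: the function and terminal sets
FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMS = ['x', 1.0]

def rand_tree(depth=3):
    """Generate a random expression tree as nested tuples."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    name = random.choice(list(FUNCS))
    return (name, rand_tree(depth - 1), rand_tree(depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        name, a, b = tree
        return FUNCS[name](evaluate(a, x), evaluate(b, x))
    return x if tree == 'x' else tree

def mutate(tree, p=0.2):
    """Replace random subtrees: the only search operator in this sketch."""
    if random.random() < p or not isinstance(tree, tuple):
        return rand_tree(2)
    name, a, b = tree
    return (name, mutate(a, p), mutate(b, p))

xs = [i / 5.0 for i in range(-10, 11)]
target = [x * x + x for x in xs]           # "unknown" system: y = x^2 + x

def fitness(tree):                         # sum of squared errors, lower is better
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, target))

best = rand_tree()
f_init = f_best = fitness(best)
for _ in range(2000):                      # mutation hill climbing
    cand = mutate(best)
    f_cand = fitness(cand)
    if f_cand <= f_best:
        best, f_best = cand, f_cand
```

A real genetic programming system adds a population, crossover, and (in the strongly typed case) type constraints on which subtrees may be exchanged, but the fitness guided structure of the search is the same.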

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied, and no references beyond the original author were found.

1.7.10 Information criteria – heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is sufficiently high order to represent the system under test, but not so flexible as to be likely to over fit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike information criterion (AIC) is

AIC = -2 ln(L) + 2k    (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u | M, U)    (1.21)

The Bayesian information criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be over fit and have spurious parameters. The Bayes information criterion is

BIC = -2 ln(L) + k ln(n)    (1.22)

where n is the number of observations (i.e., sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC ≅ -2n ln(RMSE/n) + 2k    (1.23)

Cinar [17] reports building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.
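A minimal sketch of model selection with equations 1.20 and 1.22, using invented log-likelihood values for a 3-parameter and a 10-parameter candidate; with only 50 observations, the BIC penalty k·ln(n) can reverse the AIC choice:

```python
import math

def aic(log_l, k):
    """Akaike information criterion, eq. 1.20 (L given as a log-likelihood)."""
    return -2.0 * log_l + 2 * k

def bic(log_l, k, n):
    """Bayes information criterion, eq. 1.22; n is the number of observations."""
    return -2.0 * log_l + k * math.log(n)

# Invented candidates: the larger model fits better (higher log-likelihood)
models = [{'log_l': -120.0, 'k': 3}, {'log_l': -110.0, 'k': 10}]

best_aic = min(models, key=lambda m: aic(m['log_l'], m['k']))
best_bic = min(models, key=lambda m: bic(m['log_l'], m['k'], n=50))
```

Here AIC selects the 10-parameter model, while BIC, weighting the small sample size, selects the 3-parameter one; both criteria are minimized, never maximized.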

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions

2. a fuzzy relation mapping inputs to outputs

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow" whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are no longer appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative, English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied, and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
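The four evaluation steps and the max-min composition can be sketched as follows, with invented triangular membership functions and relation values for a speed-control example like the one above:

```python
import numpy as np

def tri(x, centre, width):
    """Triangular membership function."""
    return np.maximum(0.0, 1.0 - abs(x - centre) / width)

# Invented linguistic variables for a speed input and a power-change output
in_centres = [0.0, 5.0, 10.0]        # "slow", "medium", "fast"
out_values = [1.0, 0.0, -1.0]        # crisp centres: "raise", "hold", "lower"
R = np.array([[0.9, 0.2, 0.0],       # fuzzy relation: rows index input sets,
              [0.1, 0.9, 0.1],       # columns index output sets
              [0.0, 0.2, 0.9]])

def predict(x):
    mu_in = np.array([tri(x, c, 5.0) for c in in_centres])            # step 1
    # step 2: max-min composition (min in place of *, max in place of +)
    mu_out = np.array([np.max(np.minimum(mu_in, R[:, j])) for j in range(3)])
    # steps 3-4: activate the output sets and aggregate to a crisp value
    return float(np.sum(mu_out * out_values) / np.sum(mu_out))
```

With this relation, a slow speed yields a positive power change and a fast speed a negative one, matching the qualitative rule "when the speed is fast, decrease the power output".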

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, integrated across the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large and the system may oscillate and not converge; too small and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It's tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model, therefore, is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams, and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66], [39], [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae. Arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphical transcription of differential equations.

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10 degree of freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (the arrangement of the edges, or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible, since whereas the genetic algorithm uses a fixed length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane, and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19], [40], [67] on the subject. Linear graph theory is a similar graphical, energy based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8], [9], [10], [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".

Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics), or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                      Intensive variable   Extensive variable
  Generalized terms   Effort               Flow                      Power
  Linear mechanics    Force (N)            Velocity (m/s)            (W)
  Angular mechanics   Torque (N·m)         Angular velocity (rad/s)  (W)
  Electrics           Voltage (V)          Current (A)               (W)
  Hydraulics          Pressure (N/m²)      Volume flow (m³/s)        (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫₀ᵀ f dt    (2.1)

p = ∫₀ᵀ e dt    (2.2)
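These first integrals can be approximated numerically from sampled bond signals. The following sketch is illustrative only (the function and variable names are assumptions, not part of the thesis's library); it accumulates generalized displacement from a sampled flow by the trapezoidal rule.

```python
# Trapezoidal approximation of q = integral of f dt (equation 2.1);
# the same routine applied to sampled effort gives p (equation 2.2).
# Names and values here are illustrative assumptions.

def first_integral(samples, dt):
    """Trapezoidal first integral of a uniformly sampled signal."""
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a + b) * dt
    return total

# A constant flow of 2.0 held for 1 s yields a displacement of 2.0.
flow = [2.0] * 11            # 11 samples spanning 1 s at dt = 0.1
q = first_integral(flow, 0.1)
```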

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement, proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics, it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits, it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce    (2.3)

An inertia or I element stores kinetic energy in the form of generalized momentum, proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics, it is a mass-flow effect; in electrical circuits, it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If    (2.4)

Figure 2.2: A capacitive storage element: the 1-port capacitor C.

Figure 2.3: An inductive storage element: the 1-port inertia I.
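A minimal sketch of these linear constitutive relations in code (illustrative, not the thesis's modelling library), written in the causal forms e = q/C and f = p/I that integral causality prefers later in this chapter:

```python
# Linear one-port storage elements, from equations 2.3 (q = Ce) and
# 2.4 (p = If), rearranged to give effort and flow as outputs.
# Illustrative sketch; parameter values are assumptions.

def capacitor_effort(q, C):
    """Effort presented by a linear C element storing displacement q."""
    return q / C

def inertia_flow(p, I):
    """Flow presented by a linear I element storing momentum p."""
    return p / I

e = capacitor_effort(2.0, 0.5)   # a 0.5 F capacitor holding 2 C
f = inertia_flow(3.0, 1.5)       # a 1.5 kg mass with momentum 3 kg*m/s
```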

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite length simulations, and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t)    (2.5)

f = F(t)    (2.6)


Figure 2.4: An ideal source element: the 1-port effort source Se.

Figure 2.5: An ideal source element: the 1-port flow source Sf.

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant valued effort source representing a uniform force field such as gravity near the earth's surface, and a zero valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that together they expose a single port to the rest of the model, showing the desired behaviour in simulation.
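As a sketch of that last point (an illustration under assumed values, not the thesis's code): an effort source bonded through a 1-junction with a resistor exposes a single port whose effort droops as flow is drawn, a simple model of a non-ideal supply.

```python
# Non-ideal source sketch: an ideal Se in series with an R exposes a
# single port with e_port = E(t) - R*f.  Names and values are
# illustrative assumptions.

def loaded_source_effort(E, R, f, t):
    """Port effort of an Se-R combination delivering flow f at time t."""
    return E(t) - R * f

supply = lambda t: 12.0                               # constant 12 V schedule
e_open = loaded_source_effort(supply, 0.5, 0.0, 0.0)  # no load drawn
e_load = loaded_source_effort(supply, 0.5, 4.0, 0.0)  # 4 A drawn
```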

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf    (2.7)

2.2.5 Junctions

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

Figure 2.6: A dissipative element: the 1-port resistor R.

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1 f1 = (m e2)(f2/m) = e2 f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = m e2    (2.8)

f1 = f2/m    (2.9)

The graphical notation for a gyrator or GY-element is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1 f1 = (m f2)(e2/m) = f2 e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast rotating bodies in mechanics, and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = m f2    (2.10)

f1 = e2/m    (2.11)

Figure 2.7: A power-conserving 2-port element: the transformer TF.

Figure 2.8: A power-conserving 2-port element: the gyrator GY.

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction, and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = · · · = en    (2.12)

Σ f_in − Σ f_out = 0    (2.13)

f1 = f2 = · · · = fn    (2.14)

Σ e_in − Σ e_out = 0    (2.15)
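A sketch of these junction constraints in code (the representation is a hypothetical illustration, not the thesis's library): with all efforts equal at a 0-junction, the flow on one remaining power-out bond follows from the signed flow sum of equation 2.13.

```python
# 0-junction constraints (equations 2.12 and 2.13): every attached
# bond shares the common effort, and signed flows sum to zero, so one
# unknown power-out flow is determined by the rest.  Illustrative only.

def zero_junction(effort, flows_in, flows_out_known):
    """Return (efforts, missing power-out flow) for a 0-junction."""
    n_bonds = len(flows_in) + len(flows_out_known) + 1
    efforts = [effort] * n_bonds                       # equation 2.12
    f_missing = sum(flows_in) - sum(flows_out_known)   # equation 2.13
    return efforts, f_missing

efforts, f_out = zero_junction(5.0, [3.0, 1.0], [2.5])
```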

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

Figure 2.9: A power-conserving multi-port element: the 0-junction.

Figure 2.10: A power-conserving multi-port element: the 1-junction.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power direction half arrow. In both parts of figure 2.11, the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11, the direction of computational causality is reversed. So, although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

Figure 2.11: The causal stroke and power direction half-arrow on a fully augmented bond are independent.

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that is easily solved in a computer simulation. It is a practical matter of choosing a convention and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but not exactly bond graphs, do not employ any concept of causality [2] [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.¹ Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving them in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

¹Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options and the resulting equations for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements) and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inductors. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inductor), the variable in the differential expression can be evaluated by integration over time. For this reason, these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly (that is, without another type of storage element or a resistor between them) on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore, the charge on the second capacitor is known as well (q = Ce). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type: 1- and 2-port elements

  Element         Causality                 Equations
  Effort source   effort-out (required)     e = E(t)
  Flow source     flow-out (required)       f = F(t)
  Capacitor       effort-in (derivative)    q = Ce;  f = dq/dt
  Capacitor       effort-out (integral)     e = q/C;  dq/dt = f
  Inertia         effort-in (integral)      f = p/I;  dp/dt = e
  Inertia         effort-out (derivative)   p = If;  e = dp/dt
  Resistor        flow-out                  f = e/R
  Resistor        effort-out                e = Rf
  Transformer     e1, f2 out                e1 = m e2;  f2 = m f1
  Transformer     f1, e2 out                f1 = m f2;  e2 = m e1
  Gyrator         e1, e2 out                e1 = r f2;  e2 = r f1
  Gyrator         f1, f2 out                f1 = e2/r;  f2 = e1/r

Table 2.3: Causality options by element type: junction elements

  0-Junction (b* directed power-in):
      e* ≡ e_in1,  f* ≡ f_in1
      e_in2 = · · · = e_inn = e*
      e_out1 = · · · = e_outm = e*
      f* = Σ(1..m) f_out − Σ(2..n) f_in

  0-Junction (b* directed power-out):
      e* ≡ e_out1,  f* ≡ f_out1
      e_in1 = · · · = e_inn = e*
      e_out2 = · · · = e_outm = e*
      f* = Σ(1..n) f_in − Σ(2..m) f_out

  1-Junction (b* directed power-in):
      e* ≡ e_in1,  f* ≡ f_in1
      f_in2 = · · · = f_inn = f*
      f_out1 = · · · = f_outm = f*
      e* = Σ(1..m) e_out − Σ(2..n) e_in

  1-Junction (b* directed power-out):
      e* ≡ e_out1,  f* ≡ f_out1
      f_in1 = · · · = f_inn = f*
      f_out2 = · · · = f_outm = f*
      e* = Σ(1..n) e_in − Σ(2..m) e_out

• 0- and 1-junctions permit any number of bonds directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction). The rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction). The rest must be effort-in.


and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model, but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort.

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses and I elements with proportionally common flow.

There is a systematic method of assigning computational causality to bond graph models so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.
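The propagation just described can be sketched in a few lines of code. This is a deliberately simplified illustration of SCAP for a single 1-junction (the representation is an assumption, not the thesis's library): stage 1 fixes the source bond, stage 2 gives storage bonds their preferred causality, and propagation of the one-effort-out rule assigns the rest.

```python
# Simplified SCAP sketch for the Se-C-I-R example above.  Each element
# is attached to one 1-junction; causality is recorded from the
# junction's point of view ('effort-in' = effort is an input to the
# junction on that bond).  Hypothetical representation, not the
# thesis's library.

def scap_1junction(elements):
    causality = {}

    def propagate():
        # A 1-junction permits only one effort-out bond; once one
        # exists, every remaining bond must be effort-in.
        if 'effort-out' in causality.values():
            for el in elements:
                causality.setdefault(el, 'effort-in')

    # Stage 1: required causality at sources.  An effort source's
    # effort is an output of the source, hence effort-in to the junction.
    for el in elements:
        if el == 'Se':
            causality[el] = 'effort-in'
    propagate()

    # Stage 2: preferred (integral) causality at storage elements.
    for el in elements:
        if el == 'C' and el not in causality:
            causality[el] = 'effort-in'    # effort-out from the capacitor
            propagate()
        if el == 'I' and el not in causality:
            causality[el] = 'effort-out'   # effort-in to the inertia
            propagate()

    # Stage 3 (arbitrary causality at sinks) is never reached here:
    # the R bond is assigned by propagation, as in the text.
    return causality

result = scap_1junction(['Se', 'C', 'I', 'R'])
```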

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power, but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way, activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.


Figure 2.14: A power-conserving 2-port: the modulated transformer MTF.

Figure 2.15: A power-conserving 2-port: the modulated gyrator MGY.

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort–flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort–flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort–displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

Figure 2.16: A non-linear capacitive storage element: the 1-port contact compliance CC.

Figure 2.17: Mechanical contact compliance: an unfixed spring exerts no force when q exceeds the unsprung length r.

e = (q < r)(r − q)/C    (2.16)

dq/dt = f    (2.17)
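A direct transcription of equation 2.16 in code (an illustrative sketch with assumed parameter values, not the thesis's implementation):

```python
# CC element effort law, equation 2.16: e = (q < r)(r - q)/C.
# The spring pushes only while compressed against the surface (q < r).
# Parameter values below are arbitrary assumptions.

def cc_effort(q, r, C):
    """Effort of the 1-port CC element under flow-in causality."""
    return (r - q) / C if q < r else 0.0

e_contact = cc_effort(0.2, 1.0, 0.5)   # in contact: (1.0 - 0.2)/0.5
e_free = cc_effort(1.5, 1.0, 0.5)      # beyond the threshold: no force
```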

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way, complex models representing complex systems can be built in a modular fashion by nesting combinations of simpler models representing simpler subsystems. At simulation time, a preprocessor may replace compound elements by their equivalent set of standard elements, or compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

Figure 2.18: Effort-displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts, representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports, bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension.

Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12.


Figure 2.21: Model of a simple pendulum as in figures 5.12 and 2.20, rewritten using specialized compound elements.

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs x, state vector u, and outputs y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

du/dt = f(x, u)    (2.18)

y = g(x, u)    (2.19)

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
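As a concrete illustration of the result of this back-substitution (parameter values and the integration scheme are assumptions, not the thesis's code), the Se-C-I-R loop of section 2.3.2 reduces to two state equations, dq/dt = p/I and dp/dt = E(t) − q/C − R·p/I, which can then be integrated by any standard method:

```python
# State update equations for the series Se-C-I-R model after
# back-substitution, integrated with forward Euler for brevity.
# Parameter values and the zero source schedule are assumptions.

def state_update(q, p, t, C, I, R, E):
    dq = p / I                          # flow common to the 1-junction
    dp = E(t) - q / C - R * (p / I)     # effort balance at the 1-junction
    return dq, dp

C, I, R = 1.0, 1.0, 0.5
E = lambda t: 0.0                       # unforced: source effort zero
q, p, dt = 1.0, 0.0, 1e-3               # initial charge, no momentum
for k in range(5000):                   # simulate 5 s
    dq, dp = state_update(q, p, k * dt, C, I, R, E)
    q, p = q + dt * dq, p + dt * dp

energy = q * q / (2 * C) + p * p / (2 * I)   # decays from 0.5 with R > 0
```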

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations, in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides and never on the left-hand side of any equations. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case, the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g., by a root finding algorithm) at each time step. In the second case, the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or differential causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitances. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would, in effect, make the lever plastic. A C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.
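A numeric sketch of the first decoupling example (all values are assumptions, not from the thesis): with a small R inserted between the capacitors of figure 2.12, each charge becomes an independent state, and an isolated pair simply equalizes its voltages through the resistor.

```python
# Decoupling the dependent capacitors of figure 2.12: a small
# resistance R between them makes q1 and q2 independent states.
# With no external drive the charges equalize through R:
#   dq1/dt = -(q1/C1 - q2/C2)/R,   dq2/dt = +(q1/C1 - q2/C2)/R
# Parameter values are assumptions; forward Euler for brevity.

C1, C2, R = 1.0, 1.0, 0.01
q1, q2 = 1.0, 0.0
dt = 1e-4
for _ in range(10000):                 # 1 s, many time constants
    f = (q1 / C1 - q2 / C2) / R        # flow through the decoupling R
    q1, q2 = q1 - dt * f, q2 + dt * f

# Total charge is conserved and the two voltages converge.
```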


Chapter 3

Genetic programming

3.1 Models, programs, and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model

The method popularized by Koza in [43] [44] [45] [46], and used here, is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which also includes evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which also includes simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back-propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection" in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.¹ [18] also

¹In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", Information and Control 8, 1965.


used genetic operations to search for tree structured programs in a special purpose language. Here the application was in symbolic regression.

A tree structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results [22], because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN key words is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human competitive artificial intelligence it will be desirable to use a general purpose, high level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus Naur Form (BNF) language


definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming and evolutionary algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem that are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution to each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.
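For instance, a real-valued parameter can be encoded so that every one of the 2^n bit patterns decodes to a valid candidate. This is only a sketch; the range, bit length and function names are arbitrary choices, not from the text.

```python
# Illustrative encoding of a real parameter in [lo, hi] as an n-bit string.
# Every pattern decodes to a valid (if possibly unfit) candidate, which is
# the property the genetic operators require.
def decode(bits, lo=-5.0, hi=5.0):
    n = len(bits)
    value = int(bits, 2)                          # bit string -> integer
    return lo + (hi - lo) * value / (2 ** n - 1)  # map onto [lo, hi]

def encode(x, n=8, lo=-5.0, hi=5.0):
    value = round((x - lo) / (hi - lo) * (2 ** n - 1))
    return format(value, "0{}b".format(n))        # zero-padded bit string
```

With 8 bits the round trip is accurate to about the quantization step, (hi − lo)/255.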


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature, specifically genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaa|aa  bbbb|bb

Children: aaaabb  bbbbaa

Figure 3.2: The crossover operation for bit string genotypes

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa

Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent: aaaaaa

Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel, on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.
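The roulette-wheel picture translates almost line for line into code. A sketch (the names are mine, not from the text):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportional ("roulette wheel") selection: each individual's
    chance of being picked equals its share of the total fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0.0, total)      # drop the ball on the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness              # sweep past this individual's wedge
        if spin <= cumulative:
            return individual
    return population[-1]                  # guard against rounding error
```

The standard library's random.choices(population, weights=fitnesses) performs the same weighted draw; the explicit loop above just mirrors the wheel description.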

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals, but not in any particularly fit individuals, will tend to be less prevalent in subsequent generations.

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm the entire population is replaced at once: a new population of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 − (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (depending on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8–10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34] and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
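The thirteen steps can be sketched end to end on a toy "one-max" problem (maximize the number of 1s in the string). All names and parameter values below are illustrative, not from the text.

```python
import random

def one_max_fitness(bits):                 # toy fitness: count of 1s
    return bits.count("1")

def generational_ga(N=20, M=30, G=60, PC=0.7, PM=0.1, seed=0):
    random.seed(seed)
    # Step 5: random initial population of M bit strings of length N
    pop = ["".join(random.choice("01") for _ in range(N)) for _ in range(M)]
    for _ in range(G):
        fits = [one_max_fitness(ind) for ind in pop]      # step 6
        if max(fits) == N:                                # step 7: terminate
            break
        def select():                                     # step 9: roulette
            return random.choices(pop, weights=[f + 1e-9 for f in fits])[0]
        new_pop = []
        while len(new_pop) < M:                           # steps 8-11
            r = random.random()
            if r < PC:                                    # single point crossover
                a, b = select(), select()
                cut = random.randrange(1, N)
                new_pop.append(a[:cut] + b[cut:])
            elif r < PC + PM:                             # single bit mutation
                a = select()
                i = random.randrange(N)
                new_pop.append(a[:i] + ("1" if a[i] == "0" else "0") + a[i + 1:])
            else:                                         # replication
                new_pop.append(select())
        pop = new_pop                                     # step 12
    return max(pop, key=one_max_fitness)
```

On this trivial fitness surface the population typically converges to, or very near, the all-ones string well within the generation budget.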

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive: they respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represent the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
        ↓
1. Evaluate fitness
        ↓
2. Check termination criteria
        ↓
3. Take fitness-proportional sample and apply genetic operators
        ↓
   Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first: of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search. It implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the summed mean square difference between the solution expression and a sampling of the "true" function in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola such as 1 + 2(x + 3)^2, it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test. All linear expressions fit exponentially worse near at least one extremity.
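The building-block claim can be illustrated numerically. In this sketch (the function names and sample range are mine, not from the text), a candidate that contains the right quadratic building block but the wrong constant term still scores far better under a reciprocal-of-error fitness than a well-chosen straight line:

```python
def target(x):
    return 1 + 2 * (x + 3) ** 2

def fitness(candidate, points=range(-10, 11)):
    # Reciprocal-of-error fitness: zero error gives a perfect score of 1.0
    error = sum(abs(candidate(x) - target(x)) for x in points)
    return 1.0 / (1.0 + error)

quadratic_guess = lambda x: 2 * (x + 3) ** 2   # right building block, wrong offset
linear_guess = lambda x: 13 * x + 40           # a well-chosen straight line
```

Over the chosen sample range the quadratic guess misses by a constant 1 everywhere, while any line diverges quadratically from the target toward the extremes.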

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The number of positions spanned between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 4 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
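The schema measures can be computed directly. A sketch (function names are mine; the defining length below follows this chapter's counting, under which 01*1* scores 4, whereas other texts count the distance between the outermost fixed positions, one less):

```python
def order(schema):
    """Number of exactly specified (non-wildcard) positions."""
    return sum(1 for c in schema if c != "*")

def defining_length(schema):
    # Inclusive span from the first to the last exactly specified bit,
    # matching the text's example (01*1* -> 4).
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] + 1 if fixed else 0

def matches(schema, bits):
    """True when the bit string is a member of the schema's set."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits))
```

For the example schema, matches enumerates exactly the four strings listed above.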

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically low that the algorithm will spend virtually all available time constructing, checking and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracket expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs, and when the time comes they can be evaluated as program code to test their fitness.

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis Python [83] is used, but the essence of the process is still the creation, manipulation and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.
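The code-as-data idea can be shown in miniature. This is a hedged sketch, not the thesis' actual library: programs are stored as nested Python tuples, and because the tree is plain data, the same structure the genetic operators manipulate is the structure the interpreter evaluates.

```python
import operator

# Function set: name -> (arity, implementation)
FUNCTIONS = {"+": (2, operator.add), "-": (2, operator.sub),
             "*": (2, operator.mul)}

def evaluate(tree, env):
    """Evaluate a parse tree stored as nested tuples,
    e.g. ("+", "x", ("*", 2, "x"))."""
    if isinstance(tree, tuple):            # internal node: apply the operator
        name, args = tree[0], tree[1:]
        _, fn = FUNCTIONS[name]
        return fn(*(evaluate(a, env) for a in args))
    if isinstance(tree, str):              # leaf: variable lookup
        return env[tree]
    return tree                            # leaf: numeric constant

# 1 + 2(x+3)^2, written as data that genetic operators could also manipulate
program = ("+", 1, ("*", 2, ("*", ("+", "x", 3), ("+", "x", 3))))
```

Evaluating the same structure for different bindings of x is all a fitness test needs to do.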

They are "imperative" in the sense that they consist of a sequence of commands, which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.²

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm: new generations are created from old by fitness proportional selection and the crossover, mutation and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

²http://haskell.org/haskellwiki/Introduction#What_is_functional_programming%3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is being searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,³ tree structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree structured programs that have no simple analogy in nature, and are either not available or not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following five decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants and zero argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as

³This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.


Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

3. Define the fitness function. Typically the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, PM:

M: population size (number of individuals)

G: maximum number of generations to run

PR: probability of selecting the replication operator

PC: probability of selecting the crossover operator

PM: probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.

The first two decisions (definition of the function and terminal sets) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved. It embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings or problems. However, genetic programming is so very computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation differ necessarily from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit strings discussed previously.
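Sub-tree crossover can be sketched for tuple-encoded trees (an illustrative encoding, not the thesis' actual library): enumerate the nodes of each parent, pick one at random in each, and swap the sub-trees rooted there.

```python
import random

def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs; a path is a sequence of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace(tree, path, new_sub):
    """Return a copy of tree with the sub-tree at path replaced."""
    if not path:
        return new_sub
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new_sub),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Swap randomly chosen sub-trees between two tuple-encoded parents."""
    path_a, sub_a = rng.choice(list(nodes(parent_a)))
    path_b, sub_b = rng.choice(list(nodes(parent_b)))
    return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)
```

Because sub-trees are swapped whole, the total node count of the two children always equals that of the two parents, even though each child's size and shape may differ from either parent's.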

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

Figure 3.7: The crossover operation for tree structured genotypes

The second mutation operator is called "sub-tree mutation". In this variation a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded, and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.
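Sub-tree mutation can be sketched in the same illustrative tuple encoding (the terminal and function sets here are placeholders, and the recursive descent below does not select nodes uniformly; it only shows the shape of the operation): grow a fresh random sub-tree and splice it in at a randomly chosen point.

```python
import random

TERMINALS = ["x", 0, 1, 2]
FUNCTIONS = ["+", "-", "*"]                   # all binary, for simplicity

def random_tree(depth, rng):
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(FUNCTIONS)
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def subtree_mutate(tree, rng, max_depth=3):
    """Replace a randomly selected sub-tree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.25:
        return random_tree(max_depth, rng)    # replace this node entirely
    i = rng.randrange(1, len(tree))           # otherwise recurse into a child
    return tree[:i] + (subtree_mutate(tree[i], rng, max_depth),) + tree[i + 1:]
```

If the replacement happens at the root, the entire program is regenerated, matching the extreme case described above.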

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent, a partial replication. The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different and simpler structure, using a new function that was not available to the parent. Automatically defined functions allow a portion of a program, from anywhere within its tree, to be encapsulated as a unit unto itself, and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods such as linear, quadratic or polynomial regression search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static


functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed, or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator "%" which returns the constant value 0.0 whenever its second operand is zero.

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x); that is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.
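The closure convention and the exploit just described can be sketched as follows (the function names are illustrative, not from the thesis; the return-0.0 convention follows the text above):

```python
import math

def protectedDiv(a, b):
    # Closure-preserving division: return the constant 0.0
    # whenever the second operand is zero.
    if b == 0:
        return 0.0
    return a / b

def sneaky_one(x):
    # A candidate program can exploit the x/0 = 0 convention to
    # manufacture the constant 1.0 from any sub-expression x.
    return math.cos(protectedDiv(x, 0))   # cos(0.0) == 1.0
```

Every expression built from these operators evaluates without error, which is exactly what closure requires, but the second function shows how evolution can turn the exception into a building block.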

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python division operator, a runtime exception is thrown when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. Listing 3.1 shows how division by zero and occurrences of the contagious NaN value (see footnote 4) result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness. Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

Another very common yet potentially problematic arithmetic operation is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex-valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

4. The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type; it arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number"; it arises from operations which have an undefined value in common arithmetic, such as cos(INF), sin(INF), or INF/INF. These special values, INF and NaN, are contagious in that operations involving one of them and any ordinary number evaluate to the special value. For example, INF - 123.456 = INF and NaN * 2.0 = NaN. Of the two, NaN is the more "virulent", since in addition to the above, operations involving both NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.
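The distinction can be sketched with two site types (all names here are invented for illustration): an operation that is meaningful at an edge site is not meaningful at a node site, so a typed GP system must prevent trees that mix them up.

```python
# Two distinct modifiable-site types in a graph-building kernel.
class Node:
    def __init__(self):
        self.edges = []

class Edge:
    def __init__(self, tail, head):
        self.tail, self.head = tail, head

def split_edge(edge):
    # Edge -> Node: insert a new node in the middle of an edge.
    # Only sensible when the modifiable site is an Edge.
    mid = Node()
    mid.edges = [Edge(edge.tail, mid), Edge(mid, edge.head)]
    return mid

def attach_leaf(node):
    # Node -> Edge: hang a new leaf node off an existing node.
    # Only sensible when the modifiable site is a Node.
    leaf = Node()
    e = Edge(node, leaf)
    node.edges.append(e)
    return e
```

A strongly typed GP system records that split_edge consumes an Edge and produces a Node, while attach_leaf consumes a Node and produces an Edge, and only composes them accordingly.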

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be, for example, a low order approximation of the target function. In this arrangement a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.
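The arrangement can be sketched as follows (a toy illustration with invented names; the "model" is just a parameter dictionary standing in for a rough preliminary model):

```python
def kernel():
    # Prior knowledge: a rough first-order approximation of the system.
    return {'a': 1.0, 'b': 0.0}

def program(k):
    # A candidate genetic program: it refines the kernel rather
    # than building a solution from scratch.
    k['b'] = k['b'] + 0.5
    return k

def fitness(candidate_program, target):
    # Judge the modified model's quality, not the program's structure.
    model = candidate_program(kernel())
    error = abs(model['a'] - target['a']) + abs(model['b'] - target['b'])
    return 1.0 / (1.0 + error)
```

The genetic algorithm searches over programs like `program`; the kernel is fixed, so any prior knowledge encoded in it is preserved across the whole population.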


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and a maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation to individuals that exceed the ceiling.
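The ceiling-by-zero-fitness approach can be sketched as follows (the tree representation and names are illustrative, not the project's code):

```python
MAX_NODES = 100   # the generous, "obviously adequate" ceiling

def tree_size(tree):
    # A genetic tree as nested tuples: (function, child, child, ...);
    # any non-tuple is a terminal.
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])

def evaluate(tree, fitness_function):
    # Individuals over the ceiling are culled via zero fitness,
    # rather than being excluded at crossover time.
    if tree_size(tree) > MAX_NODES:
        return 0.0
    return fitness_function(tree)
```

Oversized individuals are still created by crossover, but they never reproduce, so the population cannot drift past the ceiling.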

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. Its strength arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources is perhaps the more naturalistic in general. Biological organisms which have very extensive material needs will, on occasion, find those needs unmet and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree-swapping crossover can produce offspring differing in size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme case of a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value while tag values differ between structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short-circuiting. Short-circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short-circuiting. A LISP or Scheme implementation would benefit as well, since modern optimizing LISP compilers perform very well with tail recursion.
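The recursive equivalence test described above might look like this (a sketch assuming nodes of the form (type_name, child, child, ...), which is not necessarily the project's representation):

```python
def equivalent(a, b):
    # Two nodes are equivalent if they share a type, a child count,
    # and pairwise-equivalent children. Constant values differ only
    # parametrically, so any two 'const' nodes match.
    if a[0] != b[0] or len(a) != len(b):
        return False
    if a[0] == 'const':
        return True
    # all() short-circuits: recursion stops at the first
    # non-equivalent pair of corresponding children.
    return all(equivalent(x, y) for x, y in zip(a[1:], b[1:]))
```

A tag can then be any value computed so that equivalent trees collide, for example by hashing the tree with constant values masked out.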


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation code and the related symbolic algebra code are divided into four modules:

Bondgraph.py The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py The Components module contains a collection of compound bond graph elements.

Algebra.py The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions, and recipes for specialized actions such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations; in particular, for reducing coupled sets of algebraic and first order differential equations to a set of differential equations involving a minimal set of state variables, plus a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is likewise divided into four modules:

Genetics.py The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree structured representations.

Evolution.py The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualizing bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun, but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma separated value (CSV) text files for portability. For particularly long or detailed (small time-step) simulations there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact, non-text format might load quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and for the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages already in use, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half-arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in .bg format

title single mass spring damper chain

nodes

# Spring, damper and mass
begin node
    id spring
    type C
    parameter 1.96
end node

begin node
    id damper
    type R
    parameter 0.02
end node

begin node
    id mass
    type I
    parameter 0.002
end node

# Common flow point joining s, d, m
begin node
    id point1
    type cfj
end node

# Input force acting on spring and damper
begin node
    id force
    type Se
    parameter Y
end node

bonds

# Spring and damper joined to mass
begin bond
    tail point1
    head spring
end bond

begin bond
    tail point1
    head damper
end bond

begin bond
    tail point1
    head mass
end bond

begin bond
    tail force
    head point1
end bond

# EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62] and has thus gained contributions from hundreds of individuals world-wide. As with LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that the powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo-code. Take for example the following two functions, which implement the universal and existential logical quantifiers, respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], they operate on any iterable object "sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".
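That polymorphism can be demonstrated directly; the two functions from Listing 4.3 are repeated here so the example stands alone:

```python
def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

# The same code accepts a list, a generator, and a dictionary
# (which iterates over its keys), with no type declarations.
allEven = forAll([2, 4, 6], lambda n: n % 2 == 0)
anyBig = thereExists((n * n for n in range(5)), lambda n: n > 10)
keysAreStrings = forAll({'a': 1, 'b': 2}, lambda k: isinstance(k, str))
```

No interface needs to be declared in advance; any object that can be iterated, and any object that can be called, will do.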

Since this research addresses automated dynamic system identification, a topic primarily of interest to engineers rather than computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4-processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work SciPy is used for fast matrix operations and to access a numerical solver. Both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.
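A sketch of that interface in use (the mass-spring-damper parameters are borrowed from Listing 4.1; this is illustrative code, not the thesis's StateSpace class):

```python
import numpy
from scipy.integrate import odeint

k, r, m = 1.96, 0.02, 0.002    # spring, damper, mass parameters

def xdot(x, t):
    # State: x[0] = spring displacement, x[1] = mass velocity.
    return [x[1], (-k * x[0] - r * x[1]) / m]

t = numpy.linspace(0.0, 1.0, 101)
# odeint drives LSODA, which switches between stiff and non-stiff
# routines on its own; the caller supplies only the derivative
# function, the initial state, and the output time points.
x = odeint(xdot, [1.0, 0.0], t)
```

Any object providing a callable with the signature f(x, t), as the StateSpace objects do, can be integrated the same way.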

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];

    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];

    edge [fontsize=10];
    11 -> C1 [arrowhead=half, arrowtail=tee, label="1"];
    11 -> R2 [arrowhead=half, arrowtail=tee, label="2"];
    11 -> I3 [arrowhead=teehalf, label="3"];
    Se4 -> 11 [arrowhead=teehalf, label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds; activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.


4.3.2 Adding elements and bonds, traversing a graph

The graph class has an addNode and a removeNode method, as well as a bondNodes and a removeBond method. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.
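A minimal sketch of these structures (illustrative, not the thesis source; only the method names and checks described above are assumed):

```python
class Element:
    def __init__(self, name, max_causal_in, max_causal_out):
        self.name = name
        self.bonds = []                 # attached bonds, for traversal
        self.max_causal_in = max_causal_in
        self.max_causal_out = max_causal_out

class Bond:
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        self.causality = None           # nil until causality assignment

class BondGraph:
    def __init__(self):
        self.elements = []
        self.bonds = []

    def addNode(self, element):
        # A node may belong to one graph and must arrive bond-free.
        assert element not in self.elements and not element.bonds
        self.elements.append(element)
        return element

    def bondNodes(self, tail, head):
        # Bonds exist only within a graph; both ends must be members.
        assert tail in self.elements and head in self.elements
        bond = Bond(tail, head)
        tail.bonds.append(bond)
        head.bonds.append(bond)
        self.bonds.append(bond)
        return bond
```

Because each bond knows its endpoints and each element knows its bonds, the whole graph can be walked from any starting point.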

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP), described in Section 2.3.2, is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC)

2. one-port elements with a preferred causality (storage elements C and I)

3. one-port elements with arbitrary causality (sink element R)

4. others (0, 1, GY, TF, signal blocks)

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality,

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality,

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality,

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality, or

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object is then itself responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required-causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has been assigned a causality (preferred or otherwise) as a consequence of earlier required or preferred causality assignments is skipped over. The others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has already been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and the consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined and any instances of differential causality are noted in a new list attached to the bond graph object.
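The three-pass ordering can be sketched as follows (a simplified stand-in, not the thesis API: elements are plain dictionaries with one bond each, and propagation through extendCausality is elided):

```python
def assign_causality(elements):
    # Required causalities first, then preferred, then arbitrary;
    # bonds assigned in an earlier pass are skipped in later ones.
    for kind in ('required', 'preferred', 'arbitrary'):
        for el in elements:
            if el['kind'] != kind:
                continue
            bond = el['bond']
            if bond['causality'] is None:   # skip already-assigned bonds
                bond['causality'] = el['choice']
                # ... in the real implementation, consequences are
                # propagated here via extendCausality ...
```

The ordering matters: when a required and a preferred element contend for the same bond, the required assignment always lands first.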

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to uniquely name the effort and flow variables of each bond. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since it is computationally inexpensive to apply these reductions. Although detecting where they are applicable requires an examination of every junction in the graph and all of the junction's neighbours, that cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with fewer than 3 bonds.

2. Merge adjacent like junctions.

3. Merge resistors bonded to a common junction.

4. Merge capacitors bonded to a common effort junction.

5. Merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5. In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond is put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed, and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1, there are two cases, depending on the junction type. (Handling of non-linear elements is not automated by the library.) If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is $r_{eq}$, where

$$r_{eq} = \frac{1}{\frac{1}{r_1} + \frac{1}{r_2} + \cdots + \frac{1}{r_n}} \qquad (4.1)$$

These are the familiar rules for electrical resistances in series and parallel.

The fourth and fifth manipulations handle the case shown in figure 2.12 and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply. Capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances. Rigidly joined inertias are effectively one mass.
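The merge rules for resistors (step 3) can be sketched as follows; this is a minimal illustration, and the function name and signature are assumptions, not the bond graph library's actual API:

```python
def merge_resistors(junction_type, resistances):
    """Equivalent parameter for R elements bonded to one junction.
    At a 1 junction the parameters simply add (series-like case);
    at a 0 junction the reciprocals add, as in equation 4.1."""
    if junction_type == "1":
        return sum(resistances)
    if junction_type == "0":
        return 1.0 / sum(1.0 / r for r in resistances)
    raise ValueError("unknown junction type: %r" % junction_type)
```

The same pattern extends to steps 4 and 5: capacitances bonded to a common effort junction sum directly, as do inertias bonded to a common flow junction.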

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than three bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8 and 4.9 show the application of the first three steps in the automated model reduction method. The fourth and fifth are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions, step 1: remove trivial junctions.

Figure 4.2: Model reductions, step 2: merge adjacent like junctions.

Figure 4.3: Model reductions, step 3: merge resistors bonded to a common junction.

Figure 4.4: Model reductions, step 4: merge capacitors bonded to a common effort junction.

Figure 4.5: Model reductions, step 5: merge inertias bonded to a common flow junction.

Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions and multiple resistors bonded to a common junction.

Figure 4.7: The model from figure 4.6 with trivial junctions removed.

Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged.

Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged.

Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5.

Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6.

Listing 4.5: Equations from the model in figure 4.6

e1 = (e2 + (-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4 + (-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1 - 78.0)
f2 = f1
f3 = (f9 + f11 + f12 + f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2 + f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7.13788689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2 + (-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1 - 78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way, depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form, even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look up the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant and operator is an expression in itself), replace any one sub-expression with another, and other, more specialized algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
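A minimal sketch of this object design follows; the class names match the text, but the method names and details are assumptions for illustration, not the library's actual code:

```python
class Expression:
    """Base algebraic object; only leaf types are instantiated."""
    children = ()
    def variables(self):
        vs = set()
        for c in self.children:
            vs |= c.variables()
        return vs
    def is_constant(self):
        return not self.variables()

class Constant(Expression):
    def __init__(self, value): self.value = value
    def evaluate(self, env): return self.value

class Variable(Expression):
    def __init__(self, name): self.name = name
    def variables(self): return {self.name}
    def evaluate(self, env): return env[self.name]

class Add(Expression):   # commutative: any number of operands
    def __init__(self, *operands): self.children = operands
    def evaluate(self, env): return sum(c.evaluate(env) for c in self.children)

class Mul(Expression):   # commutative: any number of operands
    def __init__(self, *operands): self.children = operands
    def evaluate(self, env):
        r = 1.0
        for c in self.children: r *= c.evaluate(env)
        return r

# 2 + 3x, evaluated with x = 4
expr = Add(Constant(2.0), Mul(Variable("x"), Constant(3.0)))
assert expr.evaluate({"x": 4.0}) == 14.0
assert expr.variables() == {"x"}
```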

Equation objects are composed of two expression objects (a left-hand side and a right) and methods to list all variables, expressions and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram.

Equation sets are collections of equations that contain no duplicates. Equation set objects support similar methods to equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.), as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation (provided the original set is in a certain "declarative" form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation automated by the library, the resolution of algebraic loops is discussed in Section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called "t" is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul, and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator. It joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null, or "do nothing", operations. A single operand Add is assumed to add zero to its operand, and a single operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands to this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
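One step of this manipulation can be sketched over a simple tuple encoding of expressions, used here as a stand-in for the library's objects:

```python
# Expressions encoded as ("mul", ...), ("add", ...) or atoms (strings).
def expand_distributive(expr):
    """One expansion step: if expr is a Mul with an Add operand,
    distribute the other Mul operands over the first such Add."""
    if isinstance(expr, tuple) and expr[0] == "mul":
        for i, op in enumerate(expr[1:], start=1):
            if isinstance(op, tuple) and op[0] == "add":
                rest = expr[1:i] + expr[i + 1:]
                return ("add",) + tuple(
                    ("mul",) + rest + (term,) for term in op[1:])
    return expr

# a*(b + c)  ->  a*b + a*c
assert expand_distributive(("mul", "a", ("add", "b", "c"))) == \
    ("add", ("mul", "a", "b"), ("mul", "a", "c"))
```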

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

"Reduce equation to polynomial form" invokes towardsPolynomial on the left and right-hand side expressions of the equation repeatedly, until isPolynomial gives a true result for each. "Collect like terms" examines the variables in the operands of the Add objects on either side of a polynomial form equation and collects them such that there is only one operand between those Add objects that contains any one variable. The result may not be in the polynomial form defined above, but can be driven there again by invoking the equation's towardsPolynomial method. Equation objects have a solveFor method that, given a variable appearing in the equation, attempts to put the equation in polynomial form, then collects like terms, isolates the term containing that variable on one side of the equation, and divides through by any constant factor. Of course, this works only for equations that can be put in the polynomial form defined above. The solveFor method attempts to put the equation in polynomial form by invoking towardsPolynomial repeatedly, stopping either when the isPolynomial method gives a true result or after a number of iterations proportional to the size of the equation (the number of Expression objects it contains). If the iteration limit is reached, then it is presumed not possible to put the equation in polynomial form, and the solveFor method fails.
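The collect/isolate/divide sequence behind solveFor can be illustrated on polynomial-form sides reduced to coefficient dictionaries; this encoding is assumed here for brevity and is not the thesis representation:

```python
# A polynomial-form side as {variable: coefficient}, with the
# constant term stored under the key None.
def solve_for(lhs, rhs, var):
    """Isolate `var` in the linear equation lhs = rhs."""
    # Move everything to one side: lhs - rhs = 0 (collect like terms).
    net = dict(lhs)
    for v, c in rhs.items():
        net[v] = net.get(v, 0.0) - c
    coeff = net.pop(var)          # term containing the target variable
    # var = -(remaining terms) / coeff  (isolate and divide through)
    return {v: -c / coeff for v, c in net.items()}

# 2x + 3 = x + 9  ->  x = 6
sol = solve_for({"x": 2.0, None: 3.0}, {"x": 1.0, None: 9.0}, "x")
assert sol == {None: 6.0}
```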

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive backsubstitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2), it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graph. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back substitution into the declarative equation for the first loop variable

2. Expanding distributive operators, aggregating constants and commutative operators

3. Collecting like terms

4. Isolating the first loop variable
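The loop-identification step can be sketched as a depth-first search over a dependency mapping; the encoding below is illustrative, and the sample mapping mirrors the loop of the reduced model (differential equations are excluded, as the text notes):

```python
def find_loop(deps, start):
    """Return one algebraic cycle through `start` as a list of
    variables, or None. `deps` maps each algebraically-defined
    variable to the variables on its right-hand side."""
    path, seen = [], set()
    def dfs(v):
        if v == start and path:          # closed the cycle
            return list(path)
        if v in seen or v not in deps:   # dead end or already explored
            return None
        seen.add(v)
        path.append(v)
        for w in deps[v]:
            cycle = dfs(w)
            if cycle:
                return cycle
        path.pop()
        return None
    return dfs(start)

# Dependencies of the looped variables in the reduced model.
deps = {"e6": {"e2", "e3"}, "e3": {"f3"}, "f3": {"f6"},
        "f6": {"f4", "f7", "f8"}, "f8": {"e8"}, "e8": {"e6"}}
loop = find_loop(deps, "e6")
assert loop[0] == "e6"
assert set(loop) == {"e6", "e3", "f3", "f6", "f8", "e8"}
```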

The result of the first step is a new declarative equation for the first loop variable, which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

$$A = B \cdot C$$

and

$$D = E_1 + E_2 + \cdots + E_n$$

where $C$ is a constant and any of $E_1 \ldots E_n$ may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the other side of the equation, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4+f7+f8)
e6 = (e2+(-1.0*e3))

collapsing
e6 = (e2+(-1.0*((f4+f7+(e6/30.197788376))*32.0)))

expanding and aggregating
iterations: 3 True True
e6 = (e2+(f4*-32.0)+(f7*-32.0)+(e6*-1.05968025213))

collecting
(e6+(-1.0*(e6*-1.05968025213))) = (e2+(f4*-32.0)+(f7*-32.0))

expanding and aggregating
iterations: 2 True True
(e6+(e6*1.05968025213)) = (e2+(f4*-32.0)+(f7*-32.0))

isolating
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1 - 78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved.

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not hold (for example, because the bond graph model has differential causality for some elements), then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input and other variables

2. Back-substitute until only state and input variables appear on right-hand sides

3. Split state and readout equation sets

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.

In the second step, the right-hand sides of non-differential equations are substituted into the right-hand sides of all equations wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops, then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, and all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.
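The recursive back-substitution of step 2 can be sketched over a tuple encoding of expressions; the encoding and toy definitions are illustrative, not the thesis data structures. Note that on a set still containing an algebraic loop this recursion would never terminate, which is why loops are resolved first:

```python
def substitute(expr, defs):
    """Recursively fold algebraic definitions into an expression
    until only state and input variables remain."""
    if isinstance(expr, tuple):            # operator node
        return (expr[0],) + tuple(substitute(e, defs) for e in expr[1:])
    if expr in defs:                       # algebraically-defined variable
        return substitute(defs[expr], defs)
    return expr                            # state or input variable

# Toy data: e1 is defined via e7 and e5; e7 via e6; e6 via the input i0.
defs = {"e1": ("add", "e7", ("neg", "e5")), "e7": "e6", "e6": "i0"}
assert substitute("e1", defs) == ("add", "i0", ("neg", "e5"))
```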

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can be solved for arbitrary time ranges, of course. The differential equations from a state space equation set are solved by numerical integration using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.
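The simulation step can be sketched with SciPy's odeint interface, which wraps the LSODA integrator; the one-state system below (a driven first order model) is purely illustrative and not taken from the thesis examples:

```python
import numpy as np
from scipy.integrate import odeint

def state_equations(x, t, i0=1.0, r=2.0):
    """State derivative for a single-state system: p_dot = i0 - p/r."""
    p, = x
    return [i0 - p / r]

t = np.linspace(0.0, 10.0, 101)
trajectory = odeint(state_equations, [0.0], t)   # LSODA under the hood

# A readout equation, e = p/r, solved along the computed trajectory.
e = trajectory[:, 0] / 2.0
```

The state approaches the equilibrium value i0*r = 2.0, as expected for this system.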

Inputs, if any, are read from a file, then interpolated on the fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth order (constant) and first order (linear) interpolators are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.
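A zeroth order hold can be sketched in a few lines; the function name and the choice to hold the first value before the first sample are assumptions for illustration:

```python
import bisect

def zoh(times, values, t):
    """Zeroth order (constant) hold: return the most recent
    tabulated value at or before time t."""
    i = bisect.bisect_right(times, t) - 1
    return values[max(i, 0)]

assert zoh([0.0, 1.0, 2.0], [5.0, 7.0, 9.0], 1.5) == 7.0
```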

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations, across the same range of simulated time, for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression), a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).
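The node design described above can be sketched as follows; the attribute names follow the text (spelled with underscores here), but the exact API is an assumption, not the thesis implementation:

```python
class Node:
    """Base class for typed program-tree nodes."""
    type = None            # a unique type identifier
    min_children = 0       # minimum number of children accepted
    max_children = 0       # maximum number of children accepted
    child_types = []       # list of types, at least max_children long

    def __init__(self, children=()):
        self.children = list(children)

    def compose(self, kernel=None):
        """Compose a partial solution from the children's results."""
        raise NotImplementedError

class ConstNode(Node):     # terminal: accepts no children
    type = "expr"
    def __init__(self, value):
        super().__init__()
        self.value = value
    def compose(self, kernel=None):
        return str(self.value)

class AddNode(Node):       # function: exactly two "expr" children
    type = "expr"
    min_children = max_children = 2
    child_types = ["expr", "expr"]
    def compose(self, kernel=None):
        a, b = (c.compose(kernel) for c in self.children)
        return "(%s + %s)" % (a, b)

program = AddNode([ConstNode(1.0), ConstNode(2.5)])
assert program.compose() == "(1.0 + 2.5)"
```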

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree structured representations proceeds in three steps:

1. Choose a node at random from the parent

2. Generate a new random tree whose root has the same type as the chosen node

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree

In the first step, a choice is made uniformly among all nodes. It is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied, with replacement of the chosen subtree.

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4, this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; those few are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case, the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents

2. List all nodes from the first parent conforming to one of the common types

3. Choose at random from the list

4. List all nodes from the second parent conforming to the same type as the chosen node

5. Choose at random from the list

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second

If the set of common types is empty, crossover cannot proceed, and the genetic algorithm selects a new pair of parents.
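The six steps can be sketched over a tuple encoding of typed trees, where each node is (type, payload, children); the encoding and helper names are illustrative, not the thesis implementation:

```python
import random

def nodes_of(tree):
    """Yield every node of a (type, payload, children) tree."""
    yield tree
    for child in tree[2]:
        yield from nodes_of(child)

def crossover(parent1, parent2, rng=random):
    # Step 1: intersect the node type sets of the parents.
    common = {n[0] for n in nodes_of(parent1)} & \
             {n[0] for n in nodes_of(parent2)}
    if not common:
        return None   # caller must select a new pair of parents
    # Steps 2-3: choose a target node of a common type from parent 1.
    target = rng.choice([n for n in nodes_of(parent1) if n[0] in common])
    # Steps 4-5: choose a donor node of the same type from parent 2.
    donor = rng.choice([n for n in nodes_of(parent2) if n[0] == target[0]])
    # Step 6: copy parent 1, replacing the target subtree with the donor.
    def rebuild(node):
        if node is target:
            return donor
        return (node[0], node[1], [rebuild(c) for c in node[2]])
    return rebuild(parent1)
```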

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).


This grammar is typeless, or "mono-typed". The result of symbolic regression is a tree structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and perhaps obvious) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations. The differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem, it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (the length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire not to pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program. Simply making the new state available for later integration into the model is enough. Given a tree-structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sits above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.


3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equation are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the direction opposite to the variable resolution search, each duplicate index is replaced by a number half way between its original index and the next state equation index (with wrap-around at the interval boundaries). Overall, this design has some desirable properties, including:

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so a crossover that simply adds a state is not destructive to any advantages the program already has.


• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important, since other designs considered before arriving at this one (such as numbering the state equations and using very large numbers, modulo the final number of equations, to reference the state variables) would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small incremental changes (such as the introduction of a new state equation by copying, and later modifying in some small way, an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.
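The interval-indexing scheme described above can be sketched in a few lines; the function names and the use of a plain list of indices are illustrative, not the thesis implementation.

```python
def resolve(ref_index, state_indices):
    """Resolve a reference index to a state-variable index by searching in
    one direction (increasing, wrapping around modulo the unit interval)
    for the first state-variable index encountered."""
    return min(state_indices, key=lambda s: (s - ref_index) % 1.0)

def reindex_duplicates(state_indices):
    """Move each duplicated index (introduced by copying a state equation)
    halfway toward the next state-equation index, walking the interval in
    the direction opposite to the resolution search, with wrap-around."""
    out = sorted(state_indices)
    for i in range(len(out) - 1, -1, -1):
        if out.count(out[i]) > 1:
            gap = (out[(i + 1) % len(out)] - out[i]) % 1.0
            out[i] = (out[i] + gap / 2.0) % 1.0
    return out
```

Note that `resolve` always returns some state variable as long as at least one state equation exists, which is exactly the property that makes randomly generated references safe.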

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types to which nodes appearing below this one in a program tree must conform.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and on their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.
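The root-down evaluation pass can be sketched as follows; the class names mirror the node names in the grammar, but their Python form (and the use of tuples and lists in place of real equation objects) is illustrative only.

```python
class StateEquation:
    """One state equation: an index for its left-hand-side state variable
    and a right-hand-side expression (a placeholder here)."""
    def __init__(self, lhs_index, rhs):
        self.lhs_index, self.rhs = lhs_index, rhs
    def evaluate(self):
        return (self.lhs_index, self.rhs)

class StateEquationSet:
    """First child is an equation; optional second child is another set."""
    def __init__(self, first, rest=None):
        self.first, self.rest = first, rest
    def evaluate(self):
        eqs = [self.first.evaluate()]
        if self.rest is not None:
            eqs.extend(self.rest.evaluate())
        return eqs

class DynamicSystem:
    """Root node: gathers state and readout equations; in the real system
    it would also resolve indexed state-variable references and build one
    state space equation set object."""
    def __init__(self, states, readouts):
        self.states, self.readouts = states, readouts
    def evaluate(self):
        return {'state': self.states.evaluate(),
                'readout': list(self.readouts)}
```

The recursive `StateEquationSet` structure is what lets crossover splice in a subtree carrying any number of additional state equations.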


Node               Type               Child types
DynamicSystem      DynamicSystem      StateEquationSet, OutputEquationSet
StateEquationSet   StateEquationSet   StateEquation, StateEquationSet
StateEquation      StateEquation      StateVariableLHS, Expression
OutputEquationSet  OutputEquationSet  OutputEquation, OutputEquationSet
OutputEquation     OutputEquation     OutputVariable, Expression
StateVariableLHS   StateVariableLHS
StateVariableRHS   Expression
OutputVariable     OutputVariable
InputVariable      Expression
Constant           Expression
Add                Expression         Expression, Expression
Mul                Expression         Expression, Expression
Div                Expression         Expression, Expression
Cos                Expression         Expression
Sin                Expression         Expression
Greater            Expression         Expression, Expression
Lesser             Expression         Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems


4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs in this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given allows only bonds and constant parameters of elements to be replaced, but could be extended to allow other elements, or even connected subgraphs, to be replaced.

If a constant parameter of some element (for example, the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant-valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.
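The bond-replacement rewrite can be sketched as a small graph edit; the dictionary representation of bonds (id to (tail, head) pair) and the function name are assumptions for illustration, not the thesis library's API.

```python
def insert_junction(bonds, bond_id, junction, next_id):
    """Replace the bond `bond_id` with two bonds meeting at a new 0- or
    1-junction node, as the InsertCommon*Junction program nodes do."""
    tail, head = bonds.pop(bond_id)
    bonds[next_id] = (tail, junction)      # first new bond (type `bond`)
    bonds[next_id + 1] = (junction, head)  # second new bond (type `bond`)
    return bonds
```

Each application removes one replaceable bond and opens two more, which is what allows the evolved structure to grow without bound while every intermediate graph remains a valid bond graph.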

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models), then this is a great efficiency advantage: the grammar will not produce models incompatible with the solver, constraining the genetic algorithm not to waste time generating models that would only be discarded.

The child types column for the Kernel node is left blank, since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type        Child types
Kernel                      kernel
InsertCommonEffortJunction  bond        bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond        bond, bond, effort-in, effort-out
AttachInertia               effort-out  Expression, bond
AttachCapacitor             effort-in   Expression, bond
AttachResistorEffortIn      effort-in   Expression, bond
AttachResistorEffortOut     effort-out  Expression, bond
Constant                    Expression
Add                         Expression  Expression, Expression
Mul                         Expression  Expression, Expression
Div                         Expression  Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs


Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p, and q variables measure effort, flow, momentum, and displacement on the like-numbered bonds. The applied effort from the source element is marked E.


˙p3 = e3                                   (5.1)

˙q1 = f1                                   (5.2)

e1 = (q1/1.96)                             (5.3)

e2 = (f2 * 0.02)                           (5.4)

e3 = (e4 + (−1.0 * e1) + (−1.0 * e2))      (5.5)

e4 = E                                     (5.6)

f1 = f3                                    (5.7)

f2 = f3                                    (5.8)

f3 = (p3/0.002)                            (5.9)

f4 = f3                                    (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that the variable E does not appear on the left-hand side of any state or readout equation; it is an input to the system.

˙p3 = (E + (−1.0 * (q1/1.96)) + (−1.0 * ((p3/0.002) * 0.02)))      (5.11)

˙q1 = (p3/0.002)                                                    (5.12)

e1 = (q1/1.96)                                                      (5.13)

e2 = ((p3/0.002) * 0.02)                                            (5.14)

e3 = (E + (−1.0 * (q1/1.96)) + (−1.0 * ((p3/0.002) * 0.02)))       (5.15)

e4 = E                                                              (5.16)

f1 = (p3/0.002)                                                     (5.17)

f2 = (p3/0.002)                                                     (5.18)

f3 = (p3/0.002)                                                     (5.19)

f4 = (p3/0.002)                                                     (5.20)


The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand sides of the readout equations as needed.
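The integration just described can be sketched as a simple loop. The sketch below uses forward Euler (the text does not state which integrator the simulation code uses) with a unit step input and the parameter values given above; the function and variable names are illustrative.

```python
def simulate(t_end=2.0, dt=1e-4, C=1.96, I=0.002, R=0.02):
    """Integrate state equations 5.11 and 5.12 under a unit step input E,
    starting from rest, and return the final (p3, q1)."""
    p3, q1 = 0.0, 0.0
    t = 0.0
    while t < t_end:
        E = 1.0                            # unit step applied from t = 0
        dp3 = E - q1 / C - (p3 / I) * R    # eq. 5.11
        dq1 = p3 / I                       # eq. 5.12
        p3 += dp3 * dt
        q1 += dq1 * dt
        t += dt
    return p3, q1
```

At steady state the damper force balances nothing (flow is zero) and the spring balances the applied force, so q1 should settle near C * E = 1.96 with p3 near zero, matching the step response plotted in figure 5.3.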

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass, and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit-magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in section 5.3. A broad-spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.


Figure 5.1: Schematic diagram of a simple spring-mass-damper system


Figure 5.2: Bond graph model of the system from figure 5.1


Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input


Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit-magnitude uniform random noise input


5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring–dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state space form, the differential state equations are as given below.

˙p2 = (E + (−1.0 * ((((p2/0.002) + (−1.0 * (p8/0.002))) * 0.02)
      + (q6/1.96))))

˙p8 = (((((p2/0.002) + (−1.0 * (p8/0.002))) * 0.02) + (q6/1.96))
      + (−1.0 * ((((p8/0.002) + (−1.0 * (p14/0.002))) * 0.02)
      + (q12/1.96))))

˙p14 = (((((p8/0.002) + (−1.0 * (p14/0.002))) * 0.02) + (q12/1.96))
      + (−1.0 * ((p14/0.002) * 0.02)) + (−1.0 * (q16/1.96)))

˙q6 = ((p2/0.002) + (−1.0 * (p8/0.002)))

˙q12 = ((p8/0.002) + (−1.0 * (p14/0.002)))

˙q16 = (p14/0.002)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12, and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit-magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter. The input signal E is broad-spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system


Figure 5.6: Bond graph model of the system from figure 5.5



Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E


Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit-magnitude uniform random noise in the applied force E


5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant −9.81 times the inertia (the force of gravity on the ball).

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last, as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.
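The switching behaviour of the CC element can be illustrated as a piecewise force law; this is a hedged sketch of the idea rather than the implementation from section 2.4.3, and the function name and sign conventions are assumptions.

```python
def contact_force(q, C=0.25, r=1.0):
    """Restoring force of a switched C ("CC") element: zero while the ball
    centre q is above the switching threshold r, and an ordinary linear
    spring force (compression / C) once the ball touches the surface."""
    compression = r - q
    return compression / C if compression > 0 else 0.0
```

The element therefore stores energy only during contact, which is what produces the free-flight parabolic arcs between bounces in figure 5.11.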

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface



Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface


Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface


5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical, and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum body to have only one degree of freedom: the horizontal and vertical position (and speed) of the pendulum body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum body. That model has 3 storage elements (horizontal, vertical, and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical, and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical, and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, the rotational position, the integral of the rotational flow signal a3):

mxA = 1.0 * cos(θ)

myA = 1.0 * sin(θ)

mxB = −1.0 * cos(θ)

myB = −1.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state space form, the differential state equations are:

˙na1 = (p1/1.00)

˙na2 = (p2/1.00)

˙na3 = (p4/1.10)

˙p1 = (((−1.0 * (q21/0.01)) + (−1.0 * (((p1/1.00)
      + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p2 = (−9.81 + ((−1.0 * (q24/0.01)) + (−1.0 * (((p2/1.00)
      + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p4 = (((1.00 * cos(na3)) * ((−1.0 * (q21/0.01))
      + (−1.0 * (((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8))))
      + ((1.00 * sin(na3)) * ((−1.0 * (q24/0.01)) + (−1.0 * (((p2/1.00)
      + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8)))) + ((−1.00 * cos(na3)) * 0.0)
      + ((−1.00 * sin(na3)) * 0.0) + (−1.0 * ((p4/1.10) * 0.5)))

˙q21 = ((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10)))

˙q24 = ((p2/1.00) + ((1.00 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low-frequency vibration in the spring that can be seen in the variable lower extremum of the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.


Figure 5.12: Bond graph model of the system from figure 2.19



Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left, and an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state space form, there are 24 differential state equations, given below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.


˙Vphi_b0 = ((0.0 + (((xj0/0.001) + ((0.0 + (−1.0 * (VxC_b0 + (10.0 * Vphi_b0
      * sin(phi_b0)))))/0.001)) * 10.0 * sin(phi_b0))
      + (−1.0 * ((yj0/0.001) + ((0.0 + (−1.0 * (VyC_b0 + (−1.0 * 10.0
      * Vphi_b0 * cos(phi_b0)))))/0.001)) * 10.0 * cos(phi_b0))
      + (−1.0 * 0.0)
      + (((xj1/0.001) + (((VxC_b0 + (−1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
      + (−1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001))
      * 10.0 * sin(phi_b0))
      + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (−1.0 * (VyC_b1 + (−1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001))
      * 10.0 * cos(phi_b0)))/50.0)

˙Vphi_b1 = ((0.0 + (((xj1/0.001) + (((VxC_b0 + (−1.0 * 10.0 * Vphi_b0
      * sin(phi_b0))) + (−1.0 * (VxC_b1 + (10.0 * Vphi_b1
      * sin(phi_b1)))))/0.001)) * 10.0 * sin(phi_b1))
      + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (−1.0 * (VyC_b1 + (−1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001))
      * 10.0 * cos(phi_b1))
      + (−1.0 * 0.0)
      + (((xj2/0.001) + (((VxC_b1 + (−1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (−1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001))
      * 10.0 * sin(phi_b1))
      + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (−1.0 * (VyC_b2 + (−1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001))
      * 10.0 * cos(phi_b1)))/50.0)

˙Vphi_b2 = ((0.0 + (((xj2/0.001) + (((VxC_b1 + (−1.0 * 10.0 * Vphi_b1
      * sin(phi_b1))) + (−1.0 * (VxC_b2 + (10.0 * Vphi_b2
      * sin(phi_b2)))))/0.001)) * 10.0 * sin(phi_b2))
      + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (−1.0 * (VyC_b2 + (−1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001))
      * 10.0 * cos(phi_b2))
      + (−1.0 * 0.0) + (0.0 * 10.0 * sin(phi_b2))
      + (−1.0 * 0.0 * 10.0 * cos(phi_b2)))/50.0)

˙VxC_b0 = ((((xj0/0.001) + ((0.0 + (−1.0 * (VxC_b0 + (10.0 * Vphi_b0
      * sin(phi_b0)))))/0.001))
      + (−1.0 * ((xj1/0.001) + (((VxC_b0 + (−1.0 * 10.0 * Vphi_b0
      * sin(phi_b0))) + (−1.0 * (VxC_b1 + (10.0 * Vphi_b1
      * sin(phi_b1)))))/0.001))))/10.0)

˙VxC_b1 = ((((xj1/0.001) + (((VxC_b0 + (−1.0 * 10.0 * Vphi_b0
      * sin(phi_b0))) + (−1.0 * (VxC_b1 + (10.0 * Vphi_b1
      * sin(phi_b1)))))/0.001))
      + (−1.0 * ((xj2/0.001) + (((VxC_b1 + (−1.0 * 10.0 * Vphi_b1
      * sin(phi_b1))) + (−1.0 * (VxC_b2 + (10.0 * Vphi_b2
      * sin(phi_b2)))))/0.001))))/10.0)

˙VxC_b2 = ((((xj2/0.001) + (((VxC_b1 + (−1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (−1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001))
      + (−1.0 * 0.0))/10.0)

˙VyC_b0 = ((((yj0/0.001) + ((0.0 + (−1.0 * (VyC_b0 + (−1.0 * 10.0 * Vphi_b0
      * cos(phi_b0)))))/0.001))
      + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (−1.0 * (VyC_b1 + (−1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001)))
      + (−1.0 * 10.0 * 9.81))/10.0)

˙VyC_b1 = ((((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (−1.0 * (VyC_b1 + (−1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001))
      + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (−1.0 * (VyC_b2 + (−1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)))
      + (−1.0 * 10.0 * 9.81))/10.0)

˙VyC_b2 = ((((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (−1.0 * (VyC_b2 + (−1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001))
      + (−1.0 * 0.0) + (−1.0 * 10.0 * 9.81))/10.0)

˙phi_b0 = Vphi_b0

˙phi_b1 = Vphi_b1

˙phi_b2 = Vphi_b2

˙xC_b0 = VxC_b0

˙xC_b1 = VxC_b1

˙xC_b2 = VxC_b2

˙xj0 = (0.0 + (−1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))

˙xj1 = ((VxC_b0 + (−1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
      + (−1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))

˙xj2 = ((VxC_b1 + (−1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (−1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))

˙yC_b0 = VyC_b0

˙yC_b1 = VyC_b1

˙yC_b2 = VyC_b2

˙yj0 = (0.0 + (−1.0 * (VyC_b0 + (−1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))

˙yj1 = ((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (−1.0 * (VyC_b1 + (−1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))

˙yj2 = ((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (−1.0 * (VyC_b2 + (−1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))


Figure 5.14: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – vertical displacement of pendulum body centres



Figure 5.15: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – horizontal displacement of pendulum body centres


Figure 5.16: Response of the triple pendulum model from figure 5.17, released from an initial horizontal position – angular position of pendulum bodies



Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum


5.1.6 Linked elastic collisions

Figure 5.18 shows a rigid bar with damped-elastic bumpers at each end, in schematic form. The system consists of a rigid link element, as used above in section 5.1.4, representing the bar; two switched capacitor elements, CCa and CCb, representing elastic collision between the tips and a rigid horizontal surface; and unswitched vertical damping at each end and on rotation (not shown in the schematic), representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface, int_a3 is the angular displacement of the bar from vertical, and q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21, and q24 (the heights of the centre, left tip, and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2 before the bar tips strike the ground at height 1, equal to the unsprung length of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact, then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end



Figure 5.19: Bond graph model of the system from figure 5.18


Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean, and maximum program fitness each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as minimum, mean, and maximum number of nodes in a program)

The particular symbolic regression problem chosen consisted of 100 samples of the function x^2 + 2 at integer values of x between −50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2, and the number of generations limited to 500, but with a variety of population sizes.
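The fitness computation described here can be sketched as follows. The exact normalizing map from total absolute error onto (0, 1] is not given in the text, so the `1/(1 + error)` form below is one plausible choice, and the function name is illustrative.

```python
import math

def fitness(candidate, xs):
    """Total absolute error against the target x^2 + 2, mapped onto (0, 1],
    with 0.0 awarded on division by zero or a NaN result."""
    total_error = 0.0
    for x in xs:
        try:
            y = candidate(x)
        except ZeroDivisionError:
            return 0.0
        if math.isnan(y):
            return 0.0
        total_error += abs(y - (x * x + 2.0))  # target: x^2 + 2
    return 1.0 / (1.0 + total_error)  # a plausible map onto (0, 1]
```

A perfect candidate scores exactly 1, any finite error scores strictly between 0 and 1, and pathological programs score 0, so selection can always rank candidates.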

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the vast number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean, and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress. Runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation the beginning of a sharp decline in population diversity can be seen. From this one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of these newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness), and largest (maximum size) individuals are of course not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300-generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in the neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation, but were barred from participating in crossover, mutation, or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect.

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends.

Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes).

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program.

Figure 5.25: Fitness of run B – progress stalled by population convergence.

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress.

Figure 5.27: Program size in run B – early domination by smaller-than-average schema.

Figure 5.28: Program age in run B – low population turnover mid-run between crises.

The non-parsimonious tendencies of typical genetic programming output are well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal. Specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression the exploitable structural equivalences are simply the rules of algebra. Even a very small set of algebraic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation):

(- x x) replaced by (0)
(- x 0) replaced by (x)
(+ x 0) replaced by (x)
(+ 0 x) replaced by (x)
(/ x x) replaced by (1)
(/ 0 x) replaced by (0), for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant value node equal to zero or one; and second, are the children equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
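The listed rules and the constant-folding "trick" can be sketched as a bottom-up reducer over s-expressions represented as nested tuples. The tagged-tuple node layout and this recursive implementation are illustrative assumptions, not the thesis's own tree classes.

```python
def evaluate(node, x):
    """Evaluate an expression tree at a value of the independent variable."""
    if node[0] == 'const':
        return node[1]
    if node[0] == 'var':
        return x
    op, a, b = node[0], evaluate(node[1], x), evaluate(node[2], x)
    if op == '+':
        return a + b
    if op == '-':
        return a - b
    if op == '*':
        return a * b
    return a / b if b != 0 else 1.0  # protected division

def reduce_expr(node):
    """Apply the listed reductions plus constant folding, bottom-up."""
    if node[0] in ('const', 'var'):
        return node
    op, left, right = node[0], reduce_expr(node[1]), reduce_expr(node[2])
    # The "trick": a subtree containing only function and constant nodes
    # collapses to a single constant, found by evaluating the subtree at
    # an arbitrary value of the independent variable.
    if left[0] == 'const' and right[0] == 'const':
        return ('const', evaluate((op, left, right), 0.0))
    if op == '-' and left == right:
        return ('const', 0)
    if op == '-' and right == ('const', 0):
        return left
    if op == '+' and right == ('const', 0):
        return left
    if op == '+' and left == ('const', 0):
        return right
    if op == '/' and left == right:
        return ('const', 1)
    if op == '/' and left == ('const', 0):
        return ('const', 0)
    return (op, left, right)
```

For example, `reduce_expr(('+', ('var', 'x'), ('const', 0)))` collapses to `('var', 'x')`.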

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x^2 + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables. It evaluates to very nearly 2.


Figure 5.29: A best approximation of x^2 + 2.0 after 41 generations.


Figure 5.30: The reduced expression.


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, of the model described in section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant may differ in size, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order in which they are given). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted, or replaced to transform one program into the other.
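The metric just described can be sketched as follows, on trees represented as (label, children) tuples; the node layout is an assumption for illustration, not the thesis's own classes.

```python
from collections import deque

def size(tree):
    """Number of nodes in a (label, [children]) tree."""
    return 1 + sum(size(child) for child in tree[1])

def tree_distance(a, b):
    """Size of the larger subtree rooted at the first mismatching node
    in a simultaneous breadth-first traversal; 0 for identical trees."""
    queue = deque([(a, b)])
    while queue:
        x, y = queue.popleft()
        if x[0] != y[0] or len(x[1]) != len(y[1]):
            return max(size(x), size(y))
        queue.extend(zip(x[1], y[1]))
    return 0
```

Two identical trees score 0, and swapping a single leaf for a three-node subtree scores 3 (the size of the larger mismatched subtree), matching the definition above.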

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

Distance   Mean fitness   Count
 1         0.000469       2212
 3         0.000282        503
 5         0.000346         66
 7         0.000363         36
 9         0.000231         25
11         0.000164         16
13         0.000143         10
15         0.000071         10
17         0.000102          9
19         0.000221          1
23         0.0               2
25         0.0               2
27         0.0               1
29         0.000286          5
35         0.0               1
51         0.000715          1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2, and 5.7.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive". The heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases a fitness-guided search, such as genetic programming, can do no better than a random search.

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Figure 5.32: The hand crafted "ideal" symbolic regression tree.

Model reductions generate alternate, behaviourally equivalent but structurally simplified models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy-storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are signal-flow graphs and sets of differential equations, to give two examples, for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, nonlinear, and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out small, simple, and linear, but grow large, complex, and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide or discover which aspects of the system are important to the exercise, and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, lowering its efficiency by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroy the symmetry of that subgraph in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm and ant colony algorithms and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search within a search algorithm. A simulated annealing with genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run according to some schedule.
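An annealing-style operator schedule of the kind described above might be sketched as follows. The linear cooling schedule, the operator names, and the 0.8/0.2 starting probabilities (borrowed from the static regression runs earlier in this chapter) are illustrative choices, not the thesis's own design.

```python
import random

def operator_schedule(generation, total_generations):
    """Anneal the disruptive operators: shift probability mass from
    crossover and mutation toward plain reproduction as the run cools."""
    t = generation / float(total_generations)  # 0.0 at start, 1.0 at end
    probabilities = {
        'crossover': 0.8 * (1.0 - t),
        'mutation': 0.2 * (1.0 - t),
    }
    probabilities['reproduction'] = 1.0 - sum(probabilities.values())
    return probabilities

def choose_operator(generation, total_generations, rng=random):
    """Draw one operator name according to the current schedule."""
    probs = operator_schedule(generation, total_generations)
    names, weights = zip(*probs.items())
    return rng.choices(names, weights=weights, k=1)[0]
```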

Genetic programming is easily parallelizable for distribution across a local or wide area network, in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has, in effect, been stated already in this section, but it is worth repeating while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations), but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated, at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems, with any number of inputs and outputs, is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification, to allow this information to be specified in a way that is natural to the task, and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program, modelling a particular dynamic system, is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation, on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time between the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested, with a much larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant-valued expressions for element parameters, is expected to be faster than black-box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares), when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it bond graph, (unchanging) symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression.

2. Structural and parametric identification of a linear dynamic system using bond graphs.

3. Parametric identification of a non-linear dynamic system using bond graphs.

4. Structural and parametric identification of a non-linear dynamic system using bond graphs.

At each stage the simplest imaginable system should be attempted first: a spring-mass system, before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressively stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).

If they were to find any more use as a modelling tool in their own right, outside of the system identification tasks that were the goal of this project, then the bond graph library would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20-sim, or domain-specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
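As a rough illustration of what GraphML storage might look like, the following sketch emits a bond graph as GraphML using only the standard library. The element names follow the GraphML specification; the dict-and-list input layout is an invented stand-in for the library's own model objects.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = 'http://graphml.graphdrawing.org/xmlns'

def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return '{%s}%s' % (GRAPHML_NS, tag)

def bond_graph_to_graphml(nodes, bonds):
    """nodes: {node_id: element_type}, e.g. {'n0': 'Se'};
    bonds: [(source_id, target_id), ...]."""
    ET.register_namespace('', GRAPHML_NS)
    root = ET.Element(q('graphml'))
    # Declare a node attribute to carry the bond graph element type.
    ET.SubElement(root, q('key'), {'id': 'd0', 'for': 'node',
                                   'attr.name': 'element',
                                   'attr.type': 'string'})
    graph = ET.SubElement(root, q('graph'),
                          {'id': 'G', 'edgedefault': 'directed'})
    for node_id, element_type in nodes.items():
        node = ET.SubElement(graph, q('node'), {'id': node_id})
        ET.SubElement(node, q('data'), {'key': 'd0'}).text = element_type
    for i, (source, target) in enumerate(bonds):
        ET.SubElement(graph, q('edge'),
                      {'id': 'b%d' % i, 'source': source, 'target': target})
    return ET.tostring(root, encoding='unicode')
```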

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example: given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation (DAE) solvers are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
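The finite least-recently-used table just described can be sketched in a few lines. Using repr() of the program as the cache key and a default capacity of 10000 entries are illustrative choices.

```python
from collections import OrderedDict

class FitnessCache:
    """A finite table in least-recently-used order, mapping a key
    derived from each genetic program to its cached fitness."""

    def __init__(self, evaluate, max_entries=10000):
        self.evaluate = evaluate      # the expensive fitness test
        self.max_entries = max_entries
        self.table = OrderedDict()
        self.hits = self.misses = 0

    def __call__(self, program):
        key = repr(program)  # a short key derived from the program
        if key in self.table:
            self.table.move_to_end(key)     # mark as recently used
            self.hits += 1
            return self.table[key]
        self.misses += 1
        fitness = self.evaluate(program)
        self.table[key] = fitness
        if len(self.table) > self.max_entries:
            self.table.popitem(last=False)  # evict least recently used
        return fitness
```

Duplicate programs in the population then cost one dictionary lookup instead of a full simulation.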

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as they do to the nodes below them. The standard Python interpreter (called CPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated but currently unused memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
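Such a rewrite can be sketched on a toy "list all variables" method; the `Node` class here is a stand-in for the software's actual expression-tree classes, not its API. The iterative form keeps its own stack on the heap instead of consuming one interpreter context frame per tree level.

```python
# Sketch: a recursive tree walk and an equivalent iterative form that
# keeps an explicit stack instead of consuming interpreter stack frames.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def variables_recursive(node):
    """Collect leaf labels by recursion (limited by the call stack)."""
    if not node.children:
        return {node.label}
    result = set()
    for child in node.children:
        result |= variables_recursive(child)
    return result

def variables_iterative(node):
    """Same traversal with an explicit stack; depth-independent."""
    result, stack = set(), [node]
    while stack:
        current = stack.pop()
        if not current.children:
            result.add(current.label)
        else:
            stack.extend(current.children)
    return result

# A chain deep enough to exceed CPython's default recursion limit:
# the recursive walk raises RecursionError, the iterative one does not.
deep = Node("x")
for _ in range(5000):
    deep = Node("+", [deep, Node("y")])
```

The iterative version trades some clarity for immunity to stack depth, which is exactly the trade-off noted above.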


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org.

[2] Dynasim AB. Dymola: dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs–I. Algebraic theory. Journal of the Franklin Institute, (326):329–250, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs–II. Duality. Journal of the Franklin Institute, (326):691–708, 1989.


[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs–III. Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs–IV. Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. In CINC'94 – The first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26, 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin, Heidelberg, 2003.


[20] B. Danielson, J. Foster, and D. Frincke. GABSys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31, 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz – graph visualization software. http://www.graphviz.org.

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE – a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4 – the "Genetic Algorithm Optimized for Portability and Parallelism System". http://garage.cse.msu.edu/software/galopps, August 25, 2002.


[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2, pages 359–366, 1989.

[36] John Hunter. PyLab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System dynamics: a unified approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.


[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kauffman Publishers, Inc., 340 Pine Street, Sixth Floor, 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.


[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.


[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.


[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.


[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.



Contents

1 Introduction
  1.1 System identification
  1.2 Black box and grey box problems
  1.3 Static and dynamic models
  1.4 Parametric and non-parametric models
  1.5 Linear and nonlinear models
  1.6 Optimization, search, machine learning
  1.7 Some model types and identification techniques
    1.7.1 Locally linear models
    1.7.2 Nonparametric modelling
    1.7.3 Volterra function series models
    1.7.4 Two special models in terms of Volterra series
    1.7.5 Least squares identification of Volterra series models
    1.7.6 Nonparametric modelling as function approximation
    1.7.7 Neural networks
    1.7.8 Parametric modelling
    1.7.9 Differential and difference equations
    1.7.10 Information criteria – heuristics for choosing model order
    1.7.11 Fuzzy relational models
    1.7.12 Graph-based models
  1.8 Contributions of this thesis

2 Bond graphs
  2.1 Energy based lumped parameter models
  2.2 Standard bond graph elements
    2.2.1 Bonds
    2.2.2 Storage elements
    2.2.3 Source elements
    2.2.4 Sink elements
    2.2.5 Junctions
  2.3 Augmented bond graphs
    2.3.1 Integral and derivative causality
    2.3.2 Sequential causality assignment procedure
  2.4 Additional bond graph elements
    2.4.1 Activated bonds and signal blocks
    2.4.2 Modulated elements
    2.4.3 Complex elements
    2.4.4 Compound elements
  2.5 Simulation
    2.5.1 State space models
    2.5.2 Mixed causality models

3 Genetic programming
  3.1 Models, programs, and machine learning
  3.2 History of genetic programming
  3.3 Genetic operators
    3.3.1 The crossover operator
    3.3.2 The mutation operator
    3.3.3 The replication operator
    3.3.4 Fitness proportional selection
  3.4 Generational genetic algorithms
  3.5 Building blocks and schemata
  3.6 Tree structured programs
    3.6.1 The crossover operator for tree structures
    3.6.2 The mutation operator for tree structures
    3.6.3 Other operators on tree structures
  3.7 Symbolic regression
  3.8 Closure or strong typing
  3.9 Genetic programming as indirect search
  3.10 Search tuning
    3.10.1 Controlling program size
    3.10.2 Maintaining population diversity

4 Implementation
  4.1 Program structure
    4.1.1 Formats and representations
  4.2 External tools
    4.2.1 The Python programming language
    4.2.2 The SciPy numerical libraries
    4.2.3 Graph layout and visualization with Graphviz
  4.3 A bond graph modelling library
    4.3.1 Basic data structures
    4.3.2 Adding elements and bonds, traversing a graph
    4.3.3 Assigning causality
    4.3.4 Extracting equations
    4.3.5 Model reductions
  4.4 An object-oriented symbolic algebra library
    4.4.1 Basic data structures
    4.4.2 Basic reductions and manipulations
    4.4.3 Algebraic loop detection and resolution
    4.4.4 Reducing equations to state-space form
    4.4.5 Simulation
  4.5 A typed genetic programming system
    4.5.1 Strongly typed mutation
    4.5.2 Strongly typed crossover
    4.5.3 A grammar for symbolic regression on dynamic systems
    4.5.4 A grammar for genetic programming bond graphs

5 Results and discussion
  5.1 Bond graph modelling and simulation
    5.1.1 Simple spring-mass-damper system
    5.1.2 Multiple spring-mass-damper system
    5.1.3 Elastic collision with a horizontal surface
    5.1.4 Suspended planar pendulum
    5.1.5 Triple planar pendulum
    5.1.6 Linked elastic collisions
  5.2 Symbolic regression of a static function
    5.2.1 Population dynamics
    5.2.2 Algebraic reductions
  5.3 Exploring genetic neighbourhoods by perturbation of an ideal
  5.4 Discussion
    5.4.1 On bond graphs
    5.4.2 On genetic programming

6 Summary and conclusions
  6.1 Extensions and future work
    6.1.1 Additional experiments
    6.1.2 Software improvements

List of Figures

1.1 Using output prediction error to evaluate a model
1.2 The black box system identification problem
1.3 The grey box system identification problem
1.4 Block diagram of a Volterra function series model
1.5 The Hammerstein filter chain model structure
1.6 Block diagram of the linear squarer cuber model
1.7 Block diagram for a nonlinear moving average operation

2.1 A bond denotes power continuity
2.2 A capacitive storage element: the 1-port capacitor, C
2.3 An inductive storage element: the 1-port inertia, I
2.4 An ideal source element: the 1-port effort source, Se
2.5 An ideal source element: the 1-port flow source, Sf
2.6 A dissipative element: the 1-port resistor, R
2.7 A power-conserving 2-port element: the transformer, TF
2.8 A power-conserving 2-port element: the gyrator, GY
2.9 A power-conserving multi-port element: the 0-junction
2.10 A power-conserving multi-port element: the 1-junction
2.11 A fully augmented bond includes a causal stroke
2.12 Derivative causality: electrical capacitors in parallel
2.13 Derivative causality: rigidly joined masses
2.14 A power-conserving 2-port: the modulated transformer, MTF
2.15 A power-conserving 2-port: the modulated gyrator, MGY
2.16 A non-linear storage element: the contact compliance, CC
2.17 Mechanical contact compliance: an unfixed spring
2.18 Effort-displacement plot for the CC element
2.19 A suspended pendulum
2.20 Sub-models of the simple pendulum
2.21 Model of a simple pendulum using compound elements

3.1 A computer program is like a system model
3.2 The crossover operation for bit string genotypes
3.3 The mutation operation for bit string genotypes
3.4 The replication operation for bit string genotypes
3.5 Simplified GA flowchart
3.6 Genetic programming produces a program
3.7 The crossover operation for tree structured genotypes
3.8 Genetic programming as indirect search

4.1 Model reductions step 1: remove trivial junctions
4.2 Model reductions step 2: merge adjacent like junctions
4.3 Model reductions step 3: merge resistors
4.4 Model reductions step 4: merge capacitors
4.5 Model reductions step 5: merge inertias
4.6 A randomly generated model
4.7 A reduced model, step 1
4.8 A reduced model, step 2
4.9 A reduced model, step 3
4.10 Algebraic dependencies from the original model
4.11 Algebraic dependencies from the reduced model
4.12 Algebraic object inheritance diagram
4.13 Algebraic dependencies with loops resolved

5.1 Schematic diagram of a simple spring-mass-damper system
5.2 Bond graph model of the system from figure 5.1
5.3 Step response of a simple spring-mass-damper model
5.4 Noise response of a simple spring-mass-damper model
5.5 Schematic diagram of a multiple spring-mass-damper system
5.6 Bond graph model of the system from figure 5.5
5.7 Step response of a multiple spring-mass-damper model
5.8 Noise response of a multiple spring-mass-damper model
5.9 A bouncing ball
5.10 Bond graph model of a bouncing ball with switched C-element
5.11 Response of a bouncing ball model
5.12 Bond graph model of the system from figure 2.19
5.13 Response of a simple pendulum model
5.14 Vertical response of a triple pendulum model
5.15 Horizontal response of a triple pendulum model
5.16 Angular response of a triple pendulum model
5.17 A triple planar pendulum
5.18 Schematic diagram of a rigid bar with bumpers
5.19 Bond graph model of the system from figure 5.18
5.20 Response of a bar-with-bumpers model
5.21 Fitness in run A
5.22 Diversity in run A
5.23 Program size in run A
5.24 Program age in run A
5.25 Fitness of run B
5.26 Diversity of run B
5.27 Program size in run B
5.28 Program age in run B
5.29 A best approximation of x^2 + 2.0 after 41 generations
5.30 The reduced expression
5.31 Mean fitness versus tree-distance from the ideal
5.32 Symbolic regression "ideal"

List of Tables

1.1 Modelling tasks in order of increasing difficulty

2.1 Units of effort and flow in several physical domains
2.2 Causality options by element type: 1- and 2-port elements
2.3 Causality options by element type: junction elements

4.1 Grammar for symbolic regression on dynamic systems
4.2 Grammar for genetic programming bond graphs

5.1 Neighbourhood of a symbolic regression "ideal"

Listings

3.1 Protecting the fitness test code against division by zero

4.1 A simple bond graph model in bg format
4.2 A simple bond graph model constructed in Python code
4.3 Two simple functions in Python
4.4 Graphviz DOT markup for the model defined in listing 4.4
4.5 Equations from the model in figure 4.6
4.6 Equations from the reduced model in figure 4.9
4.7 Resolving a loop in the reduced model
4.8 Equations from the reduced model with loops resolved

Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models, and because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system: a control system, for example. In an iterative design process, it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system, but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics), then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system, and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately it is perhaps best to think of an identified model as a second system that, for some purposes,


can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that, as closely as possible, reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible, but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further, and the origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

- Inspection or disassembly of a non-operational system


- Theoretical principles applicable to that class of systems

- Knowledge from the design and construction of the system

- Results of another system identification exercise

- Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero-mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant, and in no way on any past values of the input. Put another way, the input-output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic over all. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input-output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation, a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables, the model equations would be:

\dot{u}_1 = f_1(x_1, \ldots, x_n, u_1, \ldots, u_m)
\vdots
\dot{u}_m = f_m(x_1, \ldots, x_n, u_1, \ldots, u_m)

y_1 = g_1(x_1, \ldots, x_n, u_1, \ldots, u_m)
\vdots
y_p = g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)

where x_1, \ldots, x_n are the input variables, u_1, \ldots, u_m are the state variables, and y_1, \ldots, y_p are the output variables.
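The state space form above can be sketched in a few lines of code. The following is a minimal illustration, assuming a hypothetical one-state system (state equation du/dt = -a*u + x, output y = 2u) integrated with the forward-Euler method; the functions f, g and all constants are invented for this example and do not come from the thesis.

```python
# Minimal sketch of simulating a state-space model with forward Euler.
# The one-state example system (u_dot = -a*u + x, y = 2*u) is hypothetical,
# chosen only to illustrate the input (x), state (u), and output (y) roles.

def simulate(f, g, u0, x_seq, dt):
    """Integrate u_dot = f(x, u) with forward Euler; record y = g(x, u)."""
    u = u0
    ys = []
    for x in x_seq:
        ys.append(g(x, u))          # output equation (algebraic)
        u = u + dt * f(x, u)        # state equation, one Euler step
    return ys

a = 0.5
f = lambda x, u: -a * u + x         # state equation
g = lambda x, u: 2.0 * u            # output equation

# constant input x = 1; the output settles toward y = 2*(x/a) = 4
ys = simulate(f, g, u0=0.0, x_seq=[1.0] * 100, dt=0.1)
```

Note how "memory" arises: the output at each step depends on the accumulated state u, not only on the instantaneous input.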

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order and rate of arrival of data within the input and output signals matter, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable. That is, a u_i for which there is no y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low, then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high, then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty to experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments, the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically and the corresponding parameters at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function), and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} X(s) \quad (1.1)

Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} X(z) \quad (1.2)

\frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x \quad (1.3)

y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m} \quad (1.4)
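A discrete time-domain model like equation 1.4 can be simulated by direct recursion, solving for y_k at each step from past outputs and inputs. The sketch below assumes zero initial conditions; the first-order coefficient values at the bottom are arbitrary illustrative choices, not taken from the thesis.

```python
# Sketch: simulating the discrete time-domain model of equation 1.4,
#   y_k + a1*y_{k-1} + ... + an*y_{k-n} = b1*x_{k-1} + ... + bm*x_{k-m},
# rearranged so that y_k is computed from past outputs and inputs.

def difference_equation(a, b, x):
    """a = [a1..an], b = [b1..bm]; zero initial conditions assumed."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i in range(1, len(a) + 1):      # autoregressive terms
            if k - i >= 0:
                acc -= a[i - 1] * y[k - i]
        for j in range(1, len(b) + 1):      # input terms
            if k - j >= 0:
                acc += b[j - 1] * x[k - j]
        y.append(acc)
    return y

# first-order example: y_k = 0.5*y_{k-1} + x_{k-1}, driven by an impulse
y = difference_equation(a=[-0.5], b=[1.0], x=[1.0] + [0.0] * 9)
```

For the impulse input, the recursion produces the geometrically decaying impulse response of this first-order model.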

Because they typically contain fewer identified parameters, and the range of possibilities is already reduced by the choice of a specific model structure,


parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

y(t) = \int_0^t u(t - \tau) x(\tau) \, d\tau \quad (1.5)

y(k) = \sum_{m=0}^{k} u(k - m) x(m) \quad (1.6)

Y(j\omega) = U(j\omega) X(j\omega) \quad (1.7)
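The discrete impulse response model of equation 1.6 is a convolution sum in which every sample of the impulse response u is itself a parameter, which is what makes the model non-parametric. A minimal sketch, with an invented three-sample impulse response:

```python
# Sketch of equation 1.6: y(k) = sum_{m=0..k} u(k - m) * x(m),
# where u is the (finite, illustrative) impulse response.

def impulse_response_model(u, x):
    return [sum(u[k - m] * x[m] for m in range(k + 1) if k - m < len(u))
            for k in range(len(x))]

u = [1.0, 0.5, 0.25]                  # illustrative impulse response
y = impulse_response_model(u, [1.0, 0.0, 0.0, 0.0])   # unit impulse input
```

As expected, for a unit impulse input the model output reproduces the impulse response itself, followed by zeros.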

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more


complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified \beta_1, \ldots, \beta_n, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters \beta_i while equation 1.9 is not. Both contain non-linear functions of the inputs (namely u_2^2 and \sin(u_3)), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3) \quad (1.8)

y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3) \quad (1.9)

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms, this technique is called linear least squares parameter estimation. In general, there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
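The one-step least squares solution for a model such as equation 1.8 can be sketched as follows. The "true" parameter values and the synthetic, noise-free data below are assumptions made only so the recovered parameters can be checked; this is an illustration, not the thesis's identification code.

```python
import numpy as np

# Sketch: one-step linear least squares fit of the linear-in-the-parameters
# model y = b1*u1 + b2*u2^2 + b3*sin(u3) (equation 1.8).
rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.5])   # illustrative "true" parameters
U = rng.uniform(-1, 1, size=(200, 3))    # raw inputs u1, u2, u3

# Regressor matrix: the columns are non-linear functions of the inputs,
# but the model remains a linear sum over the parameters.
Phi = np.column_stack([U[:, 0], U[:, 1] ** 2, np.sin(U[:, 2])])
y = Phi @ beta_true                      # synthetic, noise-free outputs

# Direct (one-step) solution; no iterative search is needed.
beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

With noise-free data the parameters are recovered to machine precision; with noisy data the same call returns the minimum squared-error estimate.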

There is a fundamental trade-off between model size, or order, and the model's expressive power, or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to overfitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best


model using the linear least squares technique discussed below, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately, neither this approach nor any like it is applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs, and subtracting the normal operating outputs from the measured output, to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally, but not globally, optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and it controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it


is a factor that trades off exploration of new territory with exploitation of local gains made in a neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space, while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally, the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results for unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and of back propagation learning in neural networks mention both criteria in a "first met" arrangement. The Bayesian and Akaike information criteria are two widely cited heuristic stopping criteria. They both incorporate a measure of parsimony for the identified model and weigh that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
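The holdout validation procedure described above can be sketched as follows: candidate models of two different orders are fit to the same training data, and each is scored only on the held-aside validation data. The data-generating line, the noise level, and the candidate polynomial degrees are all illustrative assumptions.

```python
import numpy as np

# Sketch of holdout validation: fit candidate models on training data,
# score them on unseen validation data, and compare validation errors.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=x.size)   # truly linear system

x_train, y_train = x[::2], y[::2]    # data used for identification
x_val, y_val = x[1::2], y[1::2]      # held aside strictly for validation

val_error = {}
for degree in (1, 15):               # two candidate model orders
    coeffs = np.polyfit(x_train, y_train, degree)
    resid = np.polyval(coeffs, x_val) - y_val
    val_error[degree] = float(np.mean(resid ** 2))
```

An overfit high-order model typically shows a markedly larger validation error than the low-order model, even though its training error is smaller; comparing `val_error[1]` and `val_error[15]` makes the check concrete.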

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1), the problem is to fit a line, plane,


or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input-output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for the model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

              linear    non-linear
    static      (1)        (3)
    dynamic     (2)        (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem. (4) is the most general class of system identification problems.

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen attempting to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at different operating points of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying


(LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or b-spline).

In step 2 it is important to use low amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, within which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system, otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three of the steps above.

Both of these techniques are able to identify the parameters, but not the structure, of a model. For EKF, the model structure must be given in a state space form. For LPV, the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.
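The three-step LPV procedure can be sketched for the simplest possible case, a static gain scheduled on a single variable p. The base operating points, local gains, and the choice of linear interpolation are all hypothetical, standing in for models identified in steps 1 and 2.

```python
# Hedged sketch of an LPV model: local linear models (here, static gains
# assumed to have been identified at two base operating points) blended by
# linear interpolation in the scheduling variable p. Numbers are invented.

BASE_POINTS = [0.0, 1.0]   # step 1: base operating points
LOCAL_GAINS = [2.0, 5.0]   # step 2: local linear models identified there

def lpv_output(p, x):
    """Step 3: linear interpolation between the local models."""
    if p <= BASE_POINTS[0]:
        k = LOCAL_GAINS[0]
    elif p >= BASE_POINTS[1]:
        k = LOCAL_GAINS[1]
    else:
        w = (p - BASE_POINTS[0]) / (BASE_POINTS[1] - BASE_POINTS[0])
        k = (1 - w) * LOCAL_GAINS[0] + w * LOCAL_GAINS[1]
    return k * x

y_mid = lpv_output(0.5, 1.0)   # halfway between the operating points
```

At either base operating point the model reduces exactly to the local linear model identified there; between them, the behaviour is a blend.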

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a


function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

y(t) = y_1(t) = \int_{-T_S}^{0} g(t - \tau) \cdot u(\tau) \, d\tau \quad (1.10)

Its discrete-time version is

y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \cdot u_{k-j} \quad (1.11)

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past. That is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions, the first of which is the linear model y_1(t) above.

y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots \quad (1.12)

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is

y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t - \tau_1, t - \tau_2) \cdot u(\tau_1) \cdot u(\tau_2) \, d\tau_1 \, d\tau_2

y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t - \tau_1, t - \tau_2, t - \tau_3) \cdot u(\tau_1) \cdot u(\tau_2) \cdot u(\tau_3) \, d\tau_1 \, d\tau_2 \, d\tau_3

y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t - \tau_1, t - \tau_2, \ldots, t - \tau_p) \cdot u(\tau_1) \cdot u(\tau_2) \cdots u(\tau_p) \, d\tau_1 \, d\tau_2 \cdots d\tau_p


where g_p is called the p'th order continuous time kernel. The rest of this discussion will use the discrete-time representation:

y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j}

y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \cdot u_{n-i} \cdot u_{n-j} \cdot u_{n-k}

y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \ldots i_p} \cdot u_{n-i_1} \cdot u_{n-i_2} \cdots u_{n-i_p}

W^{(p)} is called the p'th order discrete time kernel. W^{(2)} is an M_2 by M_2 square matrix, W^{(3)} is an M_3 by M_3 by M_3 cubic matrix, and so on. It is not required that the memory lengths of each kernel (M_2, M_3, etc.) be the same, but they are typically chosen equal for consistent accounting. If the memory length is M, then the p'th order kernel W^{(p)} is a p-dimensional matrix with M^p elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is M_{(1)} + M^2_{(2)} + M^3_{(3)} + \cdots + M^p_{(p)} + \cdots, which for the full series is infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis, no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in figure 1.4.

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q \quad (1.13)

y(t_k) = \sum_{j=0}^{N} W_j \cdot v_{k-j} \quad (1.14)

Combining these gives

y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u^i_{k-j}
       = \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u^i_{k-j}
       = a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u^2_{k-j} + \cdots + a_q \sum_{j=0}^{N} W_j u^q_{k-j}


Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher order kernels. Elements on the main diagonal of higher order kernels can be interpreted as weights applied to powers of the input signal.
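A sketch of the Hammerstein chain makes the two stages explicit: a static polynomial map (equation 1.13) followed by an FIR weighting sequence (equation 1.14). The polynomial coefficients and impulse response below are invented for illustration.

```python
# Sketch of a Hammerstein model: static polynomial nonlinearity followed
# by an LTI FIR filter, per equations 1.13 and 1.14.

def hammerstein(a, W, u_seq):
    # static part: v = a0 + a1*u + a2*u^2 + ...
    v = [sum(ai * u ** i for i, ai in enumerate(a)) for u in u_seq]
    # dynamic part: y_k = sum_j W_j * v_{k-j} (zero initial conditions)
    return [sum(W[j] * v[k - j] for j in range(len(W)) if k - j >= 0)
            for k in range(len(v))]

a = [0.0, 1.0, 0.5]      # v = u + 0.5*u^2 (illustrative)
W = [1.0, 0.5]           # illustrative FIR impulse response
y = hammerstein(a, W, [2.0, 0.0, 0.0])
```

Because the nonlinearity acts sample-by-sample before the linear dynamics, only powers of a single delayed input ever appear, which is exactly the diagonal-kernel restriction noted above.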

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms and to replace the bilinear and trilinear operations with a square- and cube-law static nonlinear map followed by LTI dynamic systems. This is shown in figure 1.6. Obviously it is less general than the system in figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals u_1 = u, u_2 = u^2, and u_3 = u^3 can be calculated, which reduces the identification of the dynamic parts to a multi-input, multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear, squarer, cuber model.

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used.

y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n)    (1.15)

y = Σ_{i=0}^{N1} W_i^{(1)} u_{n−i} + Σ_{i=0}^{N2} Σ_{j=0}^{N2} W_{ij}^{(2)} u_{n−i} u_{n−j} + e(t_n)    (1.16)

  = (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e    (1.17)

where the vector and matrix quantities u_1, u_2, W^{(1)}, and W^{(2)} are

u_1 = [u_{n−1}, u_{n−2}, ..., u_{n−N1}]^T

u_2 = [u_{n−1}, u_{n−2}, ..., u_{n−N2}]^T

W^{(1)} = [W_1^{(1)}, W_2^{(1)}, ..., W_{N1}^{(1)}]^T

W^{(2)} =
  [ W_{1,1}^{(2)}   ...  W_{1,N2}^{(2)}  ]
  [      ...        ...       ...        ]
  [ W_{N1,1}^{(2)}  ...  W_{N1,N2}^{(2)} ]

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector U and the parameter vector β are constructed as follows. The first N1 elements in U are the elements of u_1, and this is followed by elements having the value of the products of u_2 appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of W^{(1)} followed by the elements of W^{(2)} as they appear in the second term of equation 1.16.

y = Uβ + e    (1.18)

Given a sequence of N measurements, U becomes rectangular (one row per measurement), and y and e become N-by-1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

β = (U^T U)^{−1} U^T y    (1.19)

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large: O(N^P) for a model using the first P Volterra series terms and a memory length of N.


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form W_{ij} u_{n−i} u_{n−j} and W_{ji} u_{n−j} u_{n−i}. However, since u_{n−i} u_{n−j} = u_{n−j} u_{n−i}, the parameters W_{ij} and W_{ji} are identical. Only N(N+1)/2 unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, O(N^P).
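A small numerical sketch of this formulation (kernel sizes and "true" parameters are arbitrary inventions) uses only the N(N+1)/2 unique bilinear regressors, so each off-diagonal pair W_ij, W_ji is estimated as one combined parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 4, 3

# Randomly chosen "true" kernels; W2 is made symmetric, since Wij and
# Wji multiply the same product of inputs.
W1 = rng.standard_normal(N1)
W2 = rng.standard_normal((N2, N2))
W2 = (W2 + W2.T) / 2

u = rng.standard_normal(300)

def row(n):
    # Linear regressors u[n-i], then the N2(N2+1)/2 unique bilinear products.
    lin = [u[n - i] for i in range(N1)]
    bil = [u[n - i] * u[n - j] for i in range(N2) for j in range(i, N2)]
    return np.array(lin + bil)

U = np.vstack([row(n) for n in range(max(N1, N2), len(u))])

# Each off-diagonal pair collapses into the single parameter Wij + Wji.
beta_true = np.array(
    list(W1) + [W2[i, j] if i == j else 2 * W2[i, j]
                for i in range(N2) for j in range(i, N2)])

y = U @ beta_true
beta = np.linalg.lstsq(U, y, rcond=None)[0]   # OLS via the pseudo-inverse of U

print(np.allclose(beta, beta_true))  # True
```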

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters to the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters to this formula are ζ, a form factor affecting the shape of the sinus curve, j, which iterates over component sinus curves, and i, the read-out index. There are m_r sinus curves in the weighted sum. The values of ζ and m_r are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem only the weights w_j applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to m_r. For example, if N is 40 and m_r is 10, then there is a 4 times compression of the number of parameters in the linear kernel.
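The read-out scheme can be illustrated with a generic orthonormal basis; the cosine basis below is a stand-in, since the exact distorted sinus formula of [81] is not reproduced here. A smooth 40-point kernel is represented by m_r = 10 weights and read back out at the N sample points:

```python
import numpy as np

N, mr = 40, 10                          # kernel memory length, basis count
i = np.arange(1, N + 1)

# Stand-in orthonormal basis: DCT-style cosines (not the distorted sinus
# functions of Treichl et al.).
B = np.column_stack([np.cos(np.pi * j * (i - 0.5) / N) for j in range(mr)])
B /= np.linalg.norm(B, axis=0)

W_true = np.exp(-i / 8.0)               # a smooth example kernel
w = B.T @ W_true                        # the mr weights: all that is estimated
W_readout = B @ w                       # kernel values read out at N points

print(np.max(np.abs(W_readout - W_true)))   # small approximation error
```

Because the basis is orthonormal, the projection B.T @ W_true is also the least squares estimate of the weights, and the parameter count drops from 40 to 10.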

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with N1 = N2 = 32 gives the uncompressed model a total of N1 + N2(N2 + 1)/2 = 32 + 528 = 560 parameters. In the wavelet domain identification only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels respectively are not expensive to calculate, so the compression ratio of 560/46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
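The idea can be sketched with a hand-rolled orthonormal Haar DWT (a simpler wavelet than Nikolaou and Mantha would likely use): transform a smooth 32-tap linear kernel, zero all but the few largest coefficients, and invert. The kernel and the number of retained coefficients here are invented for illustration.

```python
import numpy as np

def haar_dwt(x):
    # Full-depth orthonormal Haar DWT of a length-2^k signal.
    x = x.astype(float).copy()
    details, n = [], len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)
        details.append(d)
        x[:n // 2] = a
        n //= 2
    return np.concatenate([x[:1]] + details[::-1])

def haar_idwt(c):
    # Inverse transform: rebuild level by level, coarsest first.
    c = c.astype(float).copy()
    n = 1
    while n < len(c):
        a, d = c[:n].copy(), c[n:2 * n].copy()
        c[0:2 * n:2] = (a + d) / np.sqrt(2)
        c[1:2 * n:2] = (a - d) / np.sqrt(2)
        n *= 2
    return c

W = np.exp(-np.arange(32) / 6.0)        # a smooth 32-tap linear kernel
c = haar_dwt(W)
c[np.argsort(np.abs(c))[:-8]] = 0.0     # keep only the 8 largest coefficients
W_hat = haar_idwt(c)

print(np.max(np.abs(W_hat - W)))        # modest error from 4x compression
```

With all coefficients retained the transform is exactly invertible; identification then estimates the retained wavelet coefficients instead of the kernel samples.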


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input/output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where P(·) is a polynomial in u and M is the model memory length (number of Z^{−1} delay operators). Therefore one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P (uZ^{−1}, uZ^{−2}, ..., uZ^{−M}) can be calculated. What is needed is a general and parsimonious function approximator. The Volterra function series is general but not parsimonious unless the kernels are compressed.
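A direct (if naive) realization of this structure, with an arbitrary example polynomial standing in for P over M = 3 delayed inputs:

```python
import numpy as np

def nma(u, P, M):
    # Nonlinear moving average: y[n] = P(u[n-1], ..., u[n-M]).
    y = np.zeros(len(u))
    for n in range(M, len(u)):
        delayed = u[n - M:n][::-1]    # u[n-1], u[n-2], ..., u[n-M]
        y[n] = P(delayed)
    return y

# An arbitrary example polynomial standing in for P.
P = lambda d: 0.8 * d[0] - 0.2 * d[1] ** 2 + 0.05 * d[0] * d[2]

u = np.sin(np.linspace(0.0, 10.0, 200))
y = nma(u, P, M=3)
```

Identification then amounts to choosing the coefficients (and structure) of P given records of u and y.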

Figure 1.7: Block diagram for a nonlinear moving average operation.

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth low order representations. However, since common choices such as distorted sinus functions are combined by weighted sums and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator. Since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd order non-linear moving average with a neural network taking the place of P.

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems and not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or sometimes symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness guided search algorithms such as genetic programming, "evolution strategies", or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1 They may be given to the identification system from prior knowledge

2 They may be generated online by an inference system

LAGRAMGE ([23], [24], [78], [79]) is an equation discovery program that searches for equations which conform to a context free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations, grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation, and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.
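The flavour of grammar-constrained search can be sketched with a toy context-free grammar (invented here, not LAGRAMGE's actual grammar syntax) that can only generate sums of constant-weighted terms in x:

```python
import random

# A toy grammar: nonterminals map to lists of productions; anything not
# in the table is a terminal symbol.
grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("c", "*", "V"), ("c",)],
    "V": [("x",), ("x", "*", "x")],
}

def expand(symbol, rng, depth=0):
    if symbol not in grammar:
        return symbol
    prods = grammar[symbol]
    # Past a depth limit, always take the last production, which leads
    # only to terminals, so expansion is guaranteed to finish.
    prod = rng.choice(prods) if depth < 4 else prods[-1]
    return " ".join(expand(s, rng, depth + 1) for s in prod)

rng = random.Random(3)
for _ in range(3):
    print(expand("E", rng))   # candidate right-hand sides, e.g. "c * x + c"
```

Every candidate the search can ever visit is, by construction, a sentence of the grammar; tightening the grammar tightens the model class.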

PRET ([74], [75], [14]) is another equation discovery program that incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied, and no references beyond the original author were found.

1.7.10 Information criteria: heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

AIC = −2 ln(L) + 2k    (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u | M, U)    (1.21)

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be overfit and have spurious parameters. The Bayes information criterion is

BIC = −2 ln(L) + k ln(n)    (1.22)

where n is the number of observations (i.e. sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC ≅ −2n ln(RMSE/n) + 2k    (1.23)

Cinar [17] reports building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.
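Order selection of this kind can be sketched as follows, using the common Gaussian-likelihood approximation AIC ≈ n ln(RSS/n) + 2k in place of the exact likelihood (a different scaling than the RMSE variant quoted above; formulations vary between references), with invented residual sums that fall quickly and then plateau as terms are added:

```python
import math

def aic(rss, n, k):
    # Gaussian-likelihood approximation: -2 ln(L) ~ n ln(RSS/n) + const.
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    return n * math.log(rss / n) + k * math.log(n)

n = 100                                        # number of observations
rss = [400.0, 150.0, 60.0, 52.0, 51.5, 51.2]   # invented residuals for k = 1..6

scores = [aic(r, n, k) for k, r in enumerate(rss, start=1)]
best_k = min(range(len(scores)), key=scores.__getitem__) + 1
print(best_k)   # 4: once the error plateaus, extra terms only add penalty
```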

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1 input and output fuzzy set membership functions

2 a fuzzy relation mapping inputs to outputs

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow" whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are not appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1 The raw input signals are applied to the input membership functionsto determine the degree of compatibility with the linguistic variables

2 The input membership values are composed with the fuzzy relation todetermine the output membership values

3 The output membership values are applied to the output variable mem-bership functions to determine a degree of activation for each

4 The activations of the output variables are aggregated to produce a crisp result

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
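A minimal sketch of max-min composition (membership values and relation are invented), showing the matrix-product-like shape with min replacing multiplication and max replacing summation:

```python
import numpy as np

def maxmin_compose(mu, R):
    # Like a matrix-vector product, but with min in place of multiplication
    # and max in place of summation.
    return np.array([np.max(np.minimum(mu, R[:, j])) for j in range(R.shape[1])])

mu_in = np.array([0.2, 0.9, 0.4])          # memberships in 3 input sets
R = np.array([[1.0, 0.1],                  # fuzzy relation: 3 inputs x 2 outputs
              [0.3, 0.8],
              [0.5, 0.5]])

print(maxmin_compose(mu_in, R))            # [0.4 0.8]
```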

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal integrated across the value of the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large and the system may oscillate and not converge; too slow and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model therefore is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66], [39], [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae. Arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphic transcription of differential equations.
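As a minimal sketch (not using the thesis's own library), the constitutive equations of a series effort source, resistor, and capacitor loop, where a 1-junction imposes a common flow and makes the efforts sum to zero, collapse to a single first-order equation that can be integrated directly. All parameter values are invented:

```python
# Hypothetical element parameters: resistance, capacitance, constant source effort.
R, C, E = 2.0, 0.5, 1.0
q, dt = 0.0, 0.001          # capacitor displacement q, forward Euler step

for _ in range(20000):      # 20 s of simulated time; time constant R*C = 1 s
    e_C = q / C             # capacitor constitutive equation: e = q/C
    f = (E - e_C) / R       # junction balance e_R = E - e_C, with e_R = R*f
    q += f * dt             # definition of generalized displacement: dq/dt = f

print(abs(q - C * E) < 1e-3)   # True: settles at q = C*e, as the C element requires
```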

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10 degree of freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (arrangement of the edges, or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible: whereas the genetic algorithm uses a fixed length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane, and Koza [76] used genetic programming to discover new designs for circuits that perform as well or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models; abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19], [40], [67] on the subject. Linear graph theory is a similar graphical energy based lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8], [9], [10], [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".


Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics), or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                     Intensive variable   Extensive variable
Generalized terms    Effort               Flow                 Power
Linear mechanics     Force (N)            Velocity (m/s)       (W)
Angular mechanics    Torque (N·m)         Velocity (rad/s)     (W)
Electrics            Voltage (V)          Current (A)          (W)
Hydraulics           Pressure (N/m^2)     Volume flow (m^3/s)  (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫_0^T f dt    (2.1)

p = ∫_0^T e dt    (2.2)

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce    (2.3)


An inertia or I element stores kinetic energy in the form of generalized momentum proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics it is a mass-flow effect; in electrical circuits it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If    (2.4)


Figure 2.2: A capacitive storage element, the 1-port capacitor C.


Figure 2.3: An inductive storage element, the 1-port inertia I.

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond, and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite length simulations, and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t) (2.5)

f = F(t) (2.6)


Figure 2.4: An ideal source element, the 1-port effort source Se

Figure 2.5: An ideal source element, the 1-port flow source Sf

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant valued effort source representing a uniform force field such as gravity near the earth's surface, and a zero valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that they together expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf (2.7)

2.2.5 Junctions

Figure 2.6: A dissipative element, the 1-port resistor R

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have less than 3 bonds attached).

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1f1 = (me2)(f2/m) = e2f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (a pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = me2 (2.8)

f1 = f2/m (2.9)

The graphical notation for a gyrator or GY-element is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1f1 = (mf2)(e2/m) = f2e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast rotating bodies in mechanics, and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = mf2 (2.10)


f1 = e2/m (2.11)

Figure 2.7: A power-conserving 2-port element, the transformer TF

Figure 2.8: A power-conserving 2-port element, the gyrator GY

The 0- and 1-junctions permit any number of bonds to be attached, directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction, and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = ··· = en (2.12)

Σf_in − Σf_out = 0 (2.13)

f1 = f2 = ··· = fn (2.14)

Σe_in − Σe_out = 0 (2.15)

Figure 2.9: A power-conserving multi-port element, the 0-junction

Figure 2.10: A power-conserving multi-port element, the 1-junction

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].
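The junction constraints of equations 2.12 and 2.13 can be expressed as residuals that must vanish for any valid set of bond variables. A minimal sketch (my own construction, not thesis code), with each bond given as an (effort, flow) pair:

```python
# Constraint residuals for a 0-junction: all efforts equal (eq. 2.12)
# and flows balance, power-in positive, power-out negative (eq. 2.13).

def zero_junction_residuals(bonds_in, bonds_out):
    """Return (effort residuals, flow balance); all zero when valid."""
    efforts = [e for e, f in bonds_in + bonds_out]
    common = efforts[0]
    equal_effort = [e - common for e in efforts]       # eq. 2.12
    flow_balance = (sum(f for e, f in bonds_in)
                    - sum(f for e, f in bonds_out))    # eq. 2.13
    return equal_effort, flow_balance

# A 0-junction with one bond in and two out: effort is shared, and the
# incoming flow splits between the outgoing bonds (like a circuit node).
bonds_in = [(6.0, 5.0)]
bonds_out = [(6.0, 2.0), (6.0, 3.0)]
eq, bal = zero_junction_residuals(bonds_in, bonds_out)
assert all(r == 0.0 for r in eq) and bal == 0.0
```

A 1-junction check would be identical with the roles of effort and flow exchanged, per equations 2.14 and 2.15.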

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds, in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power direction half arrow. In both parts of figure 2.11 the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11, the direction of computational causality is reversed. So although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

Figure 2.11: The causal stroke and power direction half-arrow on a fully augmented bond are independent

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but not exactly bond graphs, do not employ any concept of causality [2] [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events. 1 Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving those in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

1 Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options, and the resulting equations, for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements), and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inertias. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inertia), the variable in the differential expression can be evaluated by integration over time. For this reason, these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly, that is without another type of storage element or a resistor between them, on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = eC). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type, 1- and 2-port elements

Effort source (Se), required causality (effort-out):
  e = E(t)

Flow source (Sf), required causality (flow-out):
  f = F(t)

Capacitor (C), causal effort-in (derivative causality):
  q = Ce
  f = dq/dt

Capacitor (C), causal effort-out (integral causality):
  e = q/C
  dq/dt = f

Inertia (I), causal effort-in (integral causality):
  f = p/I
  dp/dt = e

Inertia (I), causal effort-out (derivative causality):
  p = If
  e = dp/dt

Resistor (R), causal effort-in:
  f = e/R

Resistor (R), causal effort-out:
  e = Rf

Transformer (TF), first causal option:
  e1 = me2
  f2 = mf1

Transformer (TF), second causal option:
  f1 = f2/m
  e2 = e1/m

Gyrator (GY), first causal option:
  e1 = rf2
  e2 = rf1

Gyrator (GY), second causal option:
  f1 = e2/r
  f2 = e1/r

Table 2.3: Causality options by element type, junction elements

0-junction, causal bond b* directed power-in:
  e* ≡ e_in1, f* ≡ f_in1
  e_in2 = ··· = e_inn = e*
  e_out1 = ··· = e_outm = e*
  f* = Σ(j=1..m) f_outj − Σ(j=2..n) f_inj

0-junction, causal bond b* directed power-out:
  e* ≡ e_out1, f* ≡ f_out1
  e_in1 = ··· = e_inn = e*
  e_out2 = ··· = e_outm = e*
  f* = Σ(j=1..n) f_inj − Σ(j=2..m) f_outj

1-junction, causal bond b* directed power-in:
  e* ≡ e_in1, f* ≡ f_in1
  f_in2 = ··· = f_inn = f*
  f_out1 = ··· = f_outm = f*
  e* = Σ(j=1..m) e_outj − Σ(j=2..n) e_inj

1-junction, causal bond b* directed power-out:
  e* ≡ e_out1, f* ≡ f_out1
  f_in1 = ··· = f_inn = f*
  f_out2 = ··· = f_outm = f*
  e* = Σ(j=1..n) e_inj − Σ(j=2..m) e_outj

• 0- and 1-junctions permit any number of bonds directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction). The rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction). The rest must be effort-in.

and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model, but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses and I elements with proportionally common flow

There is a systematic method of assigning computational causality to bond graph models, so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).
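The three stages and the propagation step can be sketched in code. The following is a deliberately simplified illustration of my own, restricted to a single 1-junction with Se, C, I, and R attached (the example discussed next); it is not a general SCAP implementation. Causality per bond is recorded relative to the junction:

```python
# Much-simplified SCAP sketch for one 1-junction with Se, C, I, R bonds.
# A 1-junction allows exactly one bond to be causal effort-out from it;
# once that bond is found, all remaining bonds must be effort-in.

def scap_one_junction(elements):
    causality = {}

    def propagate():
        # Consequence propagation: one effort-out bond at a 1-junction
        # forces every still-unassigned bond to effort-in.
        if 'effort-out' in causality.values():
            for el in elements:
                causality.setdefault(el, 'effort-in')

    # Stage 1: required causality at sources (Se outputs effort).
    if 'Se' in elements:
        causality['Se'] = 'effort-in'      # effort into the junction
    propagate()

    # Stage 2: preferred (integral) causality at storage elements.
    for el in elements:
        if el == 'I' and el not in causality:
            causality[el] = 'effort-out'   # effort-in to the inertia
        elif el == 'C' and el not in causality:
            causality[el] = 'effort-in'    # effort-out from the capacitor
        propagate()

    # Stage 3: arbitrary causality for any remaining (sink) bonds.
    for el in elements:
        causality.setdefault(el, 'effort-in')
    return causality

result = scap_one_junction(['Se', 'C', 'I', 'R'])
```

As in the walk-through below, the R bond is assigned by propagation after the I-element receives its preferred causality, so stage 3 is never needed here.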

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal, but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end, indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power, but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way, activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant, but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.


Figure 2.14: A power-conserving 2-port, the modulated transformer MTF

Figure 2.15: A power-conserving 2-port, the modulated gyrator MGY

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element, in the shown causal assignment, are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort-displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

Figure 2.16: A non-linear capacitive storage element, the 1-port contact compliance CC

Figure 2.17: Mechanical contact compliance; an unfixed spring exerts no force when q exceeds the unsprung length r

e = (q < r)(r − q)/C (2.16)

dq/dt = f (2.17)
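Equations 2.16 and 2.17 translate almost directly into code. A minimal sketch (function names are my own):

```python
# The CC element's discontinuous constitutive equations:
# spring-like while in contact (q < r), slack otherwise.

def cc_effort(q, r, C):
    """e = (q < r) * (r - q) / C  -- equation 2.16."""
    return (r - q) / C if q < r else 0.0

def cc_displacement_rate(f):
    """dq/dt = f  -- equation 2.17 (integral causality)."""
    return f

# In contact (q < r) the element pushes back; out of contact it is slack.
assert abs(cc_effort(0.5, 1.0, 0.1) - 5.0) < 1e-9   # (1.0 - 0.5)/0.1
assert cc_effort(2.0, 1.0, 0.1) == 0.0
```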

2.4.4 Compound elements

Figure 2.18: Effort-displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way complex models representing complex systems can be built in a modular fashion, by nesting combinations of simpler models representing simpler subsystems. At simulation time, a preprocessor may replace compound elements by their equivalent set of standard elements, or compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts, representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports, bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension

Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12

Figure 2.21: Model of a simple pendulum, as in figures 5.12 and 2.20, rewritten using specialized compound elements

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs ~x, state vector ~u, and outputs ~y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

d~u/dt = f(~x, ~u) (2.18)

~y = g(~x, ~u) (2.19)

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in Section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with their "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
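As a worked illustration of my own (not thesis code), consider the Se-C-I-R model on a single 1-junction discussed in section 2.3.2. Back-substituting the constitutive equations as described above, with state vector (q, p), yields dq/dt = p/I and dp/dt = E(t) − R·p/I − q/C, which a simple forward-Euler loop can then simulate:

```python
# State space equations for the Se-C-I-R model on one 1-junction,
# obtained by the back-substitution procedure, then integrated by
# forward Euler.  A hedged sketch; parameter values are arbitrary.

def simulate(E, C, I, R, q0=0.0, p0=0.0, dt=1e-3, steps=5000):
    q, p = q0, p0
    for k in range(steps):
        t = k * dt
        f = p / I                    # junction flow, from the I element
        dq = f                       # C element, integral causality
        dp = E(t) - R * f - q / C    # effort balance at the 1-junction
        q, p = q + dt * dq, p + dt * dp
    return q, p

# With a constant unit effort source, the capacitor charge settles
# toward the equilibrium q = C*E (here 0.5) and the flow decays to zero.
q, p = simulate(E=lambda t: 1.0, C=0.5, I=1.0, R=2.0)
```

This is the electrical series RLC circuit (or a damped mass-spring system) in bond graph terms; more careful work would use a proper ODE solver rather than forward Euler.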

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations, in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides, and never on the left-hand side, of any equations. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case, the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g. by a root finding algorithm) at each time step. In the second case, the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or differential causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitances. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would, in effect, make the lever plastic. A C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.
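Both remedies can be compared numerically. The sketch below is my own arrangement (not from the thesis): the parallel capacitors of figure 2.12 are driven through a series resistor by a constant effort source, once with the C-elements merged, and once decoupled by a small resistance r:

```python
# Remedy 1: combine the dependent C elements into one (Ceq = C1 + C2).
def combined(E, R, C1, C2, dt=1e-4, steps=20000):
    Ceq, q = C1 + C2, 0.0
    for _ in range(steps):
        q += dt * (E - q / Ceq) / R       # single state variable
    return q / Ceq                        # voltage on the merged capacitor

# Remedy 2: insert a small decoupling resistance r between the
# capacitors, restoring one independent state per storage element.
def decoupled(E, R, C1, C2, r, dt=1e-4, steps=20000):
    q1, q2 = 0.0, 0.0                     # two independent states now
    for _ in range(steps):
        i2 = (q1 / C1 - q2 / C2) / r      # flow through the decoupling R
        i1 = (E - q1 / C1) / R - i2
        q1, q2 = q1 + dt * i1, q2 + dt * i2
    return q1 / C1

# With a small r, the decoupled model tracks the combined one closely.
v_a = combined(E=1.0, R=1.0, C1=0.3, C2=0.2)
v_b = decoupled(E=1.0, R=1.0, C1=0.3, C2=0.2, r=0.01)
```

Note the trade-off the text describes: the decoupling resistance introduces a fast mode with time constant on the order of r times the series capacitance, which forces the explicit integrator to take small steps.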


Chapter 3

Genetic programming

3.1 Models, programs, and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model

The method popularized by Koza in [43] [44] [45] [46], and used here, is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which includes also evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which includes also simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree structured Boolean proposition statements as classifiers [27]. In this work, Forsyth notes the generality of "Naturalistic Selection" in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori. 1 [18] also used genetic operations to search for tree structured programs in a special purpose language. Here the application was in symbolic regression.

1 In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control 8, 1965.

A tree-structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results [22], because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN keywords is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax-preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992, John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low-level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human-competitive artificial intelligence, it will be desirable to use a general purpose, high-level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus-Naur Form (BNF) language


definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming and genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem, which are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution for each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.
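As a concrete illustration of such an encoding, the following sketch maps a real-valued parameter on a known interval to a fixed-length bit string and back. The helper names and quantization scheme are illustrative assumptions, not taken from this thesis.

```python
def encode(value, lo, hi, n_bits):
    """Quantize a real value in [lo, hi] onto an n_bits-long bit string."""
    span = (1 << n_bits) - 1                      # number of quantization levels - 1
    level = round((value - lo) / (hi - lo) * span)
    return format(level, '0{}b'.format(n_bits))   # zero-padded binary string

def decode(bits, lo, hi):
    """Map a bit string back to a real value in [lo, hi]."""
    span = (1 << len(bits)) - 1
    return lo + int(bits, 2) / span * (hi - lo)
```

Note that every possible combination of zeros and ones decodes to some value in the interval, satisfying the essential property described above.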


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature; specifically, genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation, a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaaaa   bbbbbb
          aaaa|aa  bbbb|bb

Children: aaaabb   bbbbaa

Figure 3.2: The crossover operation for bit string genotypes.
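The single point crossover shown in figure 3.2 can be sketched in a few lines of Python. The function name and the rng parameter are illustrative, not from this thesis.

```python
import random

def single_point_crossover(a, b, rng=random):
    """Split two equal-length parents at one random point and swap the tails."""
    assert len(a) == len(b)
    point = rng.randrange(1, len(a))    # split point: at least one symbol per side
    return a[:point] + b[point:], b[:point] + a[point:]
```

Applied to the all-a and all-b parents of figure 3.2, the two children are complementary: at every position, one child carries the symbol the other lacks.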

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa

Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes.

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent: aaaaaa

Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes.
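The mutation and replication operators of figures 3.3 and 3.4 are equally brief. This is a minimal sketch with illustrative names; for bit strings, '0'/'1' symbols are used rather than the a/b of the figures.

```python
import random

def mutate(bits, rng=random):
    """Flip one randomly chosen bit: the smallest possible change."""
    i = rng.randrange(len(bits))
    flipped = '1' if bits[i] == '0' else '0'
    return bits[:i] + flipped + bits[i + 1:]

def replicate(bits):
    """Copy the genotype verbatim into the next generation."""
    return bits
```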

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel, on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals, but not in any particularly fit individuals, will tend to be less prevalent in subsequent generations.
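The roulette wheel analogy translates directly into code. The following is an illustrative sketch, not the thesis implementation: the "ball" is a uniform sample on [0, total fitness], and the wedge it lands in determines the selected individual.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Sample one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    ball = rng.uniform(0.0, total)        # drop the ball on the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness             # advance past this wedge
        if ball <= cumulative:
            return individual
    return population[-1]                 # guard against floating point round-off
```

Over many draws, an individual holding 90% of the total fitness is selected roughly nine times as often as one holding 10%, yet the less fit individual is never entirely ruled out.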

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm the entire population is replaced at once. A new population


of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM and replication PR = 1 − (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial, random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8-10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34] and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
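Steps 5 through 13 above can be sketched as a single Python function. Parameter names follow the list above (M, PC, PM and so on); the implementation details, including the fitness-proportional selection helper and the elitist record of the best individual seen, are illustrative assumptions, not this thesis's implementation.

```python
import random

def generational_ga(fitness, n_bits=16, pop_size=50, generations=100,
                    p_crossover=0.7, p_mutation=0.1, rng=None):
    """Minimal canonical generational GA over fixed-length bit strings."""
    rng = rng or random.Random()
    def random_bits():
        return ''.join(rng.choice('01') for _ in range(n_bits))
    population = [random_bits() for _ in range(pop_size)]        # step 5
    best = population[0]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]            # step 6
        best = max(population + [best], key=fitness)             # record best so far
        total = sum(scores)
        def select():                                            # step 9: roulette wheel
            ball = rng.uniform(0.0, total)
            acc = 0.0
            for ind, s in zip(population, scores):
                acc += s
                if ball <= acc:
                    return ind
            return population[-1]
        new_population = []
        while len(new_population) < pop_size:                    # steps 8-11
            r = rng.random()                                     # step 8: pick operator
            if r < p_crossover:                                  # crossover
                a, b = select(), select()
                point = rng.randrange(1, n_bits)
                new_population += [a[:point] + b[point:],
                                   b[:point] + a[point:]]
            elif r < p_crossover + p_mutation:                   # mutation
                a = select()
                i = rng.randrange(n_bits)
                new_population.append(
                    a[:i] + ('1' if a[i] == '0' else '0') + a[i + 1:])
            else:                                                # replication
                new_population.append(select())
        population = new_population[:pop_size]                   # step 12
    return best
```

On a toy "count the ones" fitness function, this loop reliably drives the population towards the all-ones string.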

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that


needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive: they respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open-ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically, the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represents the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
       ↓
1. Evaluate fitness
       ↓
2. Check termination criteria
       ↓
3. Take fitness-proportional sample and apply genetic operators
       ↓ (repeat from step 1)

Figure 3.5: Simplified flowchart for the Genetic Algorithm or Genetic Programming.

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from


good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first; of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search: it implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the summed square difference between the solution expression and a sampling of the "true" function, in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola, such as 1 + 2(x + 3)^2, it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test. All linear expressions are exponentially worse fitting near at least one extremity.

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The number of positions between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 4 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above-average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
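Schema membership and order are straightforward to compute. The following small sketch (helper names are illustrative) reproduces the 01*1* example above.

```python
def matches(schema, bits):
    """True if the bit string is a member of the schema's set."""
    return len(schema) == len(bits) and all(
        s == '*' or s == b for s, b in zip(schema, bits))

def order(schema):
    """Number of exactly specified (non-wildcard) positions."""
    return sum(1 for s in schema if s != '*')
```

Enumerating all 32 strings of length 5 against the schema 01*1* recovers exactly the four members listed above.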

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically low that the algorithm will spend virtually all available time constructing, checking and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracketed expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively


straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs, and, when the time comes, they can be evaluated as program code to test their fitness.
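In a language without native s-expressions, the same dual code/data representation can be mimicked with nested containers. The following evaluator over nested Python tuples is an illustrative sketch (the function set and names are assumptions, not the thesis implementation):

```python
import math
import operator

# Assumed function set for illustration: name -> implementation
FUNCTIONS = {'+': operator.add, '*': operator.mul, 'sin': math.sin}

def evaluate(tree, variables):
    """Recursively evaluate a nested-tuple parse tree. The tuple
    ('+', 'x', ('*', 'x', 'x')) plays the role of the s-expression
    (+ x (* x x))."""
    if isinstance(tree, tuple):
        name, *args = tree
        return FUNCTIONS[name](*(evaluate(a, variables) for a in args))
    if isinstance(tree, str):
        return variables[tree]    # terminal node: a variable
    return tree                   # terminal node: a constant
```

As with LISP, the tree is an ordinary data structure that the genetic operators can build and rearrange, and it becomes "code" only when handed to the evaluator during fitness testing.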

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis, Python [83] is used, but the essence of the process is still the creation, manipulation and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed, functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.2

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm. New generations are created from old by fitness proportional selection and the crossover, mutation and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed-length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

2 http://haskell.org/haskellwiki/Introduction#What_is_functional_programming%3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is being searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,3 tree-structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree-structured programs that have no simple analogy in nature, and are either not available or not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following five decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants and zero-argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as

3 This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.


Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

3. Define the fitness function. Typically, the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC and PM:

   M: population size (number of individuals)

   G: maximum number of generations to run

   PR: probability of selecting the replication operator

   PC: probability of selecting the crossover operator

   PM: probability of selecting the mutation operator

5. Choose termination criteria. Typically, a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations has elapsed.

The first two decisions (definition of the function and terminal sets) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better


exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved. It embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings or problems. However, genetic programming is so very computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation differ necessarily from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree-structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit strings discussed previously.
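Sub-tree crossover can be sketched with nested Python lists standing in for parse trees (first element the function name, the rest its arguments). All helper names are illustrative assumptions, not from this thesis.

```python
import copy
import random

def nodes(tree, path=()):
    """Yield (path, subtree) for every node in a nested-list tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_subtree(tree, path, new):
    if not path:                 # the root itself was selected
        return new
    get_subtree(tree, path[:-1])[path[-1]] = new
    return tree

def subtree_crossover(parent_a, parent_b, rng=random):
    """Swap randomly selected sub-trees between deep copies of two parents."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    path_a = rng.choice([p for p, _ in nodes(child_a)])
    path_b = rng.choice([p for p, _ in nodes(child_b)])
    sub_a = get_subtree(child_a, path_a)
    sub_b = get_subtree(child_b, path_b)
    child_a = set_subtree(child_a, path_a, sub_b)
    child_b = set_subtree(child_b, path_b, sub_a)
    return child_a, child_b
```

Whatever nodes are chosen, the total genetic material is conserved: the two children together contain exactly the nodes of the two parents, though each child's size and shape may differ from either parent's.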

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree-structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single


Figure 3.7: The crossover operation for tree structured genotypes.

bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

The second mutation operator is called "sub-tree mutation". In this variation, a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded, and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.
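A sketch of sub-tree mutation follows, assuming a small illustrative terminal and function set (not the sets used elsewhere in this thesis). Node selection here is a simple random descent, which is one of several reasonable ways to pick the mutation point.

```python
import copy
import random

TERMINALS = ['x', 1.0]               # assumed terminal set, for illustration
FUNCTIONS = {'+': 2, '*': 2}         # assumed function set: name -> arity

def random_tree(depth, rng):
    """Grow a random program tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return [name] + [random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name])]

def subtree_mutation(tree, rng, max_depth=3):
    """Discard a randomly chosen sub-tree and grow a new one in its place.
    If the root is chosen, the entire program is replaced."""
    if not isinstance(tree, list) or rng.random() < 0.2:
        return random_tree(max_depth, rng)       # replace from here down
    tree = copy.deepcopy(tree)
    i = rng.randrange(1, len(tree))              # choose one argument slot
    if rng.random() < 0.5:
        tree[i] = random_tree(max_depth, rng)    # mutate this sub-tree
    else:
        tree[i] = subtree_mutation(tree[i], rng, max_depth)  # descend further
    return tree

def valid(t):
    """Check that a tree uses only the declared functions and terminals."""
    if isinstance(t, list):
        return (t[0] in FUNCTIONS and len(t) == FUNCTIONS[t[0]] + 1
                and all(valid(c) for c in t[1:]))
    return t in TERMINALS
```

Because replacement sub-trees are grown from the same function and terminal sets, every mutant remains a syntactically valid program, though its size and shape may change dramatically.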

3.6.3 Other operators on tree structures

Other, more exotic operations, such as "hoist" and "automatically defined functions", have been suggested [43]. Not all have counterparts in bit string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent (a partial replication). The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set, having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different, and simpler, structure using a new function that was not available to the parent. Automatically defined functions allow a portion of a program, from anywhere within its tree, to be encapsulated as a unit unto itself and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic or polynomial regression, search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static


functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator which returns the constant value 0.0 whenever its second operand is zero.
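A protected division of this kind is a one-line function; the sketch below returns 0.0 on a zero divisor, matching the convention described above.

```python
import math

def protected_div(a, b):
    """Protected division: return the constant 0.0 when the divisor is zero."""
    return 0.0 if b == 0 else a / b
```

Note how this closure measure can be exploited: since protected_div(x, 0.0) is 0.0 for any x, a candidate program can manufacture the constant 1.0 as cos(x/0), as discussed next.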

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x); that is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python


Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness
    # Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The source code excerpt in listing 3.1 shows how division by zero and occurrences of the contagious NaN value4 result in a zero fitness evaluation. Comments in Python begin with a "#" character and continue to the end of the line.

Another very common, yet potentially problematic, arithmetic operation

4 The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type. INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number". NaN arises from operations which have undefined value in common arithmetic; examples include cos(INF), sin(INF) and INF/INF. These special values, INF and NaN, are contagious, in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN * 2.0 = NaN. Of these two special values NaN is the most "virulent", since, in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.

71

is fractional powers If the square root of a negative value is to be allowedthen a mono-typed GP system with closure must use complex numbers asits one type If one is not interested in evolving complex valued expressionsthen this also must be trapped Again the more robust option is to assignprohibitively low fitness to programs that employ complex values
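Trapping complex values, as described above, can be sketched as follows. The function name fitness_of and its 1/(1+|y|) scoring are illustrative, not the thesis code; note that in Python 3 a negative base raised to a fractional power silently returns a complex number, while math.sqrt of a negative raises ValueError, so both paths need handling:

```python
def fitness_of(program, x):
    """Award zero fitness when evaluation leaves the real numbers."""
    try:
        y = program(x)
    except ValueError:          # e.g. math.sqrt of a negative number
        return 0.0
    if isinstance(y, complex):  # e.g. (-8.0) ** 0.5 in Python 3
        return 0.0
    return 1.0 / (1.0 + abs(y))
```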

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph, which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down the progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program, as input, a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low-order approximation of the target function, for example. In this arrangement, a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and a maximum number of generations, both near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation to individuals that exceed the ceiling.
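The zero-fitness ceiling amounts to a one-line guard in front of the ordinary fitness computation. A minimal sketch, using a generic (label, children) tuple representation for trees (the project's actual representation differs, and the ceiling value here is arbitrary):

```python
MAX_NODES = 200   # a generous ceiling; the exact value is a search parameter

def tree_size(node):
    """Count nodes in a tree of (label, children) pairs."""
    label, children = node
    return 1 + sum(tree_size(child) for child in children)

def ceiling_fitness(tree, raw_fitness):
    """Oversized individuals score zero, regardless of raw performance."""
    if tree_size(tree) > MAX_NODES:
        return 0.0
    return raw_fitness
```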

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. Its strength arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources is perhaps more naturalistic. Biological organisms which have very extensive material needs will on occasion find those needs unmet, and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree swapping crossover can produce offspring differing in size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme, for a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for maintaining population diversity, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included among the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value, while tag values differ between structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system, two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, have the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).
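The recursive equivalence test and the tagging scheme just described can be sketched on trees of (type, children) pairs. This representation is illustrative, not the project's; constant-value nodes carry their value elsewhere, so differing constants compare equal here, matching the definition above:

```python
def structurally_equal(a, b):
    """Recursive structural equivalence on (type, children) trees."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return False
    # all() short-circuits: recursion stops at the first mismatching pair
    return all(structurally_equal(x, y) for x, y in zip(a[1], b[1]))

def tag(node):
    """A structural hash: equivalent trees get equal tags (hash collisions
    between different structures are possible but rare)."""
    return hash((node[0], tuple(tag(c) for c in node[1])))
```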

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations, and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short-circuiting. Short-circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short-circuiting. A LISP or Scheme implementation would benefit similarly, since modern optimizing LISP compilers perform very well with tail recursion.


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation code and the related symbolic algebra code are divided into four modules:

Bondgraph.py The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py The Components module contains a collection of compound bond graph elements.

Algebra.py The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions, and recipes for specialized actions such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations; in particular, for reducing coupled sets of algebraic and first-order differential equations to a set of differential equations involving a minimal set of state variables, plus a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into four modules as well:

Genetics.py The Genetics module implements an abstract genetic programming type system, from which useful specializations can be derived. It implements the genetic operators for tree-structured representations.

Evolution.py The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant-valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments, or implement small tools for visualizing bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun, but is as yet incomplete.

4.1.1 Formats and representations

Simulation results, and the test signals used as input to simulations, are stored in comma separated value (CSV) text files for portability. For particularly long or detailed (small time-step) simulations, there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact, non-text format might load quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages already in use, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half-arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in .bg format

# title: single mass spring damper chain

# nodes

# Spring, damper and mass
begin node
id spring
type C
parameter 1.96
end node

begin node
id damper
type R
parameter 0.02
end node

begin node
id mass
type I
parameter 0.002
end node

# Common flow point joining s, d, m
begin node
id point1
type cfj
end node

# Input force acting on spring and damper
begin node
id force
type Se
parameter Y
end node

# bonds

# Spring and damper joined to mass
begin bond
tail point1
head spring
end bond

begin bond
tail point1
head damper
end bond

begin bond
tail point1
head mass
end bond

begin bond
tail force
head point1
end bond

# EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62] and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error-prone to write than C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that the powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo-code. Take for example the following two functions, which implement the universal and existential logical quantifiers, respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], they operate on any iterable object "sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".
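To make that duck typing concrete, here are the same two functions (repeated so the snippet stands alone) applied to a list, a dictionary, and a generator:

```python
def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

# Any iterable works: a list, a dict (iterated over its keys), a generator.
print(forAll([2, 4, 6], lambda n: n % 2 == 0))             # True
print(thereExists({'a': 1, 'b': 2}, lambda k: k == 'b'))   # True
print(forAll((n for n in range(5)), lambda n: n < 3))      # False
```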

Since this research addresses automated dynamic system identification, a topic which is primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4-processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work, SciPy is used for fast matrix operations and to access a numerical solver; both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.
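The interface in question takes a callable f(state, t) returning the state derivatives, an initial state, and a sequence of time points. The following is a minimal pure-stdlib sketch of that calling convention, with a fixed-step RK4 integrator standing in for LSODA so the snippet runs without SciPy; the mass-spring-damper parameters (m = 1.0, k = 1.0, b = 0.5) are illustrative, not taken from the thesis models:

```python
def rk4(f, y0, ts):
    """Integrate y' = f(y, t) over time points ts (odeint-like signature)."""
    ys, y = [list(y0)], list(y0)
    for i in range(1, len(ts)):
        t, h = ts[i - 1], ts[i] - ts[i - 1]
        k1 = f(y, t)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)], t + 0.5 * h)
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)], t + 0.5 * h)
        k4 = f([a + h * b for a, b in zip(y, k3)], t + h)
        y = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        ys.append(y)
    return ys

# A state function in the expected f(state, t) form: an unforced
# mass-spring-damper with m = 1.0, k = 1.0, b = 0.5.
def msd(state, t):
    x, v = state
    return [v, -1.0 * x - 0.5 * v]
```

With SciPy installed, the same `msd` callable could be handed to scipy.integrate.odeint unchanged, which is the point of the shared interface.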

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large, automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1; from this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];
    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];
    edge [fontsize=10];
    11 -> C1 [arrowhead=half, arrowtail=tee, label="1"];
    11 -> R2 [arrowhead=half, arrowtail=tee, label="2"];
    11 -> I3 [arrowhead=teehalf, label="3"];
    Se4 -> 11 [arrowhead=teehalf, label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds, and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds; activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.


4.3.2 Adding elements and bonds, traversing a graph

The graph class has addNode and removeNode methods, and bondNodes and removeBond methods. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds, and each bond has a reference to its head and tail elements. In this way it is possible to walk the graph starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.
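The structure just described might be sketched as follows. This is a toy illustration: the method names follow the text, but the bodies are not the library's actual code, and the max_bonds limit stands in for the per-element causal bond counts:

```python
class Element(object):
    def __init__(self, name, max_bonds=None):
        self.name = name
        self.max_bonds = max_bonds   # None means unlimited
        self.bonds = []
        self.graph = None

class Bond(object):
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        self.causality = None        # nil until causality assignment

class Bondgraph(object):
    def __init__(self):
        self.elements, self.bonds = [], []

    def addNode(self, element):
        # reject nodes already in a graph or carrying bonds
        if element.graph is not None or element.bonds:
            raise ValueError('element already in use')
        element.graph = self
        self.elements.append(element)
        return element

    def bondNodes(self, tail, head):
        for e in (tail, head):
            if e.graph is not self:
                raise ValueError('element not part of this graph')
            if e.max_bonds is not None and len(e.bonds) >= e.max_bonds:
                raise ValueError('element cannot accommodate another bond')
        bond = Bond(tail, head)
        tail.bonds.append(bond)
        head.bonds.append(bond)
        self.bonds.append(bond)
        return bond
```

Because each bond holds element references and each element holds bond references, walking from any starting point is a matter of alternately following the two lists.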

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP), described in Section 2.3.2, is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC),

2. one-port elements with a preferred causality (storage elements C and I),

3. one-port elements with arbitrary causality (sink element R),

4. others (0, 1, GY, TF, signal blocks).

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality,

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality,

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality,

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality,

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so propagating the consequences of a causality assignment is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.
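For a 1 junction, the two propagation rules above amount to: once one bond carries effort out of the junction, all others must carry effort in; and once all but one carry effort in, the last must carry effort out. A sketch of that logic follows, on a toy representation (list entries are 'in', 'out', or None, giving the effort direction relative to the junction); the function name and representation are illustrative, not the library's:

```python
def extend_causality_1(bonds):
    """Propagate causality consequences at a single 1 junction."""
    changed = True
    while changed:
        changed = False
        if 'out' in bonds:
            # the flow-deciding bond is known: all others bring effort in
            for i, c in enumerate(bonds):
                if c is None:
                    bonds[i] = 'in'
                    changed = True
        elif bonds.count('in') == len(bonds) - 1 and None in bonds:
            # last-but-one rule: the remaining bond must carry effort out
            bonds[bonds.index(None)] = 'out'
            changed = True
    return bonds
```

The 0 junction rules are the dual of these, with 'in' and 'out' exchanged.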

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required-causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has already been assigned a causality, preferred or otherwise, as a consequence of earlier required or preferred causality assignments is skipped over. The others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and the consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined, and any instances of differential causality are noted in a new list attached to the bond graph object.

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since it is computationally inexpensive to apply. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all of the junction's neighbours, that cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of the procedure for which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. remove junctions with fewer than 3 bonds,

2. merge adjacent like junctions,

3. merge resistors bonded to a common junction,

4. merge capacitors bonded to a common effort junction,

5. merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5.

In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond is put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed, and a single new R element is put in their place. For elements with the linear constitutive equation, there are two cases, depending on the junction type. (Handling of non-linear elements is not automated by the library.) If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is r_eq, where

r_eq = 1 / (1/r1 + 1/r2 + ... + 1/rn)    (4.1)
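In code, the two cases are just the series and parallel resistance rules; a small sketch (the function names are illustrative, not the library's):

```python
def merged_resistance_1_junction(resistances):
    """R elements merged at a 1 junction: resistances in series."""
    return sum(resistances)

def merged_resistance_0_junction(resistances):
    """R elements merged at a 0 junction: equation 4.1, resistances in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)
```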


These are the familiar rules for electrical resistances in series and parallel. The fourth and fifth manipulations handle the case shown in figure 2.12, and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply: capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances, and rigidly joined inertias are effectively one mass.

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than 3 bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8, and 4.9 show the application of the first 3 steps of the automated model reduction method; the 4th and 5th are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions, step 1: remove trivial junctions

Figure 4.2: Model reductions, step 2: merge adjacent like junctions

Figure 4.3: Model reductions, step 3: merge resistors bonded to a common junction

Figure 4.4: Model reductions, step 4: merge capacitors bonded to a common effort junction

Figure 4.5: Model reductions, step 5: merge inertias bonded to a common flow junction

Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction

Figure 4.7: The model from figure 4.6 with trivial junctions removed

Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged

Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged

Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5

Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6

Listing 4.5: Equations from the model in figure 4.6

e1 = (e2 + (-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4 + (-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1/-78.0)
f2 = f1
f3 = (f9+f11+f12+f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2+f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7137.88689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2 + (-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4+f7+f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way, depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look for the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant and operator is an expression in itself), replace any one sub-expression with another, and other more specialized algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
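The hierarchy just described can be sketched in Python (the library's implementation language). The class names follow figure 4.12, but the method names (variables, isConstant) and constructor signatures are assumptions, since the thesis does not reproduce the exact API:

```python
class Expression:
    """Base algebraic expression node; never instantiated directly."""
    def __init__(self, *subexpressions):
        self.subexpressions = list(subexpressions)

    def variables(self):
        # Collect variable names from all nested sub-expressions.
        found = set()
        for sub in self.subexpressions:
            found |= sub.variables()
        return found

    def isConstant(self):
        return not self.variables()

class Constant(Expression):
    def __init__(self, value):
        super().__init__()          # accepts no sub-expressions
        self.value = value

class Variable(Expression):
    def __init__(self, name):
        super().__init__()          # accepts no sub-expressions
        self.name = name
    def variables(self):
        return {self.name}

class Operator(Expression):
    pass

class Div(Operator):
    def __init__(self, numerator, denominator):
        super().__init__(numerator, denominator)   # two ordered operands

class CommutativeOperator(Operator):
    pass                            # any number of operands, order immaterial

class Add(CommutativeOperator): pass
class Mul(CommutativeOperator): pass
```

For example, `Div(Variable('p1'), Constant(-78.0))` contains the one variable `p1` and so is not constant, while `Add(Constant(1.0), Constant(2.0))` is constant and could be collapsed.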

Equation objects are composed of two expression objects (a left-hand side and a right) and methods to list all variables, expressions and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram

Equation sets are collections of equations that contain no duplicates. Equation set objects support similar methods to equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.) as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation (provided the original set is in a certain “declarative” form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation that is automated by the library, algebraic loops are discussed in section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called “t” is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

- Substitutions (search and replace)

- Get list of variables

- Aggregate constants

- Collapse constant expressions

- Convert constant divisors to multiplication

- Aggregate commutative operators

- Remove single operand commutative operators

- Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. “Aggregate constants” replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. “Collapse constant expressions” detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

“Convert constant divisors to multiplication” detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul and the Constant with a new Constant whose value is the reciprocal of the original. “Aggregate commutative operators” detects occurrences of a commutative operator within the direct sub-expressions of a like operator. It joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). “Remove single operand commutative operators” detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null or “do nothing” operations. A single operand Add is assumed to add zero to its operand, and a single operand Mul is assumed to multiply by one. These may arise after aggregating constants if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

“Expand distributive operators” detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands to this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
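This rule can be sketched on a minimal tuple-based syntax tree, with ('mul', …) and ('add', …) standing in for the library's Mul and Add objects:

```python
def expand_distributive(expr):
    """One expansion step: if expr is a Mul containing an Add operand,
    distribute the remaining Mul operands over the first such Add."""
    op, *operands = expr
    if op != 'mul':
        return expr
    for i, sub in enumerate(operands):
        if isinstance(sub, tuple) and sub[0] == 'add':
            rest = operands[:i] + operands[i + 1:]
            # one new Mul per operand of the original Add
            return ('add',) + tuple(('mul', term, *rest) for term in sub[1:])
    return expr

# (a + b) * c  ->  a*c + b*c
print(expand_distributive(('mul', ('add', 'a', 'b'), 'c')))
# ('add', ('mul', 'a', 'c'), ('mul', 'b', 'c'))
```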

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

- Drive expression towards polynomial (iterative)

- Collect like terms (from a polynomial)

“Drive expression towards polynomial” is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. “Drive expression towards polynomial” is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators
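The target form that this sequence drives toward can be checked mechanically. A sketch of the isPolynomial test on the same kind of tuple-based tree used above (numbers standing in for Constant objects, strings for Variable objects):

```python
def is_polynomial(expr):
    """True when expr is an Add whose operands are all Constant, Variable,
    or two-operand Mul(Constant, Variable) terms (the form defined above)."""
    def is_term(t):
        if isinstance(t, (int, float, str)):        # Constant or Variable
            return True
        if isinstance(t, tuple) and t[0] == 'mul' and len(t) == 3:
            coeff, var = t[1], t[2]
            return isinstance(coeff, (int, float)) and isinstance(var, str)
        return False
    return isinstance(expr, tuple) and expr[0] == 'add' \
        and all(is_term(t) for t in expr[1:])

print(is_polynomial(('add', 3.0, ('mul', 2.0, 'x'))))   # True
print(is_polynomial(('mul', 'x', 'y')))                 # False
```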

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

- Reduce equation to polynomial form

- Collect like terms (from a polynomial form equation)

- Solve polynomial form equation for any variable

“Reduce equation to polynomial form” invokes towardsPolynomial on the left and right-hand side expressions of the equation repeatedly, until isPolynomial gives a true result for each. “Collect like terms” examines the variables in the operands of the Add objects on either side of a polynomial form equation, and collects them such that there is only one operand between those Add objects that contains any one variable. The result may not be in the polynomial form defined above, but can be driven there again by invoking the equation's towardsPolynomial method. Equation objects have a solveFor method that, given a variable appearing in the equation, attempts to put the equation in polynomial form, then collects like terms, isolates the term containing that variable on one side of the equation, and divides through by any constant factor. Of course, this works only for equations that can be put in the polynomial form defined above. The solveFor method attempts to put the equation in polynomial form by invoking towardsPolynomial repeatedly, stopping either when the isPolynomial method gives a true result or after a number of iterations proportional to the size of the equation (the number of Expression objects it contains). If the iteration limit is reached, then it is presumed not possible to put the equation in polynomial form and the solveFor method fails.
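Once both sides are in polynomial form, collecting like terms and isolating a variable reduce to arithmetic on coefficients. A sketch using {variable: coefficient} maps rather than the library's expression trees (the key '1' holds the constant term; these names are illustrative, not the thesis's API):

```python
def solve_for(lhs, rhs, var):
    """Solve an equation given as two {variable: coefficient} maps
    (key '1' for the constant term) for `var`, assumed to appear linearly."""
    # Collect like terms by moving everything to one side: sum(net[v]*v) = 0.
    net = dict(lhs)
    for v, c in rhs.items():
        net[v] = net.get(v, 0.0) - c
    coeff = net.pop(var)                  # isolate the term containing var
    # var = -(remaining terms) / coeff, i.e. divide through by the constant
    return {v: -c / coeff for v, c in net.items() if c != 0}

# 2x + 3 = 7  ->  x = 2
print(solve_for({'x': 2.0, '1': 3.0}, {'1': 7.0}, 'x'))   # {'1': 2.0}
```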

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

- Detect and resolve algebraic loop(s)

- Recursive backsubstitution from declarative form equations

- Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2), it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graphs. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back substitution into the declarative equation for the first loop variable

2. Expanding distributive operators, aggregating constants and commutative operators

3. Collecting like terms

4. Isolating the first loop variable

The result of the first step is a new declarative equation for the first loop variable which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C


and

D = E1 + E2 + · · · + En

where C is a constant and any of E1 ... En may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the right-hand side, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.
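The detection half of this process is a depth-first search of the dependency graph, started from each algebraically declared variable. A sketch, run on the algebraic dependencies of the reduced model (listing 4.6), recovers the same cycle that is resolved in listing 4.7, traversed in the opposite order since edges here point from a variable to its dependencies:

```python
def find_algebraic_loop(deps, start):
    """Depth-first search for a cycle through `start` in a dependency graph
    {var: [vars on the right-hand side of its declarative equation]};
    differential dependencies are omitted, since they are not problematic."""
    def dfs(node, path):
        for nxt in deps.get(node, []):
            if nxt == start:
                return path + [node]            # closed the loop
            if nxt not in path:
                found = dfs(nxt, path + [node])
                if found:
                    return found
        return None
    return dfs(start, [])

# Algebraic dependencies of the reduced model of figure 4.9 (listing 4.6),
# restricted to the variables involved in the loop:
deps = {'e6': ['e2', 'e3'], 'e3': ['f3'], 'f3': ['f6'],
        'f6': ['f4', 'f7', 'f8'], 'f8': ['e8'], 'e8': ['e6']}
print(find_algebraic_loop(deps, 'e6'))
# ['e6', 'e3', 'f3', 'f6', 'f8', 'e8']
```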

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4+f7+f8)
e6 = (e2+(-1.0*e3))

collapsing:
e6 = (e2+(-1.0*((f4+f7+(e6/30.197788376))*32.0)))

expanding and aggregating:
iterations: 3 True True
e6 = (e2+(f4*-32.0)+(f7*-32.0)+(e6*-1.05968025213))

collecting:
(e6+(-1.0*(e6*-1.05968025213))) = (e2+(f4*-32.0)+(f7*-32.0))

expanding and aggregating:
iterations: 2 True True
(e6+(e6*1.05968025213)) = (e2+(f4*-32.0)+(f7*-32.0))

isolating:
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1/-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4+f7+f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved

Equations generated by the bond graph modelling library described in Section 4.3 always appear in “declarative form”, meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not exist (for example because the bond graph model has differential causality for some elements), then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input and other variables

2. Back-substitute until only state and input variables appear on right-hand sides

3. Split state and readout equation sets

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.

In the second step, the right-hand side of non-differential equations is substituted into the right-hand sides of all equations, wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops, then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, and all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector, and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.
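The recursive substitution of step 2 can be sketched as a recursion over nested tuple expressions, where `algebraic` maps each algebraically declared variable to its right-hand side; variables absent from the map (states and inputs) terminate the recursion. On an unresolved algebraic loop this would recurse forever, which is why loops are broken first. The fragment below substitutes part of the chain behind p1_dot in listing 4.8:

```python
def back_substitute(rhs, algebraic):
    """Recursively replace variables declared by algebraic equations until
    only state and input variables remain on the right-hand side."""
    if isinstance(rhs, str):                      # a variable
        if rhs in algebraic:
            return back_substitute(algebraic[rhs], algebraic)
        return rhs                                # state or input: stop here
    if isinstance(rhs, tuple):                    # an operator node
        return (rhs[0],) + tuple(back_substitute(a, algebraic) for a in rhs[1:])
    return rhs                                    # a constant

# A fragment of the reduced model: e5 = 45.0*f5, f5 = f1, f1 = p1/-78.0
alg = {'e5': ('mul', 45.0, 'f5'), 'f5': 'f1', 'f1': ('div', 'p1', -78.0)}
print(back_substitute(('mul', -1.0, 'e5'), alg))
# ('mul', -1.0, ('mul', 45.0, ('div', 'p1', -78.0)))
```

The state variable p1 survives untouched, so the resulting differential equation refers only to states and inputs.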

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can be solved for arbitrary time ranges, of course. The differential equations from a state space equation set are solved by numerical integration, using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.
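The thesis delegates the integration to LSODA through SciPy. As a self-contained illustration of this “summing up over the past” (a stand-in, not the integrator actually used), a fixed-step forward Euler loop over declarative state equations x_dot = f(x, u, t):

```python
def integrate(f, x0, ts, u):
    """Fixed-step forward Euler stand-in for LSODA: advance the state
    vector by accumulating x_dot = f(x, u(t), t) over each time step."""
    trajectory = [list(x0)]
    for t0, t1 in zip(ts, ts[1:]):
        x = trajectory[-1]
        x_dot = f(x, u(t0), t0)
        trajectory.append([xi + (t1 - t0) * di for xi, di in zip(x, x_dot)])
    return trajectory

# One state equation, p1_dot = -p1, with the input held at zero;
# the exact solution at t = 1 is exp(-1) ~ 0.368.
traj = integrate(lambda x, u, t: [-x[0]], [1.0],
                 [i * 0.001 for i in range(1001)], lambda t: 0.0)
```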

Inputs, if any, are read from a file, then interpolated on-the-fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth order (constant) and first order (linear) interpolators are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.
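The two interpolators can be sketched over sampled input data as follows (function names are illustrative):

```python
import bisect

def zoh(ts, vs, t):
    """Zeroth order hold: the most recent sample at or before time t."""
    return vs[max(bisect.bisect_right(ts, t) - 1, 0)]

def linear(ts, vs, t):
    """First order (linear) interpolation between the bracketing samples."""
    i = min(max(bisect.bisect_right(ts, t) - 1, 0), len(ts) - 2)
    frac = (t - ts[i]) / (ts[i + 1] - ts[i])
    return vs[i] + frac * (vs[i + 1] - vs[i])

ts, vs = [0.0, 1.0, 2.0], [0.0, 10.0, 0.0]
print(zoh(ts, vs, 1.5), linear(ts, vs, 1.5))   # 10.0 5.0
```

The integrator can then call either function at whatever times its adaptive stepping requires.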

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations, across the same range of simulated time, for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression), a new set of program node types is implemented. That is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long
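A minimal sketch of such a node type declaration (attribute spellings and the DynamicSystem child type names are assumptions; the example anticipates the symbolic regression grammar of section 4.5.3):

```python
class NodeType:
    """Base for program node types; the generic crossover and mutation
    operators consult only these four attributes."""
    type = None           # a unique type identifier
    min_children = 0      # minimum number of children it will accept
    max_children = 0      # maximum number of children it will accept
    child_types = ()      # list of types, at least max_children long

    def __init__(self):
        self.children = []

class DynamicSystem(NodeType):
    # Root node of a genetic program: composes a state space equation
    # set from a state equation set and a readout equation set.
    type = 'dynamic-system'
    min_children = 2
    max_children = 2
    child_types = ('state-equation-set', 'readout-equation-set')
```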

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The “composed partial solution” returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree structured representations proceeds in three steps:

1. Choose a node at random from the parent

2. Generate a new random tree whose root has the same type as the chosen node

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree

In the first step, a choice is made uniformly among all nodes. It is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied, with replacement of the chosen subtree.
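The three steps can be sketched on (type, children) tuples. Here `grammar` maps each function type to its admissible child-type signatures, and the depth-limited `grow` stands in for the size- and depth-matching creation procedure described next; all names are illustrative:

```python
import random

def mutate(tree, grammar, rng):
    """Strongly typed mutation on (type, children) tuples. Types absent
    from `grammar` are terminals and accept no children."""
    nodes = []
    def collect(node, path):
        nodes.append((node, path))
        for i, child in enumerate(node[1]):
            collect(child, path + [i])
    collect(tree, [])
    chosen, path = rng.choice(nodes)          # step 1: uniform choice of node

    def grow(t, depth):                       # step 2: random tree, same root type
        sigs = grammar.get(t)
        kinds = rng.choice(sigs) if depth > 0 and sigs else []
        return (t, [grow(k, depth - 1) for k in kinds])
    replacement = grow(chosen[0], 2)

    def splice(node, path):                   # step 3: copy parent, swap subtree
        if not path:
            return replacement
        kids = list(node[1])
        kids[path[0]] = splice(kids[path[0]], path[1:])
        return (node[0], kids)
    return splice(tree, path)

tree = ('add', [('const', []), ('var', [])])
grammar = {'add': [['const', 'var'], ['const', 'const']]}
print(mutate(tree, grammar, random.Random(0)))
```

Whichever node is chosen, the mutated program still type-checks, because the replacement subtree's root type matches the type of the node it displaces.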

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4, this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; these are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case, the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents

2. List all nodes from the first parent conforming to one of the common types

3. Choose at random from the list

4. List all nodes from the second parent conforming to the same type as the chosen node

5. Choose at random from the list

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second

If the set of common types is empty, crossover cannot proceed and the genetic algorithm selects a new pair of parents.
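The six steps can be sketched on the same (type, children) tuple representation used above (all names illustrative):

```python
import random

def crossover(parent_a, parent_b, rng):
    """Strongly typed crossover: replace a random subtree of parent_a with
    a type-compatible subtree copied from parent_b, or return None when
    the parents share no node types."""
    def collect(node, path, acc):
        acc.append((node, path))
        for i, child in enumerate(node[1]):
            collect(child, path + [i], acc)
        return acc
    a_nodes = collect(parent_a, [], [])
    b_nodes = collect(parent_b, [], [])
    # step 1: intersect the node type sets of the parents
    common = {n[0] for n, _ in a_nodes} & {n[0] for n, _ in b_nodes}
    if not common:
        return None                           # select a new pair of parents
    # steps 2-3: choose a compatible site in the first parent
    site, path = rng.choice([(n, p) for n, p in a_nodes if n[0] in common])
    # steps 4-5: choose a donor of the same type from the second parent
    donor = rng.choice([n for n, _ in b_nodes if n[0] == site[0]])
    # step 6: copy the first parent with the donor subtree spliced in
    def splice(node, path):
        if not path:
            return donor
        kids = list(node[1])
        kids[path[0]] = splice(kids[path[0]], path[1:])
        return (node[0], kids)
    return splice(parent_a, path)
```

Because the donor subtree's root type always matches the replaced node's type, the offspring is guaranteed to remain a valid program in the typed language.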

4.5.3 A grammar for symbolic regression on dynamic systems

Symbolic regression refers to the use of genetic algorithms to evolve asymbolic expression that fits some given sampled data In contrast to linearregression and other types of curve fitting the structure of the expressionis not specified before hand The grammar for this conventional form ofsymbolic regression consists of random constant values a node representingthe independent variable and standard algebraic operators (with the possiblereplacement of division by an operator protected against division by zero)


This grammar is typeless or “mono-typed”. The result of symbolic regression is a tree structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and obvious, perhaps) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations. The differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (the length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire to not pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program. Simply making the new state available for later integration into the model is enough. Given a tree structured representation in which a “state space equation set” sits above a “state equation set” and a “readout equation set”, each of which sit above a number of equations, there are several ways in which new state equations might be added to the “state equation set”:

1. A new “blank” state equation could be added, where “blank” means, for example, that the right-hand side is constant

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables

3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equations are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.
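A sketch of this resolution rule on a unit interval; the thesis fixes only that the search runs in one direction, so searching upward is an assumption here:

```python
def resolve_reference(ref_index, state_indices):
    """Resolve a state variable reference to the defined state whose index
    is first encountered searching upward from ref_index, modulo 1.0."""
    nearest = min(state_indices, key=lambda s: (s - ref_index) % 1.0)
    return state_indices[nearest]

states = {0.25: 'x1', 0.70: 'x2'}        # state variable index -> variable
print(resolve_reference(0.50, states))   # -> x2 (first index at or above 0.50)
print(resolve_reference(0.90, states))   # -> x1 (wraps around the interval)
```

Because every reference index resolves to some defined state, random expressions may refer to states freely, no matter how many state equations the program currently contains.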

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, the duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number half way between its original index and the next state equation index (with wrap around at the interval boundaries). Overall, this design has some desirable properties, including:

- Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

bull Adding an identical copy of a state equation has no affect until theoriginal or the copy is modified so crossover that simply adds a stateis not destructive to any advantages the program already has


• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small, incremental changes (such as the introduction of a new state equation by copying, and later modifying in some small way, an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.
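The resolution search described above can be sketched in a few lines. This is a toy illustration, assuming the interval is [0, 1) and the search direction is upward; the function and argument names are hypothetical, not taken from the thesis software:

```python
import bisect

def resolve_references(state_indices, reference_indices):
    """Map each reference index to the state-variable index first encountered
    when searching upward from the reference, wrapping around at 1.0."""
    markers = sorted(state_indices)
    resolved = {}
    for r in reference_indices:
        i = bisect.bisect_left(markers, r)       # first marker at or above r
        resolved[r] = markers[i % len(markers)]  # wrap past 1.0 to the start
    return resolved

# A reference at 0.95 wraps around and resolves to the state at 0.2, so every
# reference finds some state no matter how the indices are scattered.
```
Note that because resolution wraps modulo the interval, the map is total: any reference index resolves to some state, which is the first desirable property listed above.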

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and on their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.
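The recursive evaluation scheme can be sketched as follows. The class names mirror table 4.1, but the equation representation (nested tuples) is a simplification standing in for the symbolic algebra library, so this is illustrative only:

```python
# Each node type evaluates by calling on its children, exactly as described:
# a StateEquationSet aggregates one equation plus an optional further set.

class Constant:
    def __init__(self, value): self.value = value
    def evaluate(self): return self.value

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def evaluate(self): return ('+', self.a.evaluate(), self.b.evaluate())

class StateEquation:
    def __init__(self, lhs_index, rhs):
        self.lhs_index, self.rhs = lhs_index, rhs
    def evaluate(self):
        return [(self.lhs_index, self.rhs.evaluate())]

class StateEquationSet:
    def __init__(self, first, rest=None):
        self.first, self.rest = first, rest
    def evaluate(self):
        eqs = self.first.evaluate()        # first child: one equation
        if self.rest is not None:
            eqs += self.rest.evaluate()    # second child: a further set
        return eqs

program = StateEquationSet(
    StateEquation(0.3, Add(Constant(1), Constant(2))),
    StateEquationSet(StateEquation(0.7, Constant(5))))
```
A DynamicSystem root would combine such a state equation set with an output equation set and then resolve the interval indices, as described above.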


Node               Type               Child types
DynamicSystem      DynamicSystem      StateEquationSet, OutputEquationSet
StateEquationSet   StateEquationSet   StateEquation, StateEquationSet
StateEquation      StateEquation      StateVariableLHS, Expression
OutputEquationSet  OutputEquationSet  OutputEquation, OutputEquationSet
OutputEquation     OutputEquation     OutputVariable, Expression
StateVariableLHS   StateVariableLHS
StateVariableRHS   Expression
OutputVariable     OutputVariable
InputVariable      Expression
Constant           Expression
Add                Expression         Expression, Expression
Mul                Expression         Expression, Expression
Div                Expression         Expression, Expression
Cos                Expression         Expression
Sin                Expression         Expression
Greater            Expression         Expression, Expression
Lesser             Expression         Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems

4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements, or even connected subgraphs, to be replaced.

If a constant parameter of some element (for example, the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.
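The bond-replacement step can be sketched as a graph rewrite. This is a toy illustration with hypothetical data structures (a dict of node and bond lists), not the thesis modelling library:

```python
# Replace a marked bond (a, b) with a --bond1--> 0-junction --bond2--> b.
# The two returned bonds become new 'bond'-typed openings for child nodes,
# mirroring the InsertCommonEffortJunction node of table 4.2.

def insert_common_effort_junction(graph, bond):
    a, b = bond
    graph['bonds'].remove(bond)
    junction = ('0', len(graph['nodes']))   # fresh 0-junction node
    graph['nodes'].append(junction)
    bond1, bond2 = (a, junction), (junction, b)
    graph['bonds'] += [bond1, bond2]
    return bond1, bond2

graph = {'nodes': ['Se', '1'], 'bonds': [('Se', '1')]}
b1, b2 = insert_common_effort_junction(graph, ('Se', '1'))
```
Each returned bond is itself replaceable, so nested applications of the two junction-inserting nodes can grow an arbitrarily deep junction structure from a single marked bond.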

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, and having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that, if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models), then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type        Child types
Kernel                      kernel
InsertCommonEffortJunction  bond        bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond        bond, bond, effort-in, effort-out
AttachInertia               effort-out  Expression, bond
AttachCapacitor             effort-in   Expression, bond
AttachResistorEffortIn      effort-in   Expression, bond
AttachResistorEffortOut     effort-out  Expression, bond
Constant                    Expression
Add                         Expression  Expression, Expression
Mul                         Expression  Expression, Expression
Div                         Expression  Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs

Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models, using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs, or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p and q variables measure effort, flow, momentum and displacement on the like-numbered bonds. The applied effort from the source element is marked E.

ṗ3 = e3        (5.1)

q̇1 = f1        (5.2)

e1 = (q1/1.96)        (5.3)

e2 = (f2 * 0.02)        (5.4)

e3 = (e4 + (−1.0 * e1) + (−1.0 * e2))        (5.5)

e4 = E        (5.6)

f1 = f3        (5.7)

f2 = f3        (5.8)

f3 = (p3/0.002)        (5.9)

f4 = f3        (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in Section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that variable E does not appear on the left-hand side of any state or readout equation; it is an input to the system.

ṗ3 = (E + (−1.0 * (q1/1.96)) + (−1.0 * ((p3/0.002) * 0.02)))        (5.11)

q̇1 = (p3/0.002)        (5.12)

e1 = (q1/1.96)        (5.13)

e2 = ((p3/0.002) * 0.02)        (5.14)

e3 = (E + (−1.0 * (q1/1.96)) + (−1.0 * ((p3/0.002) * 0.02)))        (5.15)

e4 = E        (5.16)

f1 = (p3/0.002)        (5.17)

f2 = (p3/0.002)        (5.18)

f3 = (p3/0.002)        (5.19)

f4 = (p3/0.002)        (5.20)

The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand side of readout equations as needed.
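As a sketch of this procedure, the state equations 5.11 and 5.12 can be integrated numerically with a unit step in E. The parameter values are those listed above; the explicit Euler scheme here is illustrative (the thesis does not prescribe a particular integrator):

```python
# Integrate equations 5.11 and 5.12 for the spring-mass-damper model,
# with a unit step applied to the effort source E at t = 0.

def simulate(dt=1e-4, t_end=2.0):
    C, I, R = 1.96, 0.002, 0.02
    p3, q1 = 0.0, 0.0                    # initial rest state
    t = 0.0
    while t < t_end:
        E = 1.0                          # unit step input
        dp3 = E - q1 / C - (p3 / I) * R  # equation 5.11
        dq1 = p3 / I                     # equation 5.12
        p3 += dp3 * dt
        q1 += dq1 * dt
        t += dt
    return p3, q1

p3, q1 = simulate()
# At steady state the spring balances the applied force, so q1 -> C * E = 1.96
# and the momentum p3 decays to zero, matching the behaviour in figure 5.3.
```
Readout variables would then be recovered by substituting p3 and q1 into equations 5.13 to 5.20, exactly as described above.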

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in Section 5.3. A broad spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.


Figure 5.1: Schematic diagram of a simple spring-mass-damper system

Figure 5.2: Bond graph model of the system from figure 5.1

Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input (f1, q1 and E plotted against t)

Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit magnitude uniform random noise input (f1, q1 and E plotted against t)

5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring-dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state-space form, the differential state equations are as given in equations 5.21 to 5.26.

ṗ2 = (E + (−1.0 * ((((p2/0.002) + (−1.0 * (p8/0.002))) * 0.02) + (q6/1.96))))        (5.21)

ṗ8 = (((((p2/0.002) + (−1.0 * (p8/0.002))) * 0.02) + (q6/1.96)) + (−1.0 * ((((p8/0.002) + (−1.0 * (p14/0.002))) * 0.02) + (q12/1.96))))        (5.22)

ṗ14 = (((((p8/0.002) + (−1.0 * (p14/0.002))) * 0.02) + (q12/1.96)) + (−1.0 * ((p14/0.002) * 0.02)) + (−1.0 * (q16/1.96)))        (5.23)

q̇6 = ((p2/0.002) + (−1.0 * (p8/0.002)))        (5.24)

q̇12 = ((p8/0.002) + (−1.0 * (p14/0.002)))        (5.25)

q̇16 = (p14/0.002)        (5.26)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12 and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter. The input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system

Figure 5.6: Bond graph model of the system from figure 5.5

Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E (q6, q12, q16 and E plotted against t)

Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit magnitude uniform random noise in the applied force E (q6, q12, q16 and E plotted against t)

5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant −9.81 times the inertia (force of gravity on the ball).
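The switched-capacitance idea can be sketched as follows: below the switching threshold r the element acts as a spring, above it the contact effort is zero. The Euler integration and all names are illustrative, and the sign conventions are an assumption, not the thesis code:

```python
# Bouncing ball with a switched C-element: contact force only while the
# centre of mass q is below the switching radius r. Parameters as above.

def bounce(q0=10.0, t_end=10.0, dt=1e-4):
    I, C, r, R, g = 0.002, 0.25, 1.0, 0.001, 9.81
    p, q = 0.0, q0                       # momentum and height of the ball
    heights = []
    t = 0.0
    while t < t_end:
        contact = (r - q) / C if q < r else 0.0  # switched C-element effort
        dp = -g * I + contact - (p / I) * R      # gravity, contact, drag
        p += dp * dt
        q += (p / I) * dt
        heights.append(q)
        t += dt
    return heights

h = bounce()
```
With these parameters the ball never penetrates the surface, and drag through R makes each successive apex lower than the last, in keeping with figure 5.11.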

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface

Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface

Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface (q4 plotted against t)

5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum-body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom. The horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of rotational flow signal a3):

mxA = 1.0 * cos(θ)

myA = 1.0 * sin(θ)

mxB = −1.0 * cos(θ)

myB = −1.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state-space form, the differential state equations are:

ṅa1 = (p1/1.00)

ṅa2 = (p2/1.00)

ṅa3 = (p4/1.10)

ṗ1 = (((−1.0 * (q21/0.01)) + (−1.0 * (((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

ṗ2 = (−9.81 + ((−1.0 * (q24/0.01)) + (−1.0 * (((p2/1.00) + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

ṗ4 = (((1.00 * cos(na3)) * ((−1.0 * (q21/0.01)) + (−1.0 * (((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10))) * 0.8)))) + ((1.00 * sin(na3)) * ((−1.0 * (q24/0.01)) + (−1.0 * (((p2/1.00) + ((1.00 * sin(na3)) * (p4/1.10))) * 0.8)))) + ((−1.00 * cos(na3)) * 0.0) + ((−1.00 * sin(na3)) * 0.0) + (−1.0 * ((p4/1.10) * 0.5)))

q̇21 = ((p1/1.00) + ((1.00 * cos(na3)) * (p4/1.10)))

q̇24 = ((p2/1.00) + ((1.00 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low frequency vibration in the spring that can be seen in the variable lower extremum on the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.

Figure 5.12: Bond graph model of the system from figure 2.19

Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position (int_a2 and q24 plotted against t)

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left, and an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state-space form, there are 24 differential state equations, given below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.

V̇φ_b0 = ((0.0 + (((xj0/0.001) + ((0.0 + (−1.0 * (VxC_b0 + (1.00 * Vφ_b0 * sin(φ_b0)))))/0.001)) * 1.00 * sin(φ_b0)) + (−1.0 * ((yj0/0.001) + ((0.0 + (−1.0 * (VyC_b0 + (−1.0 * 1.00 * Vφ_b0 * cos(φ_b0)))))/0.001)) * 1.00 * cos(φ_b0)) + (−1.0 * 0.0) + (((xj1/0.001) + (((VxC_b0 + (−1.0 * 1.00 * Vφ_b0 * sin(φ_b0))) + (−1.0 * (VxC_b1 + (1.00 * Vφ_b1 * sin(φ_b1)))))/0.001)) * 1.00 * sin(φ_b0)) + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (1.00 * Vφ_b0 * cos(φ_b0))) + (−1.0 * (VyC_b1 + (−1.0 * 1.00 * Vφ_b1 * cos(φ_b1)))))/0.001)) * 1.00 * cos(φ_b0))) / 5.00)

V̇φ_b1 = ((0.0 + (((xj1/0.001) + (((VxC_b0 + (−1.0 * 1.00 * Vφ_b0 * sin(φ_b0))) + (−1.0 * (VxC_b1 + (1.00 * Vφ_b1 * sin(φ_b1)))))/0.001)) * 1.00 * sin(φ_b1)) + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (1.00 * Vφ_b0 * cos(φ_b0))) + (−1.0 * (VyC_b1 + (−1.0 * 1.00 * Vφ_b1 * cos(φ_b1)))))/0.001)) * 1.00 * cos(φ_b1)) + (−1.0 * 0.0) + (((xj2/0.001) + (((VxC_b1 + (−1.0 * 1.00 * Vφ_b1 * sin(φ_b1))) + (−1.0 * (VxC_b2 + (1.00 * Vφ_b2 * sin(φ_b2)))))/0.001)) * 1.00 * sin(φ_b1)) + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (1.00 * Vφ_b1 * cos(φ_b1))) + (−1.0 * (VyC_b2 + (−1.0 * 1.00 * Vφ_b2 * cos(φ_b2)))))/0.001)) * 1.00 * cos(φ_b1))) / 5.00)

V̇φ_b2 = ((0.0 + (((xj2/0.001) + (((VxC_b1 + (−1.0 * 1.00 * Vφ_b1 * sin(φ_b1))) + (−1.0 * (VxC_b2 + (1.00 * Vφ_b2 * sin(φ_b2)))))/0.001)) * 1.00 * sin(φ_b2)) + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (1.00 * Vφ_b1 * cos(φ_b1))) + (−1.0 * (VyC_b2 + (−1.0 * 1.00 * Vφ_b2 * cos(φ_b2)))))/0.001)) * 1.00 * cos(φ_b2)) + (−1.0 * 0.0) + (0.0 * 1.00 * sin(φ_b2)) + (−1.0 * 0.0 * 1.00 * cos(φ_b2))) / 5.00)

V̇xC_b0 = ((((xj0/0.001) + ((0.0 + (−1.0 * (VxC_b0 + (1.00 * Vφ_b0 * sin(φ_b0)))))/0.001)) + (−1.0 * ((xj1/0.001) + (((VxC_b0 + (−1.0 * 1.00 * Vφ_b0 * sin(φ_b0))) + (−1.0 * (VxC_b1 + (1.00 * Vφ_b1 * sin(φ_b1)))))/0.001)))) / 1.00)

V̇xC_b1 = ((((xj1/0.001) + (((VxC_b0 + (−1.0 * 1.00 * Vφ_b0 * sin(φ_b0))) + (−1.0 * (VxC_b1 + (1.00 * Vφ_b1 * sin(φ_b1)))))/0.001)) + (−1.0 * ((xj2/0.001) + (((VxC_b1 + (−1.0 * 1.00 * Vφ_b1 * sin(φ_b1))) + (−1.0 * (VxC_b2 + (1.00 * Vφ_b2 * sin(φ_b2)))))/0.001)))) / 1.00)

V̇xC_b2 = ((((xj2/0.001) + (((VxC_b1 + (−1.0 * 1.00 * Vφ_b1 * sin(φ_b1))) + (−1.0 * (VxC_b2 + (1.00 * Vφ_b2 * sin(φ_b2)))))/0.001)) + (−1.0 * 0.0)) / 1.00)

V̇yC_b0 = ((((yj0/0.001) + ((0.0 + (−1.0 * (VyC_b0 + (−1.0 * 1.00 * Vφ_b0 * cos(φ_b0)))))/0.001)) + (−1.0 * ((yj1/0.001) + (((VyC_b0 + (1.00 * Vφ_b0 * cos(φ_b0))) + (−1.0 * (VyC_b1 + (−1.0 * 1.00 * Vφ_b1 * cos(φ_b1)))))/0.001))) + (−1.0 * 1.00 * 9.81)) / 1.00)

V̇yC_b1 = ((((yj1/0.001) + (((VyC_b0 + (1.00 * Vφ_b0 * cos(φ_b0))) + (−1.0 * (VyC_b1 + (−1.0 * 1.00 * Vφ_b1 * cos(φ_b1)))))/0.001)) + (−1.0 * ((yj2/0.001) + (((VyC_b1 + (1.00 * Vφ_b1 * cos(φ_b1))) + (−1.0 * (VyC_b2 + (−1.0 * 1.00 * Vφ_b2 * cos(φ_b2)))))/0.001))) + (−1.0 * 1.00 * 9.81)) / 1.00)

V̇yC_b2 = ((((yj2/0.001) + (((VyC_b1 + (1.00 * Vφ_b1 * cos(φ_b1))) + (−1.0 * (VyC_b2 + (−1.0 * 1.00 * Vφ_b2 * cos(φ_b2)))))/0.001)) + (−1.0 * 0.0) + (−1.0 * 1.00 * 9.81)) / 1.00)

φ̇_b0 = Vφ_b0

φ̇_b1 = Vφ_b1

φ̇_b2 = Vφ_b2

ẋC_b0 = VxC_b0

ẋC_b1 = VxC_b1

ẋC_b2 = VxC_b2

ẋj0 = (0.0 + (−1.0 * (VxC_b0 + (1.00 * Vφ_b0 * sin(φ_b0)))))

ẋj1 = ((VxC_b0 + (−1.0 * 1.00 * Vφ_b0 * sin(φ_b0))) + (−1.0 * (VxC_b1 + (1.00 * Vφ_b1 * sin(φ_b1)))))

ẋj2 = ((VxC_b1 + (−1.0 * 1.00 * Vφ_b1 * sin(φ_b1))) + (−1.0 * (VxC_b2 + (1.00 * Vφ_b2 * sin(φ_b2)))))

ẏC_b0 = VyC_b0

ẏC_b1 = VyC_b1

ẏC_b2 = VyC_b2

ẏj0 = (0.0 + (−1.0 * (VyC_b0 + (−1.0 * 1.00 * Vφ_b0 * cos(φ_b0)))))

ẏj1 = ((VyC_b0 + (1.00 * Vφ_b0 * cos(φ_b0))) + (−1.0 * (VyC_b1 + (−1.0 * 1.00 * Vφ_b1 * cos(φ_b1)))))

ẏj2 = ((VyC_b1 + (1.00 * Vφ_b1 * cos(φ_b1))) + (−1.0 * (VyC_b2 + (−1.0 * 1.00 * Vφ_b2 * cos(φ_b2)))))

Figure 5.14: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – vertical displacement of pendulum body centres (yC_b0, yC_b1, yC_b2 plotted against t)

Figure 5.15: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – horizontal displacement of pendulum body centres (xC_b0, xC_b1, xC_b2 plotted against t)

Figure 5.16: Response of the triple pendulum model from figure 5.17, released from an initial horizontal position – angular position of pendulum bodies (phi_b0, phi_b1, phi_b2 plotted against t)

Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum

5.1.6 Linked elastic collisions

Figure 5.18 shows a rigid bar with damped-elastic bumpers at each end, in schematic form. The system consists of a rigid link element (as used above in 5.1.4) representing the bar, two switched capacitor elements (CCa, CCb) representing elastic collision between the tips and a rigid horizontal surface, and unswitched vertical damping at each end and on rotation (not shown in the schematic) representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface, int_a3 is the angular displacement of the bar from vertical, and q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21 and q24 (height of the centre, left tip and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2 before the bar tips strike the ground at height 1, equal to the unsprung length of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact, then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end

Figure 5.19: Bond graph model of the system from figure 5.18

Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface (int_a2, int_a3, q21 and q24 plotted against t)

5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean and maximum program fitness each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as minimum, mean and maximum number of nodes in a program)
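The diversity measure (item 2) can be sketched as follows: serialize each program tree to a canonical string and count distinct serializations. The tuple-based tree encoding is hypothetical, and whether the thesis software compared whole trees or ignored leaf values is not specified here:

```python
# Count structurally unique programs by canonical serialization.

def canonical(tree):
    """Serialize a program tree given as (node_name, child, child, ...)."""
    if isinstance(tree, tuple):
        return '(' + ' '.join(canonical(c) for c in tree) + ')'
    return str(tree)

def structural_diversity(population):
    return len({canonical(p) for p in population})

pop = [('Add', ('Constant', 2), ('Mul', 'x', 'x')),
       ('Add', ('Constant', 2), ('Mul', 'x', 'x')),  # duplicate structure
       ('Mul', 'x', 'x')]
```
Here the population of three programs contains only two distinct structures, so the diversity measure would report 2.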

The particular symbolic regression problem chosen consisted of 100 samples of the function x² + 2 at integer values of x between −50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples, and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2, and the number of generations limited to 500, but with a variety of population sizes.
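A fitness evaluation of this shape can be sketched as below. The normalization 1/(1 + total_error) maps zero error to 1 and lands on (0, 1]; the exact normalization formula and sample endpoints used in the thesis are not stated, so both are assumptions here:

```python
import math

def fitness(candidate):
    # 100 integer samples of the target x^2 + 2 (endpoints assumed)
    targets = [(x, x * x + 2) for x in range(-50, 50)]
    total_error = 0.0
    for x, y in targets:
        try:
            out = candidate(x)
        except ZeroDivisionError:
            return 0.0            # divide-by-zero is awarded zero fitness
        if math.isnan(out):
            return 0.0            # as is a NaN result
        total_error += abs(out - y)
    return 1.0 / (1.0 + total_error)  # zero error maps to fitness 1

perfect = lambda x: x * x + 2
```
A perfect candidate scores exactly 1, while a near miss such as x² (error 2 per sample) scores a small positive value, preserving the rank ordering the selection operator needs.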

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the enormous number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress: runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation, the beginning of a sharp decline in population diversity can be seen. From this, one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this, one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of these newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness) and largest (maximum size) individuals are, of course, not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300 generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in the neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation but were barred from participating in crossover, mutation, or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends

Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes)

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program

Figure 5.25: Fitness of run B – progress stalled by population convergence

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress

Figure 5.27: Program size in run B – early domination by smaller than average schema

Figure 5.28: Program age in run B – low population turnover mid-run between crises

The non-parsimonious tendency of typical genetic programming output is well known and often commented on. A successful result, however, is always functionally equivalent to the target or goal in some testable ways; specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression, the exploitable structural equivalences are simply the rules of algebra. Even a very small set of algebraic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation):

(- x x) replaced by (0)

(- x 0) replaced by (x)

(+ x 0) replaced by (x)

(+ 0 x) replaced by (x)

(/ x x) replaced by (1)

(* 0 x) replaced by (0) for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant value node equal to zero or one; and second, are the children equal to each other.
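These single-node checks can be illustrated with a short sketch. The following is not the thesis code; it assumes a hypothetical nested-tuple s-expression representation, with strings standing in for variables:

```python
# Illustrative sketch of the single-node algebraic reductions described
# above. Expressions are nested tuples such as ("+", "x", 0); all names
# and the representation are hypothetical, not the thesis implementation.

def reduce_node(expr):
    """Apply the reduction rules to one function node and its children."""
    if not isinstance(expr, tuple):
        return expr  # leaf node: variable name or constant value
    op, a, b = expr
    if op == "-" and a == b:
        return 0          # (- x x) -> (0)
    if op == "-" and b == 0:
        return a          # (- x 0) -> (x)
    if op == "+" and b == 0:
        return a          # (+ x 0) -> (x)
    if op == "+" and a == 0:
        return b          # (+ 0 x) -> (x)
    if op == "/" and a == b:
        return 1          # (/ x x) -> (1)
    if op == "*" and (a == 0 or b == 0):
        return 0          # (* 0 x) -> (0) for all x
    return expr

def reduce_tree(expr):
    """Reduce bottom-up so parent rules see already-reduced children."""
    if isinstance(expr, tuple):
        op, a, b = expr
        expr = (op, reduce_tree(a), reduce_tree(b))
    return reduce_node(expr)
```

Applying the rules bottom-up lets a reduction at one level expose another higher up, e.g. `("+", ("-", "x", "x"), "y")` collapses first to `("+", 0, "y")` and then to `"y"`.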

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
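This constant-folding step can be sketched in the same hypothetical tuple representation; `contains_variable`, `evaluate`, and the operator table below are illustrative stand-ins, not the thesis code:

```python
# Sketch of the constant-folding simplification: any subtree built only
# from function nodes and constants collapses to one constant node found
# by evaluating the subtree. Names and representation are hypothetical.
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def contains_variable(expr):
    if isinstance(expr, tuple):
        return any(contains_variable(c) for c in expr[1:])
    return isinstance(expr, str)  # strings stand in for variables

def evaluate(expr, x=1.0):
    if isinstance(expr, tuple):
        op, a, b = expr
        return OPS[op](evaluate(a, x), evaluate(b, x))
    return x if isinstance(expr, str) else expr

def fold_constants(expr):
    if isinstance(expr, tuple):
        if not contains_variable(expr):
            # evaluate at an arbitrary value of the independent variable
            return evaluate(expr)
        op, a, b = expr
        return (op, fold_constants(a), fold_constants(b))
    return expr
```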

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x^2 + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables. It evaluates to very nearly 2.


Figure 5.29: A best approximation of x^2 + 2.0 after 41 generations


Figure 5.30: The reduced expression


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations and one of the output equations of the model described in Section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order they are given in). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. It is a form of "edit distance": the number of nodes that would have to be added, deleted, or replaced to transform one program into the other.
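The metric just described can be sketched directly from its definition. This is an illustrative reimplementation, not the thesis code, assuming a hypothetical `(label, [children])` tree representation:

```python
# Sketch of the pairwise tree-distance metric described above: the size
# of the subtree rooted at the first mismatching node in a breadth-first
# traversal of both trees, taking the larger of the two corresponding
# subtrees. The (label, [children]) representation is hypothetical.
from collections import deque

def size(tree):
    """Number of nodes in a subtree."""
    label, children = tree
    return 1 + sum(size(c) for c in children)

def distance(a, b):
    queue = deque([(a, b)])          # pairs of corresponding nodes
    while queue:
        na, nb = queue.popleft()
        la, ca = na
        lb, cb = nb
        if la != lb or len(ca) != len(cb):
            # first mismatch in breadth-first order: use larger subtree
            return max(size(na), size(nb))
        queue.extend(zip(ca, cb))
    return 0                          # identical trees
```

Because `max` is used at the mismatch, `distance(a, b) == distance(b, a)`, matching the symmetry claimed in the text.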

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance as defined above is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive". The heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases fitness-guided search such as genetic programming can do no better than a random search.

Distance   Mean fitness   Count
 1         0.000469        2212
 3         0.000282         503
 5         0.000346          66
 7         0.000363          36
 9         0.000231          25
11         0.000164          16
13         0.000143          10
15         0.000071          10
17         0.000102           9
19         0.000221           1
23         0.0                2
25         0.0                2
27         0.0                1
29         0.000286           5
35         0.0                1
51         0.000715           1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2 and 5.7


Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Figure 5.32: The hand crafted "ideal" symbolic regression tree

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy-storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are signal-flow graphs and sets of differential equations, to give two examples, for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex non-linear and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out small, simple, and linear but grow large, complex, and non-linear over the course of some experiments without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide or discover which aspects of the system are important to the exercise, and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, and lower its efficiency, by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm and ant colony algorithms and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search within a search algorithm. A simulated annealing with genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run according to some schedule.
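The operator-schedule idea can be made concrete with a small sketch. The particular schedule, weights, and operator names below are hypothetical choices for illustration, not values from the thesis:

```python
# Sketch of an annealing-style operator schedule: the share given to the
# mutation operator "cools" over the run while crossover's share grows.
# The linear schedule and the specific weights are hypothetical.
import random

def operator_weights(generation, max_generations):
    t = generation / max_generations        # 0.0 -> 1.0 over the run
    return {
        "mutation":     0.5 * (1.0 - t),    # cools from 0.5 to 0.0
        "crossover":    0.4 + 0.5 * t,      # grows from 0.4 to 0.9
        "reproduction": 0.1,                # constant copying share
    }

def choose_operator(generation, max_generations, rng=random):
    weights = operator_weights(generation, max_generations)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]
```

Early in the run the search behaves more like a random walk through structures; late in the run it relies on recombining material already in the population, loosely analogous to lowering the temperature in simulated annealing.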

Genetic programming is easily parallelizable for distribution across a local or wide area network in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has in effect been stated already in this section, but it is worth repeating while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations) but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems with any number of inputs and outputs is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification, to allow this information to be specified in a way that is natural to the task, and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program, modelling a particular dynamic system, is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time for the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant valued expressions for element parameters, is expected to be faster than black-box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it bond graph (unchanging), symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression

2. Structural and parametric identification of a linear dynamic system using bond graphs

3. Parametric identification of a non-linear dynamic system using bond graphs

4. Structural and parametric identification of a non-linear dynamic system using bond graphs

At each stage the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressively stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).

If they were to find any more use as a modelling tool in their own right, outside of the system identification tasks that were the goal of this project, then the bond graph library would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20sim, or domain specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to and in use by other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
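A GraphML export along these lines could be sketched with the standard library alone. The node and bond lists below are hypothetical examples, and this is only a minimal illustration of the GraphML `graph`/`node`/`edge` vocabulary, not the thesis software:

```python
# Sketch of exporting a bond graph as GraphML using only the standard
# library. The example element ids and bonds are hypothetical; the
# namespace and element names follow the GraphML schema.
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def bond_graph_to_graphml(elements, bonds):
    """elements: list of node ids; bonds: list of (tail, head) pairs."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph",
                          id="bondgraph", edgedefault="directed")
    for node_id in elements:
        ET.SubElement(graph, "node", id=node_id)
    for i, (tail, head) in enumerate(bonds):
        ET.SubElement(graph, "edge",
                      id="b%d" % i, source=tail, target=head)
    return ET.tostring(root, encoding="unicode")

# A source of effort bonded to a 1-junction bonded to an inertia:
xml_text = bond_graph_to_graphml(["Se", "1", "I"], [("Se", "1"), ("1", "I")])
```

Domain-specific data (element type, parameter values, causality) would go into GraphML `key`/`data` elements, which is exactly the extensibility that makes the format attractive here.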

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example, given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island-model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation solvers (DAE solvers) are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
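Such a finite, least-recently-used memo table can be sketched as follows. The class, the key scheme, and `fitness_test` are hypothetical illustrations of the idea, not the thesis code:

```python
# Sketch of the fitness-memoization idea: a bounded table in
# least-recently-used order mapping a short key derived from a genetic
# program to its cached fitness. fitness_test stands in for the
# expensive simulate-and-compare step; all names are hypothetical.
from collections import OrderedDict
import hashlib

class FitnessCache:
    def __init__(self, fitness_test, capacity=10000):
        self.fitness_test = fitness_test
        self.capacity = capacity
        self.table = OrderedDict()  # key -> cached fitness, LRU first

    @staticmethod
    def key(program):
        # a short unique-enough key derived from the program's text
        return hashlib.sha1(repr(program).encode()).hexdigest()

    def fitness(self, program):
        k = self.key(program)
        if k in self.table:
            self.table.move_to_end(k)       # mark as recently used
            return self.table[k]
        value = self.fitness_test(program)  # the expensive part
        self.table[k] = value
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict least recently used
        return value
```

Duplicate programs in one generation then cost one simulation instead of many, at the price of the bounded table's memory.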

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as they do to the nodes below them. The standard Python interpreter (called cPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated but currently unused memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
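The recursive-to-iterative rewrite can be illustrated on one such tree method, listing the variables in an expression. Both versions below are hypothetical sketches over a `(label, [children])` tree, with strings beginning with "x" standing in for variables:

```python
# Sketch of rewriting a recursive tree method with an explicit stack, so
# very deep expression trees cannot overflow the interpreter's C stack.
# The (label, [children]) trees and the "x"-prefix variable convention
# are hypothetical.

def variables_recursive(tree):
    label, children = tree
    found = {label} if label.startswith("x") else set()
    for child in children:
        found |= variables_recursive(child)   # one C stack frame per level
    return found

def variables_iterative(tree):
    found = set()
    stack = [tree]              # explicit stack replaces the call stack
    while stack:
        label, children = stack.pop()
        if label.startswith("x"):
            found.add(label)
        stack.extend(children)  # depth is bounded by heap, not C stack
    return found
```

On small trees the two agree; on a tree nested tens of thousands of levels deep the recursive form raises `RecursionError` under cPython's default limit while the iterative form still runs.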


Bibliography

[1] Netlib repository at UTK and ORNL httpwwwnetliborg

[2] Dynasim AB Dymola dynamic modeling laboratory httpwww

dynasimsedymolahtm 2006

[3] Paul-Michael Agapow Information criteria httpwwwagapownetsciencemathsinfo criteriahtml 2003

[4] R R Allen Multiport models for the kinematic and dynamic analysisof gear power transmiffions Transactions of the ASME 101258ndash267April 1979

[5] B Bamieh and L Giarre Identification of linear parameter varying mod-els International Journal of Robust and Nonlinear Control 12(9)841ndash853 2002

[6] JS Bendat Nonlinear Systems Techniques and Applications WileyInterscience New York NY 2002

[7] P Bendotti and B Bodenheimer Linear parameter-varying versus lineartime-invariant control design for a pressurized water reactor Interna-tional Journal of Robust and Nonlinear Control 9(13)969ndash995 1999

[8] S H Birkett and P H Roe The mathematical foundations of bondgraphs-i algebraic theory Journal of the Franklin Institute (326)329ndash250 1989

[9] S H Birkett and P H Roe The mathematical foundations of bondgraphs-ii duality Journal of the Franklin Institute (326)691ndash708 1989

165

[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs III: Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs IV: Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. In CINC'94: The first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26, 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin, Heidelberg, 2003.

[20] B. Danielson, J. Foster, and D. Frincke. GABSys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31, 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz: graph visualization software. http://www.graphviz.org

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE: a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4: the "genetic algorithm optimized for portability and parallelism system". http://garage.cse.msu.edu/software/galopps, August 25, 2002.

[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2, pages 359–366, 1989.

[36] John Hunter. Pylab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System dynamics: a unified approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kauffman Publishers Inc., 340 Pine Street, Sixth Floor, 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.


2.2.4 Sink elements 35
2.2.5 Junctions 35

2.3 Augmented bond graphs 39
2.3.1 Integral and derivative causality 41
2.3.2 Sequential causality assignment procedure 44

2.4 Additional bond graph elements 46
2.4.1 Activated bonds and signal blocks 46
2.4.2 Modulated elements 46
2.4.3 Complex elements 47
2.4.4 Compound elements 48

2.5 Simulation 51
2.5.1 State space models 51
2.5.2 Mixed causality models 52

3 Genetic programming 54
3.1 Models, programs and machine learning 54
3.2 History of genetic programming 55
3.3 Genetic operators 57
3.3.1 The crossover operator 58
3.3.2 The mutation operator 58
3.3.3 The replication operator 58
3.3.4 Fitness proportional selection 59
3.4 Generational genetic algorithms 59
3.5 Building blocks and schemata 61
3.6 Tree structured programs 63
3.6.1 The crossover operator for tree structures 67
3.6.2 The mutation operator for tree structures 67
3.6.3 Other operators on tree structures 68
3.7 Symbolic regression 69
3.8 Closure or strong typing 70
3.9 Genetic programming as indirect search 72
3.10 Search tuning 73
3.10.1 Controlling program size 73
3.10.2 Maintaining population diversity 74

4 Implementation 76
4.1 Program structure 76
4.1.1 Formats and representations 77

4.2 External tools 81
4.2.1 The Python programming language 81
4.2.2 The SciPy numerical libraries 82
4.2.3 Graph layout and visualization with Graphviz 82

4.3 A bond graph modelling library 83
4.3.1 Basic data structures 83
4.3.2 Adding elements and bonds, traversing a graph 84
4.3.3 Assigning causality 84
4.3.4 Extracting equations 86
4.3.5 Model reductions 86

4.4 An object-oriented symbolic algebra library 100
4.4.1 Basic data structures 100
4.4.2 Basic reductions and manipulations 102
4.4.3 Algebraic loop detection and resolution 104
4.4.4 Reducing equations to state-space form 107
4.4.5 Simulation 110

4.5 A typed genetic programming system 110
4.5.1 Strongly typed mutation 112
4.5.2 Strongly typed crossover 113
4.5.3 A grammar for symbolic regression on dynamic systems 113
4.5.4 A grammar for genetic programming bond graphs 118

5 Results and discussion 120
5.1 Bond graph modelling and simulation 120
5.1.1 Simple spring-mass-damper system 120
5.1.2 Multiple spring-mass-damper system 125
5.1.3 Elastic collision with a horizontal surface 128
5.1.4 Suspended planar pendulum 130
5.1.5 Triple planar pendulum 133
5.1.6 Linked elastic collisions 139

5.2 Symbolic regression of a static function 141
5.2.1 Population dynamics 141
5.2.2 Algebraic reductions 143

5.3 Exploring genetic neighbourhoods by perturbation of an ideal 151
5.4 Discussion 153
5.4.1 On bond graphs 153
5.4.2 On genetic programming 157

6 Summary and conclusions 160
6.1 Extensions and future work 161
6.1.1 Additional experiments 161
6.1.2 Software improvements 163

List of Figures

1.1 Using output prediction error to evaluate a model 4
1.2 The black box system identification problem 5
1.3 The grey box system identification problem 5
1.4 Block diagram of a Volterra function series model 17
1.5 The Hammerstein filter chain model structure 17
1.6 Block diagram of the linear squarer cuber model 18
1.7 Block diagram for a nonlinear moving average operation 21

2.1 A bond denotes power continuity 32
2.2 A capacitive storage element, the 1-port capacitor C 34
2.3 An inductive storage element, the 1-port inertia I 34
2.4 An ideal source element, the 1-port effort source Se 35
2.5 An ideal source element, the 1-port flow source Sf 35
2.6 A dissipative element, the 1-port resistor R 36
2.7 A power-conserving 2-port element, the transformer TF 37
2.8 A power-conserving 2-port element, the gyrator GY 37
2.9 A power-conserving multi-port element, the 0-junction 38
2.10 A power-conserving multi-port element, the 1-junction 38
2.11 A fully augmented bond includes a causal stroke 40
2.12 Derivative causality: electrical capacitors in parallel 44
2.13 Derivative causality: rigidly joined masses 45
2.14 A power-conserving 2-port, the modulated transformer MTF 47
2.15 A power-conserving 2-port, the modulated gyrator MGY 47
2.16 A non-linear storage element, the contact compliance CC 48
2.17 Mechanical contact compliance, an unfixed spring 48
2.18 Effort-displacement plot for the CC element 49
2.19 A suspended pendulum 50
2.20 Sub-models of the simple pendulum 50
2.21 Model of a simple pendulum using compound elements 51

3.1 A computer program is like a system model 54
3.2 The crossover operation for bit string genotypes 58
3.3 The mutation operation for bit string genotypes 58
3.4 The replication operation for bit string genotypes 59
3.5 Simplified GA flowchart 61
3.6 Genetic programming produces a program 66
3.7 The crossover operation for tree structured genotypes 68
3.8 Genetic programming as indirect search 73

4.1 Model reductions step 1: remove trivial junctions 88
4.2 Model reductions step 2: merge adjacent like junctions 89
4.3 Model reductions step 3: merge resistors 90
4.4 Model reductions step 4: merge capacitors 91
4.5 Model reductions step 5: merge inertias 91
4.6 A randomly generated model 92
4.7 A reduced model, step 1 93
4.8 A reduced model, step 2 94
4.9 A reduced model, step 3 95
4.10 Algebraic dependencies from the original model 96
4.11 Algebraic dependencies from the reduced model 97
4.12 Algebraic object inheritance diagram 101
4.13 Algebraic dependencies with loops resolved 108

5.1 Schematic diagram of a simple spring-mass-damper system 123
5.2 Bond graph model of the system from figure 5.1 123
5.3 Step response of a simple spring-mass-damper model 124
5.4 Noise response of a simple spring-mass-damper model 124
5.5 Schematic diagram of a multiple spring-mass-damper system 126
5.6 Bond graph model of the system from figure 5.5 126
5.7 Step response of a multiple spring-mass-damper model 127
5.8 Noise response of a multiple spring-mass-damper model 127
5.9 A bouncing ball 128
5.10 Bond graph model of a bouncing ball with switched C-element 129
5.11 Response of a bouncing ball model 129
5.12 Bond graph model of the system from figure 2.19 132
5.13 Response of a simple pendulum model 133
5.14 Vertical response of a triple pendulum model 136
5.15 Horizontal response of a triple pendulum model 137
5.16 Angular response of a triple pendulum model 137
5.17 A triple planar pendulum 138
5.18 Schematic diagram of a rigid bar with bumpers 139
5.19 Bond graph model of the system from figure 5.18 140
5.20 Response of a bar-with-bumpers model 140
5.21 Fitness in run A 144
5.22 Diversity in run A 144
5.23 Program size in run A 145
5.24 Program age in run A 145
5.25 Fitness of run B 146
5.26 Diversity of run B 146
5.27 Program size in run B 147
5.28 Program age in run B 147
5.29 A best approximation of x^2 + 2.0 after 41 generations 149
5.30 The reduced expression 150
5.31 Mean fitness versus tree-distance from the ideal 153
5.32 Symbolic regression "ideal" 154

List of Tables

1.1 Modelling tasks in order of increasing difficulty 13

2.1 Units of effort and flow in several physical domains 33
2.2 Causality options by element type: 1- and 2-port elements 42
2.3 Causality options by element type: junction elements 43

4.1 Grammar for symbolic regression on dynamic systems 117
4.2 Grammar for genetic programming bond graphs 119

5.1 Neighbourhood of a symbolic regression "ideal" 152

Listings

3.1 Protecting the fitness test code against division by zero 71
4.1 A simple bond graph model in bg format 78
4.2 A simple bond graph model constructed in Python code 80
4.3 Two simple functions in Python 81
4.4 Graphviz DOT markup for the model defined in listing 4.4 83
4.5 Equations from the model in figure 4.6 98
4.6 Equations from the reduced model in figure 4.9 99
4.7 Resolving a loop in the reduced model 106
4.8 Equations from the reduced model with loops resolved 107

Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models and, because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system, a control system for example. In an iterative design process it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics), then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance, by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately, it is perhaps best to think of an identified model as a second system that, for some purposes,


can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

12 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that as closely as possible reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].
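In its simplest form, the output prediction error criterion reduces to a pointwise comparison of the measured and model-predicted output sequences. A minimal sketch (the function name and the mean-squared form are illustrative choices, not taken from the thesis software):

```python
def mean_squared_prediction_error(measured, predicted):
    """Mean squared difference between the measured system output and
    the model's predicted output over a test sequence; a lower value
    indicates a model that better reproduces the observed behaviour."""
    if len(measured) != len(predicted):
        raise ValueError("sequences must be the same length")
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
```

A model with a smaller mean squared prediction error on the test sequence is preferred under this criterion, all else being equal.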

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

• Inspection or disassembly of a non-operational system


• Theoretical principles applicable to that class of systems

• Knowledge from the design and construction of the system

• Results of another system identification exercise

• Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant, and in no way on any past values of the input. Put another way, the input–output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic overall. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input–output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.
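The order-independence of static model identification can be sketched as a closed-form least-squares line fit; the data values below are illustrative only, and shuffling the input–output pairs would give the same result.

```python
# Least-squares fit of a static (memoryless) model y = a*x + b.
# A static model is identified from the scatter of input-output
# pairs alone; the order of collection does not matter.

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Noise-free data on the line y = 2x + 1 (illustrative values).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
```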

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables, the model equations would be:

\[
\begin{aligned}
\dot{u}_1 &= f_1(x_1, \ldots, x_n, u_1, \ldots, u_m) \\
&\;\;\vdots \\
\dot{u}_m &= f_m(x_1, \ldots, x_n, u_1, \ldots, u_m) \\
y_1 &= g_1(x_1, \ldots, x_n, u_1, \ldots, u_m) \\
&\;\;\vdots \\
y_p &= g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)
\end{aligned}
\]

where \(x_1, \ldots, x_n\) are the input variables, \(u_1, \ldots, u_m\) are the state variables, and \(y_1, \ldots, y_p\) are the output variables.
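A state space model of this form can be simulated directly; a minimal sketch, using forward-Euler integration of a unit mass-spring-damper (the spring and damper constants below are assumed for illustration, not taken from the text):

```python
# Forward-Euler simulation of a model in state space form:
#   du/dt = f(x, u)   (state equations)
#   y     = g(x, u)   (output equation)
# Example: unit mass-spring-damper driven by an input force x(t).
# State u = [position, velocity]; output y = position.

def f(x, u):
    pos, vel = u
    k, c = 1.0, 0.5            # spring and damper constants (assumed)
    return [vel, x - k * pos - c * vel]

def g(x, u):
    return u[0]

def simulate(x_seq, u0, dt):
    u, ys = list(u0), []
    for x in x_seq:
        du = f(x, u)
        u = [ui + dt * dui for ui, dui in zip(u, du)]
        ys.append(g(x, u))
    return ys

# Step response: position settles toward x/k = 1.0.
ys = simulate([1.0] * 3000, [0.0, 0.0], dt=0.01)
```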

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order of arrival of data within the input and output signals matters, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a \(u_i\) for which there is no \(y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i\). The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty to experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in section 3.7) is unusual because it is able to identify the model structure automatically and the corresponding parameters at the same time.

Parametric models are typically more concise (contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function), and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in section 1.5.

\[ Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0}\, X(s) \tag{1.1} \]

\[ Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}}\, X(z) \tag{1.2} \]

\[ \frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x \tag{1.3} \]

\[ y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m} \tag{1.4} \]
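A discrete time-domain model of the form in equation 1.4 can be simulated by direct recursion; a minimal sketch, with illustrative first-order coefficients (not identified from any system in the text):

```python
# Simulation of a discrete-time parametric model of the form of
# equation 1.4:  y_k + a1*y_{k-1} + ... = b1*x_{k-1} + ...
# Coefficient values below are illustrative only.

def simulate_difference_eq(a, b, x_seq):
    """a = [a1..an], b = [b1..bm]; zero initial conditions."""
    y = []
    for k in range(len(x_seq)):
        yk = 0.0
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:
                yk -= ai * y[k - i]
        for j, bj in enumerate(b, start=1):
            if k - j >= 0:
                yk += bj * x_seq[k - j]
        y.append(yk)
    return y

# First-order response to a unit step: y_k = 0.9*y_{k-1} + 0.1*x_{k-1},
# which approaches 1.0 as k grows.
y = simulate_difference_eq(a=[-0.9], b=[0.1], x_seq=[1.0] * 50)
```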

Because they typically contain fewer identified parameters, and the range of possibilities is already reduced by the choice of a specific model structure, parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

\[ y(t) = \int_0^t u(t - \tau)\, x(\tau)\, d\tau \tag{1.5} \]

\[ y(k) = \sum_{m=0}^{k} u(k - m)\, x(m) \tag{1.6} \]

\[ Y(j\omega) = U(j\omega)\, X(j\omega) \tag{1.7} \]

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified \(\beta_1, \ldots, \beta_n\), the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters \(\beta_i\) while equation 1.9 is not. Both contain non-linear functions of the inputs (namely \(u_2^2\) and \(\sin(u_3)\)), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

\[ y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3) \tag{1.8} \]

\[ y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3) \tag{1.9} \]

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter and, in some cases, model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
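The one-step solution can be sketched for the model of equation 1.8 by forming and solving the normal equations; the "true" parameter values and the noise-free data below are assumptions made for illustration:

```python
import math
import random

# Direct (one-step) least-squares estimation for equation 1.8:
#   y = b1*u1 + b2*u2^2 + b3*sin(u3)
# The model is nonlinear in the inputs but linear in the parameters,
# so the normal equations (Phi^T Phi) beta = Phi^T y give the
# estimate without any search.

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

random.seed(0)
true_beta = [2.0, -1.0, 0.5]            # assumed "true" parameters
rows, ys = [], []
for _ in range(50):
    u1, u2, u3 = (random.uniform(-1, 1) for _ in range(3))
    phi = [u1, u2 ** 2, math.sin(u3)]   # regressor vector
    rows.append(phi)
    ys.append(sum(b * p for b, p in zip(true_beta, phi)))

PtP = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Pty = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(3)]
beta = solve(PtP, Pty)                  # recovers true_beta exactly (no noise)
```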

There is a fundamental trade-off between model size or order and the model's expressive power or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to overfitting (discussed in section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start with an arbitrarily low order model structure and repeatedly identify the best model using the linear least squares technique discussed below, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately this approach, and any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs, and subtracting the normal operating outputs from the measured output, to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally but not globally optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it is a factor that trades off exploration of new territory with exploitation of local gains made in a neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally, the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results on unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in many machine learning textbooks. They both incorporate a measure of parsimony for the identified model and weight that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
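The parsimony-versus-fit trade made by these criteria can be sketched as follows, using the common Gaussian-residual forms of AIC and BIC (the data point and residual values are illustrative assumptions):

```python
import math

# Akaike and Bayesian information criteria for a model with k
# parameters fit to n data points, assuming Gaussian residuals with
# residual sum of squares rss.  Lower is better; both weigh goodness
# of fit against model size, BIC penalizing size more heavily when
# log(n) > 2.

def aic(n, k, rss):
    return n * math.log(rss / n) + 2 * k

def bic(n, k, rss):
    return n * math.log(rss / n) + k * math.log(n)

# A 3-parameter model must reduce rss enough to justify its extra
# term; here the small improvement (40.0 -> 39.5) does not.
n = 100
small_model = aic(n, 2, 40.0)   # preferred by both criteria
large_model = aic(n, 3, 39.5)
```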

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in table 1.1) the problem is to fit a line, plane, or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input–output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for the model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in table 1.1). Non-linear static modelling (class 3 in table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

              linear    non-linear
    static     (1)         (3)
    dynamic    (2)         (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem. (4) is the most general class of system identification problems.

Two model types using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen attempting to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at a different operating point of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying (LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or B-spline).

In step 2 it is important to use low amplitude test signals, such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three steps above.
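The three steps can be sketched with a scalar example: local (gain, offset) models identified at a few base operating points, combined by linear interpolation of their parameters. The operating points and local model values below are illustrative assumptions, not identified from any real system.

```python
# Minimal LPV-style sketch: local linear models (gain, offset) at
# base operating points p, combined by linear interpolation of
# their parameters (step 3 above).  All values are illustrative.

base_points = [0.0, 1.0, 2.0]
local_models = [(1.0, 0.0), (1.5, -0.2), (3.0, -1.5)]  # (gain, offset)

def lpv_output(p, u):
    """Interpolate local model parameters at operating point p, apply to input u."""
    if p <= base_points[0]:
        gain, off = local_models[0]
    elif p >= base_points[-1]:
        gain, off = local_models[-1]
    else:
        for (p0, m0), (p1, m1) in zip(zip(base_points, local_models),
                                      zip(base_points[1:], local_models[1:])):
            if p0 <= p <= p1:
                w = (p - p0) / (p1 - p0)
                gain = (1 - w) * m0[0] + w * m1[0]
                off = (1 - w) * m0[1] + w * m1[1]
                break
    return gain * u + off
```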

Both of these techniques are able to identify the parameters, but not the structure, of a model. For EKF the model structure must be given in a state space form. For LPV the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations; recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by:

\[ y(t) = y_1(t) = \int_{-T_S}^{0} g(t - \tau) \cdot u(\tau)\, d\tau \tag{1.10} \]

Its discrete-time version is

\[ y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \cdot u_{k-j} \tag{1.11} \]

The limits of integration and summation are finite, since practical stable systems are assumed to have negligible dependence on the very distant past; that is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions, the first of which is the linear model \(y_1(t)\) above:

\[ y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots \tag{1.12} \]

The continuous-time form of the additional terms (called bilinear, trilinear, etc. functions) is:

\[
y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t - \tau_1, t - \tau_2) \cdot u(\tau_1) \cdot u(\tau_2)\, d\tau_1\, d\tau_2
\]

\[
y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t - \tau_1, t - \tau_2, t - \tau_3) \cdot u(\tau_1) \cdot u(\tau_2) \cdot u(\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3
\]

\[
y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t - \tau_1, t - \tau_2, \ldots, t - \tau_p) \cdot u(\tau_1) \cdot u(\tau_2) \cdots u(\tau_p)\, d\tau_1\, d\tau_2 \cdots d\tau_p
\]

where \(g_p\) is called the p'th order continuous time kernel. The rest of this discussion will use the discrete-time representation:

\[
y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j}
\]

\[
y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \cdot u_{n-i} \cdot u_{n-j} \cdot u_{n-k}
\]

\[
y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \ldots i_p} \cdot u_{n-i_1} \cdot u_{n-i_2} \cdots u_{n-i_p}
\]

\(W^{(p)}\) is called the p'th order discrete time kernel. \(W^{(2)}\) is an \(M_2\) by \(M_2\) square matrix, \(W^{(3)}\) is an \(M_3\) by \(M_3\) by \(M_3\) cubic matrix, and so on. It is not required that the memory lengths of each kernel (\(M_2\), \(M_3\), etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is M, then the p'th order kernel \(W^{(p)}\) is a p-dimensional matrix with \(M^p\) elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is \(M_1 + M_2^2 + M_3^3 + \cdots + M_p^p + \cdots\), or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in figure 1.4.

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series, and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

\[ v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q \tag{1.13} \]

\[ y(t_k) = \sum_{j=0}^{N} W_j \cdot v_{k-j} \tag{1.14} \]

Combining these gives

\[
\begin{aligned}
y(t_k) &= \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u_{k-j}^i \\
&= \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u_{k-j}^i \\
&= a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u_{k-j}^2 + \cdots + a_q \sum_{j=0}^{N} W_j u_{k-j}^q
\end{aligned}
\]

Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher order kernels. Elements on the main diagonal of the higher order kernels can be interpreted as weights applied to powers of the input signal.
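The Hammerstein chain of equations 1.13 and 1.14 can be sketched directly as a static polynomial followed by an FIR weighting sequence; the coefficient values below are illustrative assumptions:

```python
# A Hammerstein chain (figure 1.5): static polynomial nonlinearity
#   v = a0 + a1*u + a2*u^2 + ...   (equation 1.13)
# followed by an FIR weighting sequence W (equation 1.14).
# Coefficients are illustrative only.

def hammerstein(u_seq, a, W):
    # Static nonlinear map applied sample by sample.
    v = [sum(ai * (u ** i) for i, ai in enumerate(a)) for u in u_seq]
    # LTI dynamic part: discrete convolution with W.
    y = []
    for k in range(len(v)):
        y.append(sum(W[j] * v[k - j] for j in range(len(W)) if k - j >= 0))
    return y

# v = u + u^2 maps inputs 1.0 -> 2.0 and -1.0 -> 0.0 before filtering.
y = hammerstein([1.0, -1.0], a=[0.0, 1.0, 1.0], W=[0.5, 0.5])
```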

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms, and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in figure 1.6. Obviously it is less general than the system in figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals \(u_1 = u\), \(u_2 = u^2\), and \(u_3 = u^3\) can be calculated, which reduces the identification of the dynamic parts to a multi-input, multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear, squarer, cuber model

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will be used:

\[ y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n) \tag{1.15} \]

\[ y = \sum_{i=0}^{N_1} W^{(1)}_i \cdot u_{n-i} + \sum_{i=0}^{N_2} \sum_{j=0}^{N_2} W^{(2)}_{ij} \cdot u_{n-i} \cdot u_{n-j} + e(t_n) \tag{1.16} \]

\[ = (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e \tag{1.17} \]

where the vector and matrix quantities \(u_1\), \(u_2\), \(W^{(1)}\), and \(W^{(2)}\) are:

\[
u_1 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_1} \end{bmatrix} \quad
u_2 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_2} \end{bmatrix} \quad
W^{(1)} = \begin{bmatrix} W^{(1)}_1 \\ W^{(1)}_2 \\ \vdots \\ W^{(1)}_{N_1} \end{bmatrix} \quad
W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & \ddots & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}
\]

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares, if the input vector U and the parameter vector β are constructed as follows. The first \(N_1\) elements in U are the elements of \(u_1\), and this is followed by elements having the value of the products of \(u_2\) appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of \(W^{(1)}\), followed by the elements of \(W^{(2)}\) as they appear in the second term of equation 1.16.

\[ y = U\beta + e \tag{1.18} \]

Given a sequence of N measurements, U becomes rectangular, and y and e become N by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

\[ \beta = (U^T U)^{-1} U^T y \tag{1.19} \]

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large, \(O(N^P)\) for a model using the first P Volterra series terms and a memory length of N.
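The construction of one row of the regression matrix U described above can be sketched as follows; the input values and memory lengths are illustrative, and delay indices run 1..N to match the vector definitions:

```python
# One row of the regression matrix U for the two-term Volterra model
# of equation 1.16: the first N1 entries are delayed inputs (linear
# kernel regressors) and the remainder are the products
# u[n-i]*u[n-j] (bilinear kernel regressors).

def regressor_row(u, n, N1, N2):
    past = lambda i: u[n - i] if n - i >= 0 else 0.0
    linear = [past(i) for i in range(1, N1 + 1)]
    bilinear = [past(i) * past(j)
                for i in range(1, N2 + 1) for j in range(1, N2 + 1)]
    return linear + bilinear

# With u = [3, 5, 7] at n = 2: delayed inputs are 5 and 3, and the
# bilinear products enumerate all pairs of them.
row = regressor_row([3.0, 5.0, 7.0], n=2, N1=2, N2=2)
```

Stacking one such row per measurement gives the rectangular U of equation 1.18; note that the row length, N1 + N2², already shows the O(N^P) growth in parameters.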


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form \(W_{ij} u_{n-i} u_{n-j}\) and \(W_{ji} u_{n-j} u_{n-i}\). However, since \(u_{n-i} u_{n-j} = u_{n-j} u_{n-i}\), the parameters \(W_{ij}\) and \(W_{ji}\) are identical. Only \(N(N+1)/2\) unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, \(O(N^P)\).
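The unique index pairs of the symmetric bilinear kernel can be enumerated directly, which confirms the N(N+1)/2 count:

```python
from itertools import combinations_with_replacement

# Distinguishable parameters of a symmetric bilinear kernel: only
# index pairs (i, j) with i <= j are unique, giving N*(N+1)/2
# parameters instead of N^2.

def unique_bilinear_terms(N):
    return list(combinations_with_replacement(range(N), 2))

terms = unique_bilinear_terms(8)   # 8*9/2 = 36 unique pairs
```

For N = 32 this gives 528 unique bilinear parameters, the figure that appears in the wavelet compression example below.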

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters to the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters to this formula are ζ, a form factor affecting the shape of the sinus curve, j, which iterates over component sinus curves, and i, the read-out index. There are \(m_r\) sinus curves in the weighted sum. The values of ζ and \(m_r\) are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem only the weights \(w_j\) applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to \(m_r\). For example, if N is 40 and \(m_r\) is 10, then there is a 4 times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with \(N_1 = N_2 = 32\) gives the uncompressed model a total of \(N_1 + N_2(N_2+1)/2 = 32 + 528 = 560\) parameters. In the wavelet domain identification only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560:46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
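The idea behind wavelet-domain kernel compression can be sketched with a single-level Haar transform: transform the kernel, drop small detail coefficients, and reconstruct. This is only an illustration of the principle; [53] uses a full multi-level DWT, and the kernel values below are assumed.

```python
# Single-level Haar transform illustration of wavelet-domain kernel
# compression: small detail coefficients are zeroed (fewer parameters
# to identify) with little loss where the kernel is smooth.

def haar_forward(x):
    avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return avg, det

def haar_inverse(avg, det):
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

kernel = [4.0, 4.1, 3.9, 4.0, 0.1, 0.0, 0.1, 0.0]   # smooth, illustrative
avg, det = haar_forward(kernel)
det = [d if abs(d) > 0.06 else 0.0 for d in det]    # drop tiny details
approx = haar_inverse(avg, det)                     # close to the original
```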


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input/output data can be seen, in very general terms, as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in figure 1.7, where P( ) is a polynomial in u and M is the model memory length (number of Z^-1 delay operators). Therefore, one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P (uZ^-1, uZ^-2, ..., uZ^-M) can be calculated. What is needed is a general and parsimonious function approximator. The Volterra function series is general, but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation
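As an illustration of this view, the sketch below implements a nonlinear moving average with one simple (assumed) choice of P: a polynomial with a constant term plus a linear and a squared term in each of the M delayed inputs.

```python
import numpy as np

def nma_predict(u, coeffs, M):
    """y[k] = P(u[k-1], ..., u[k-M]), with P a quadratic polynomial:
    one constant term, then a linear and a squared term per delayed input."""
    y = np.zeros(len(u))
    for k in range(M, len(u)):
        lags = u[k - M:k][::-1]                      # u[k-1], ..., u[k-M]
        features = np.concatenate(([1.0], lags, lags ** 2))
        y[k] = features @ coeffs                     # P evaluated at the lags
    return y

u = np.sin(np.linspace(0.0, 10.0, 50))
M = 3
coeffs = np.zeros(1 + 2 * M)
coeffs[1] = 0.9                                      # y[k] = 0.9 * u[k-1]
y = nma_predict(u, coeffs, M)
```

Identification then amounts to finding the coefficient vector (hand-set here): since all the features can be computed from the input record, a linear least-squares fit over coeffs would serve, exactly as in the compressed Volterra case.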

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details, and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas, coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth, low-order representations. However, since common choices such as distorted sinus functions are combined by weighted sums and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjöberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator: since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo, there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of figure 1.7, this is a 3rd-order nonlinear moving average with a neural network taking the place of P.
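In code, such a network is tiny. The sketch below is hypothetical (random, untrained weights stand in for the polishing-robot model) and only shows the structure: three delayed set-point values in, one predicted position out.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)   # hidden layer: 8 neurons
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # linear output layer

def predict(u_delayed):
    """u_delayed: the position set point at 3 consecutive past time steps."""
    h = np.tanh(W1 @ u_delayed + b1)             # hidden-layer activations
    return float(W2 @ h + b2)                    # predicted position

y = predict(np.array([0.10, 0.12, 0.15]))
```

Training would adjust W1, b1, W2, b2 by back-propagation, exactly as discussed below for MLP networks.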

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks, this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back-propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or, sometimes, symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models not identified but derived from physical principles. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness-guided search algorithms such as genetic programming, "evolution strategie", or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23], [24], [78], [79]) is an equation discovery program that searches for equations which conform to a context-free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations: grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation and then modify it interactively to experiment with different structures. Todorovski and Džeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.
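A toy example makes the mechanism concrete. The grammar below is illustrative only (not one of LAGRAMGE's published grammars); it restricts candidate right-hand sides to sums of linear and bilinear terms in the assumed variables x and dx:

```python
import random

# Illustrative context-free grammar; each rule rewrites a nonterminal.
grammar = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["const", "*", "Var"], ["const", "*", "Var", "*", "Var"]],
    "Var":  [["x"], ["dx"]],
}

def derive(symbol, rng, depth=0):
    """Randomly expand a nonterminal into a flat list of terminal tokens.
    Beyond a depth limit, always take the last rule to force termination."""
    if symbol not in grammar:
        return [symbol]                          # terminal token
    rules = grammar[symbol]
    rule = rules[-1] if depth > 4 else rng.choice(rules)
    return [t for part in rule for t in derive(part, rng, depth + 1)]

rng = random.Random(1)
print(" ".join(derive("Expr", rng)))   # e.g. const * x + const * dx * dx
```

Every derivable expression conforms to the grammar by construction, which is exactly how the grammar rules out whole classes of structures before any fitness evaluation happens.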

PRET ([74], [75], [14]) is another equation discovery program, one that incorporates expert knowledge in several domains collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second-order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high-level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness-guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied, and no references beyond the original author were found.

1.7.10 Information criteria: heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is of sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

AIC = −2 ln(L) + 2k (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u | M, U) (1.21)

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be overfit and have spurious parameters. The Bayes information criterion is

BIC = −2 ln(L) + k ln(n) (1.22)

where n is the number of observations (i.e., sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC ≅ −2n ln(RMSE/n) + 2k (1.23)
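These three criteria are one-liners to compute. A small sketch follows; the numeric likelihoods and parameter counts in the final line are made up for illustration:

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion, equation 1.20."""
    return -2.0 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    """Bayesian information criterion, equation 1.22."""
    return -2.0 * log_likelihood + k * math.log(n)

def aic_from_rmse(rmse, k, n):
    """RMSE-based estimate used when the likelihood is unknown (eq. 1.23)."""
    return -2.0 * n * math.log(rmse / n) + 2 * k

# A larger model must improve the fit enough to pay for its extra parameters:
print(aic(-100.0, 4) < aic(-99.0, 8))   # True: the small gain does not pay
```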

Cinar [17] reports on building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms, from lowest to highest order. Sjöberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.

1.7.11 Fuzzy relational models

According to Sjöberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions;

2. a fuzzy relation mapping inputs to outputs.

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow", whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are not appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered nonparametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative, English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied, and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
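A minimal sketch of max-min composition, with made-up membership values and relation entries, shows the difference from an ordinary matrix-vector product:

```python
import numpy as np

def maxmin_compose(mu_in, R):
    """Compose input membership values mu_in with fuzzy relation R.
    Shaped like a matrix-vector product, but with product -> min
    and sum -> max."""
    return np.array([max(min(a, r) for a, r in zip(mu_in, col))
                     for col in R.T])

mu_in = np.array([0.8, 0.3])        # degrees of membership: "slow", "fast"
R = np.array([[0.9, 0.1],           # rows: input sets, columns: output sets
              [0.2, 0.7]])
print(maxmin_compose(mu_in, R))     # [0.8 0.3]
```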

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, integrated across the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large, and the system may oscillate and not converge; too small, and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model, therefore, is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66], [39], [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae. Arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases, the bond graph model is no longer simply a graphical transcription of differential equations.
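For a concrete (hand-worked, not drawn from the thesis's library) example, consider a series loop Se-1-I-R-C, the bond graph analogue of a series RLC circuit. The 1-junction equates the flows of all four bonds and sums their efforts to zero, which with the linear constitutive equations gives two state equations that can be integrated directly:

```python
# 1-junction: e_Se = e_I + e_R + e_C, with a common flow f on every bond.
# Constitutive equations: e_I = I*df/dt, e_R = R*f, e_C = q/C, dq/dt = f.
I_, R_, C_ = 1.0, 0.5, 2.0          # made-up element parameters
E = 1.0                             # constant effort source (step input)

f, q = 0.0, 0.0                     # states: flow f and displacement q
dt = 1e-3
for _ in range(20000):              # 20 s of semi-implicit Euler
    dfdt = (E - R_ * f - q / C_) / I_
    f += dt * dfdt
    q += dt * f

print(round(q / C_, 2))             # capacitor effort settles toward E = 1.0
```

Extracting and integrating such equation sets mechanically, for arbitrary graph structures, is what the simulation library of chapter 4 automates.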

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10-degree-of-freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (arrangement of the edges or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible: whereas the genetic algorithm uses a fixed-length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane, and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g., through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases, the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19], [40], [67] on the subject. Linear graph theory is a similar graphical, energy-based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8], [9], [10], [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable; the other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached; others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".

Figure 2.1: A bond denotes power continuity between two systems

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics) or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                      Intensive variable  Extensive variable
  Generalized terms   Effort              Flow                 Power
  Linear mechanics    Force (N)           Velocity (m/s)       (W)
  Angular mechanics   Torque (N·m)        Velocity (rad/s)     (W)
  Electrics           Voltage (V)         Current (A)          (W)
  Hydraulics          Pressure (N/m²)     Volume flow (m³/s)   (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫₀ᵀ f dt (2.1)

p = ∫₀ᵀ e dt (2.2)

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor or C element stores potential energy in the form of generalized displacement, proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics, it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits, it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce (2.3)


An inertia or I element stores kinetic energy in the form of generalized momentum, proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics, it is a mass-flow effect; in electrical circuits, it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If (2.4)

Figure 2.2: A capacitive storage element, the 1-port capacitor C

Figure 2.3: An inductive storage element, the 1-port inertia I

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite-length simulations, and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t) (2.5)

f = F(t) (2.6)


Figure 2.4: An ideal source element, the 1-port effort source Se

Figure 2.5: An ideal source element, the 1-port flow source Sf

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant-valued effort source representing a uniform force field, such as gravity near the earth's surface, and a zero-valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that together they expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor or R element is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf (2.7)

2.2.5 Junctions

Figure 2.6: A dissipative element, the 1-port resistor R

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

The graphical notation for a transformer or TF-element is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1f1 = (me2)(f2/m) = e2f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (a pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = me2 (2.8)

f1 = f2/m (2.9)

The graphical notation for a gyrator or GY-element is shown in figure 28The constitutive equations for a gyrator relating e1 f1 to e2 f2 are givenin equations 210 and 211 where m is a constant value associated with thegyrator called the ldquogyrator modulusrdquo From these equations it can be easilyseen that this ideal linear element conserves power since (incoming power)e1f1 = mf2e2m = f2e2 (outgoing power) Physical interpretations of thegyrator include gyroscopic effects on fast rotating bodies in mechanics andHall effects in electro-magnetics These are less obvious than the TF-elementinterpretations but the gyrator is more fundamental in one sense two GY-elements bonded together are equivalent to a TF-element whereas two TF-elements bonded together are only equivalent to another transformer Modelsfrom a reduced bond graph language having no TF-elements are called ldquogyro-bond graphsrdquo [65]

e1 = mf2 (210)

36

f1 = e2/m (2.11)
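As a numeric sanity check, the constitutive equations 2.8 through 2.11 can be evaluated directly to confirm that both two-port elements conserve power. The sketch below uses arbitrary illustrative values and is not code from the thesis library:

```python
# Check that the TF and GY constitutive equations conserve power:
# e1*f1 == e2*f2 for both elements, regardless of the modulus m.

def transformer_outputs(e2, f1, m):
    """TF (equations 2.8 and 2.9): e1 = m*e2 and f1 = f2/m,
    rearranged here to return the two outputs e1 and f2."""
    return m * e2, m * f1

def gyrator_outputs(f2, e2, m):
    """GY (equations 2.10 and 2.11): e1 = m*f2 and f1 = e2/m."""
    return m * f2, e2 / m

m = 3.0
e1, f2 = transformer_outputs(e2=5.0, f1=2.0, m=m)
assert abs(e1 * 2.0 - 5.0 * f2) < 1e-12    # incoming power equals outgoing

e1g, f1g = gyrator_outputs(f2=2.0, e2=5.0, m=m)
assert abs(e1g * f1g - 5.0 * 2.0) < 1e-12  # conserved across the gyrator too
```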


Figure 2.7: A power-conserving 2-port element, the transformer TF


Figure 2.8: A power-conserving 2-port element, the gyrator GY

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction, and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = ··· = en (2.12)

Σf_in - Σf_out = 0 (2.13)

f1 = f2 = ··· = fn (2.14)

Σe_in - Σe_out = 0 (2.15)
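The junction constraints lend themselves to a one-line computation: given the common effort and all but one flow, the remaining flow is fixed by the zero-sum condition. A small sketch with arbitrary illustrative values:

```python
def zero_junction(e, f_in, f_out_known):
    """0-junction (equations 2.12 and 2.13): every attached bond sees the
    common effort e, and the one remaining power-out flow is fixed by
    sum(f_in) - sum(f_out) = 0."""
    f_out_last = sum(f_in) - sum(f_out_known)
    return e, f_out_last

# Two bonds directed power-in (flows 3.0 and 1.5) and two directed
# power-out, one of which is known (2.0): the last flow must be 2.5.
e, f_last = zero_junction(4.0, f_in=[3.0, 1.5], f_out_known=[2.0])
assert f_last == 2.5
```

The 1-junction is the dual case: flows are shared and efforts sum to zero.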

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the



Figure 2.9: A power-conserving multi-port element, the 0-junction


Figure 2.10: A power-conserving multi-port element, the 1-junction


attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements, respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power-direction half-arrow. In both parts of figure 2.11 the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)

effort = B(flow)

In the second part of figure 2.11 the direction of computational causality is reversed. So, although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)

effort = A(flow)

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that



Figure 2.11: The causal stroke and power-direction half-arrow on a fully augmented bond are independent

is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but not exactly bond graphs, do not employ any concept of causality [2] [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.1 Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving those in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

1. Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree, or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options and the resulting equations for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power-conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements) and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inductors. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inductor), the variable in the differential expression can be evaluated by integration over time. For this reason these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly (that is, without another type of storage element or a resistor between them) on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = eC). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type (1- and 2-port elements)

Element             Causality option           Equations

Effort source (Se)  effort-out (required)      e = E(t)

Flow source (Sf)    flow-out (required)        f = F(t)

Capacitor (C)       effort-in (derivative)     q = Ce;    f = dq/dt
                    effort-out (integral)      e = q/C;   dq/dt = f

Inertia (I)         effort-in (integral)       f = p/I;   dp/dt = e
                    effort-out (derivative)    p = If;    e = dp/dt

Resistor (R)        effort-in                  f = e/R
                    effort-out                 e = Rf

Transformer (TF)    effort-out at port 1       e1 = me2;  f2 = mf1
                    effort-out at port 2       f1 = f2/m; e2 = e1/m

Gyrator (GY)        effort-out at both ports   e1 = rf2;  e2 = rf1
                    flow-out at both ports     f1 = e2/r; f2 = e1/r


Table 2.3: Causality options by element type (junction elements)

0-Junction, b* directed power-in (e* ≡ e_in1, f* ≡ f_in1):
    e_in2 = ··· = e_inn = e*
    e_out1 = ··· = e_outm = e*
    f* = Σ(i=1..m) f_outi - Σ(i=2..n) f_ini

0-Junction, b* directed power-out (e* ≡ e_out1, f* ≡ f_out1):
    e_in1 = ··· = e_inn = e*
    e_out2 = ··· = e_outm = e*
    f* = Σ(i=1..n) f_ini - Σ(i=2..m) f_outi

1-Junction, b* directed power-in (e* ≡ e_in1, f* ≡ f_in1):
    f_in2 = ··· = f_inn = f*
    f_out1 = ··· = f_outm = f*
    e* = Σ(i=1..m) e_outi - Σ(i=2..n) e_ini

1-Junction, b* directed power-out (e* ≡ e_out1, f* ≡ f_out1):
    f_in1 = ··· = f_inn = f*
    f_out2 = ··· = f_outm = f*
    e* = Σ(i=1..n) e_ini - Σ(i=2..m) e_outi

• 0- and 1-junctions permit any number of bonds directed power-in (b_in1, ..., b_inn) and any number directed power-out (b_out1, ..., b_outm).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction). The rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction). The rest must be effort-in.


and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model, but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort

2.3.2 Sequential causality assignment procedure

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses, and I elements with proportionally common flow

There is a systematic method of assigning computational causality to bond graph models so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.
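The propagation logic of this example can be sketched in a few lines. The encoding below (each bond records its effort direction) and the single-junction scope are simplifying assumptions for illustration; this is not the thesis implementation:

```python
# SCAP stages 1-2 on the Se-C-I-R example: four elements, each bonded
# to a single 1-junction. Each bond records its effort direction,
# "into_junction" or "into_element" (i.e. effort-out from the junction).

bonds = {"Se": None, "C": None, "I": None, "R": None}

def propagate_1_junction(bonds):
    """A 1-junction permits only one bond to be causal effort-out from
    the junction; once such a bond exists, every unassigned bond must
    carry effort into the junction."""
    if list(bonds.values()).count("into_element") == 1:
        for element, direction in bonds.items():
            if direction is None:
                bonds[element] = "into_junction"

# Stage 1: required causality at sources; the effort source outputs effort.
bonds["Se"] = "into_junction"
propagate_1_junction(bonds)     # no consequences yet

# Stage 2: preferred (integral) causality at storage elements.
bonds["I"] = "into_element"     # effort-in to the inertia
propagate_1_junction(bonds)     # C and R bonds are forced; stage 3 unreached
```

After stage 2 every bond is assigned, and the forced causality on the C bond happens to coincide with its preferred form, as described above.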

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal, but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.



Figure 2.14: A power-conserving 2-port, the modulated transformer MTF


Figure 2.15: A power-conserving 2-port, the modulated gyrator MGY

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical


interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort-displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.


Figure 2.16: A non-linear capacitive storage element, the 1-port contact compliance CC

Figure 2.17: Mechanical contact compliance; an unfixed spring exerts no force when q exceeds the unsprung length r

e = (q < r)(r - q)/C (2.16)

dq/dt = f (2.17)
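Equation 2.16 is straightforward to implement as a discontinuous function; the parameter values in this sketch are arbitrary:

```python
def cc_effort(q, r, C):
    """CC element (equation 2.16): e = (q < r)(r - q)/C. The effort is
    proportional to the spring's compression while in contact (q < r),
    and zero once contact is lost."""
    return (r - q) / C if q < r else 0.0

assert abs(cc_effort(0.6, r=1.0, C=0.5) - 0.8) < 1e-12  # in contact
assert cc_effort(1.5, r=1.0, C=0.5) == 0.0              # past the threshold
```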

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way, complex models representing complex systems can be built in a modular fashion by nesting combinations of simpler models representing simpler subsystems. At simulation time a preprocessor may replace compound elements by their equivalent set of standard elements, or



Figure 2.18: Effort-displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17

compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts, representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports, bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension


Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12



Figure 2.21: Model of a simple pendulum as in figures 5.12 and 2.20, rewritten using specialized compound elements

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs x, state vector u, and outputs y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

du/dt = f(x, u) (2.18)

y = g(x, u) (2.19)
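Once in this form, the model can be simulated with any standard integrator. A minimal sketch using forward-Euler integration and an assumed first-order example system (du/dt = x - u), keeping the thesis's convention of x for inputs and u for state:

```python
def f(x, u):
    """State update function of equation 2.18 for the example system."""
    return x - u

def g(x, u):
    """Readout function of equation 2.19; here the output is the state."""
    return u

dt, u, x = 0.01, 0.0, 1.0
for _ in range(1000):        # simulate 10 time units with a constant input
    u = u + dt * f(x, u)     # forward-Euler step on the state update
y = g(x, u)                  # the state relaxes toward the input value 1.0
```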

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment-statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in section 2.3.2, causality has been assigned to all bonds and the result shows integral


causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
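A toy version of this back-substitution, using string rewriting in place of the thesis's symbolic algebra library; the Se-C-I-R-style equation set below is illustrative, and the loop terminates because each variable has exactly one definition and there are no algebraic loops:

```python
import re

# States q and p, input E, parameters C, R, I. Each equation assigns
# exactly one variable, as in tables 2.2 and 2.3.
equations = {
    "p_dot": "e4 - e2 - e3",   # state update for the inertia
    "q_dot": "f1",             # state update for the capacitor
    "e2": "q / C",             # capacitor readout
    "e3": "R * f1",            # resistor
    "e4": "E",                 # effort source
    "f1": "p / I",             # inertia readout (common 1-junction flow)
}

def back_substitute(expr):
    """Replace each defined variable by its parenthesized definition
    until only state variables, inputs, and parameters remain."""
    changed = True
    while changed:
        changed = False
        for name, definition in equations.items():
            new = re.sub(r"\b%s\b" % name, "(%s)" % definition, expr)
            if new != expr:
                expr, changed = new, True
    return expr

state_update = {v: back_substitute(rhs)
                for v, rhs in equations.items() if v.endswith("_dot")}
# state_update["p_dot"] -> "(E) - (q / C) - (R * (p / I))"
# state_update["q_dot"] -> "(p / I)"
```

Applying the same routine to the remaining equations would give the readout equations in terms of states and inputs only.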

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides, and never on the left-hand side, of any equations. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case, the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g. by a root finding algorithm) at each time step. In the second case, the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or differential causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitance. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.
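The parallel-capacitor case can be confirmed arithmetically; the values in this sketch are arbitrary:

```python
# With a common effort e = q/C across the 0-junction, the second charge
# is fully determined by the first, and the pair stores charge exactly
# like one C-element with capacitance C1 + C2.

C1, C2 = 2.0, 3.0
q1 = 4.0                  # charge on the first capacitor (the one state)
e = q1 / C1               # shared voltage across both capacitors
q2 = C2 * e               # dependent charge; not an independent state
assert (q1 + q2) / (C1 + C2) == e   # combined element reproduces e
```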


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would in effect make the lever plastic; a C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.


Chapter 3

Genetic programming

3.1 Models, programs and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model

The method popularized by Koza in [43] [44] [45] [46], and used here, is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one


form of "evolutionary computing", which also includes evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which also includes simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back-propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree-structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection", in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.1 [18] also

1. In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control 8, 1965.


used genetic operations to search for tree-structured programs in a special purpose language. Here the application was in symbolic regression.

A tree structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results ([22]), because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN key words is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax-preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low-level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human-competitive artificial intelligence, it will be desirable to use a general purpose, high-level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus-Naur Form (BNF) language


definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].
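The genotype-to-phenotype mapping of grammatical evolution can be sketched compactly: each integer codon, taken modulo the number of productions available for the leftmost non-terminal, selects which rule to apply. The toy arithmetic grammar below is an assumption for illustration, not the C subset used in the cited work:

```python
grammar = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
    "<op>":   [["+"], ["*"]],
}

def map_genome(genome, start="<expr>", max_steps=50):
    """Expand the start symbol by repeatedly rewriting the leftmost
    non-terminal; the next codon (wrapping around the chromosome)
    selects the production, modulo the number of choices."""
    symbols, i = [start], 0
    for _ in range(max_steps):
        nonterminals = [j for j, s in enumerate(symbols) if s in grammar]
        if not nonterminals:
            break                                      # fully expanded
        j = nonterminals[0]                            # leftmost non-terminal
        rules = grammar[symbols[j]]
        choice = genome[i % len(genome)] % len(rules)  # codon selects a rule
        symbols[j:j + 1] = rules[choice]
        i += 1
    return "".join(symbols)

# The chromosome [0, 1, 0, 1] expands <expr> to "x+x":
# 0 -> <expr><op><expr>, 1 -> "x", 0 -> "+", 1 -> "x"
```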

Today there are several journals and annual conferences dedicated to genetic programming, and to genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem that are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution to each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.


3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature, specifically genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:   aaaa|aa    bbbb|bb

Children:  aaaabb     bbbbaa

Figure 3.2: The crossover operation for bit string genotypes
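As a minimal sketch (an illustration only; the function name and string genome are assumptions, not the thesis code), single-point crossover might be implemented as:

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover: split both equal-length parents at one
    randomly chosen point and swap the latter parts."""
    assert len(parent_a) == len(parent_b)
    point = random.randint(1, len(parent_a) - 1)  # never split at an end
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

# Reproducing figure 3.2, with the split after the fourth position:
print("aaaaaa"[:4] + "bbbbbb"[4:], "bbbbbb"[:4] + "aaaaaa"[4:])  # aaaabb bbbbaa
```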

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent:  aaaaaa

Child:   aaaaba

Figure 3.3: The mutation operation for bit string genotypes
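The corresponding sketch for single-bit mutation (again a hypothetical helper, here over a 0/1 genome):

```python
import random

def mutate(parent):
    """Replace one randomly selected bit with its complement."""
    point = random.randrange(len(parent))
    flipped = "1" if parent[point] == "0" else "0"
    return parent[:point] + flipped + parent[point + 1:]

child = mutate("000000")
# child differs from the parent in exactly one position
```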

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.


Parent:  aaaaaa

Child:   aaaaaa

Figure 3.4: The replication operation for bit string genotypes

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation, and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals but not in any particularly fit individuals will tend to be less prevalent in subsequent generations.
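Fitness proportional selection can be sketched directly from the roulette wheel picture (helper names are assumptions, not the thesis code):

```python
import random

def select(population, fitnesses):
    """Roulette wheel selection: sample one individual with probability
    proportional to its share of the total fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0.0, total)   # drop the ball onto the biased wheel
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness              # advance past this individual's wedge
        if spin <= running:
            return individual
    return population[-1]               # guard against floating point round-off
```

With fitness scores [3, 1, 6], for example, the third individual is selected roughly 60% of the time.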

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm the entire population is replaced at once. A new population


of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 − (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8–10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34] and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
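The thirteen steps can be sketched end to end on a toy problem. The following illustration (not the thesis system; all names and parameter values are assumptions) evolves 16-bit strings toward the all-ones string, with fitness simply the count of ones:

```python
import random

N, M, GENS = 16, 40, 60      # string length, population size, generation limit
PC, PM = 0.8, 0.1            # crossover and mutation rates; PR = 1 - (PC + PM)

def fitness(bits):           # step 6: count of ones
    return bits.count("1")

def pick(pop):               # step 9: fitness proportional selection
    total = sum(fitness(b) for b in pop) or 1
    spin, running = random.uniform(0, total), 0.0
    for b in pop:
        running += fitness(b)
        if spin <= running:
            return b
    return pop[-1]

random.seed(1)
pop = ["".join(random.choice("01") for _ in range(N)) for _ in range(M)]  # step 5
for gen in range(GENS):
    if any(fitness(b) == N for b in pop):          # step 7: terminate on success
        break
    new = []
    while len(new) < M:                            # steps 8-11
        op = random.random()
        if op < PC:                                # crossover
            a, b = pick(pop), pick(pop)
            pt = random.randint(1, N - 1)
            new += [a[:pt] + b[pt:], b[:pt] + a[pt:]]
        elif op < PC + PM:                         # mutation
            a = pick(pop)
            pt = random.randrange(N)
            new.append(a[:pt] + ("1" if a[pt] == "0" else "0") + a[pt + 1:])
        else:                                      # replication
            new.append(pick(pop))
    pop = new[:M]                                  # step 12, then repeat (step 13)

best = max(pop, key=fitness)
```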

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that


needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive: they respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open-ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represent the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
        ⇓
1. Evaluate fitness
        ⇓
2. Check termination criteria
        ⇓
3. Take fitness-proportional sample and apply genetic operators
        ⇓
   Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from


good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first: of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search. It implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the summed mean square difference between the solution expression and a sampling of the "true" function in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola such as 1 + 2(x + 3)^2, it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test. All linear expressions are exponentially worse fitting near at least one extremity.

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The number of positions between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 4 that describes the following set of bit strings:


01010

01011

01110

01111
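A schema's set can be enumerated mechanically. The sketch below (illustrative helper names, not the thesis code) recovers exactly the four members of 01*1* listed above:

```python
from itertools import product

def matches(schema, bits):
    """A bit string belongs to a schema's set if it agrees with the schema
    at every exactly specified (non-wildcard) position."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits))

members = ["".join(bs) for bs in product("01", repeat=5)
           if matches("01*1*", "".join(bs))]
print(members)  # ['01010', '01011', '01110', '01111']
```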

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e., zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically low that the algorithm will spend virtually all available time constructing, checking, and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracket expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e., parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, make it relatively


straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs and, when the time comes, they can be evaluated as program code to test their fitness.

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis Python [83] is used, but the essence of the process is still the creation, manipulation, and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.2

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm. New generations are created from old by fitness proportional selection and the crossover, mutation, and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

2. http://haskell.org/haskellwiki/Introduction#What_is_functional_programming.3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,3 tree structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree structured programs that have no simple analogy in nature, and are either not available or are not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following 5 decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants, and zero argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as

3. This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.


Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

3. Define the fitness function. Typically the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, PM:

M: population size (number of individuals)

G: maximum number of generations to run

PR: probability of selecting the replication operator

PC: probability of selecting the crossover operator

PM: probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.

The first two decisions (definition of the function and terminal set) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better


exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved. It embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings, or problems. However, genetic programming is so very computationally expensive that these parameters are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation differ necessarily from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit-strings discussed previously.
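A sketch of sub-tree crossover over nested-list (s-expression style) program trees; the representation and helper names are assumptions for illustration, not the thesis library:

```python
import copy
import random

def paths(tree, path=()):
    """Yield the path to every node; element 0 of a list is the operator."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    if not path:
        return subtree                  # swapping at the root replaces the whole program
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def size(tree):
    """Total node count; conserved by crossover."""
    return 1 + (sum(size(c) for c in tree[1:]) if isinstance(tree, list) else 0)

def crossover(a, b):
    """Pick one random node in each parent and swap the rooted sub-trees."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa = random.choice(list(paths(a)))
    pb = random.choice(list(paths(b)))
    sa, sb = get(a, pa), get(b, pb)
    return put(a, pa, sb), put(b, pb, sa)

# e.g. parents (+ x (* x x)) and (- 1 x)
c1, c2 = crossover(["+", "x", ["*", "x", "x"]], ["-", 1, "x"])
```

Note that the total node count of the two children always equals that of the two parents, while the individual children may be larger or smaller than either parent.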

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single


Figure 3.7: The crossover operation for tree structured genotypes

bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

The second mutation operator is called "sub-tree mutation". In this variation, a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded, and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent: a partial replication. The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf)


node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different and simpler structure, using a new function that was not available to the parent. Automatically defined functions allow a portion of a program from anywhere within its tree to be encapsulated as a unit unto itself, and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic, or polynomial regression, search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static


functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator which returns the constant value 0.0 whenever its second operand is zero.
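A sketch of such an operator (the Python name here is an assumption; Koza's LISP system used its own symbol):

```python
import math

def protected_div(numerator, denominator):
    """Closure-preserving division: return 0.0 on a zero denominator so
    that every candidate program evaluates without error."""
    if denominator == 0:
        return 0.0
    return numerator / denominator

print(protected_div(6.0, 3.0))            # 2.0
print(math.cos(protected_div(5.0, 0.0)))  # 1.0: the cos(x/0) exploit described below
```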

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x): that is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python


Listing 3.1: Protecting the fitness test code against division by zero

def fitTest(candidate):
    error = 0.0
    try:
        for x in testPoints:
            error += abs(candidate(x) - target(x))
    except ZeroDivisionError:
        return 0.0
    # NaN (undefined) compares equal to any number
    if error == 0 and error == 1:
        return 0.0
    # High error --> low fitness
    # Zero error --> perfect fitness (1.0)
    return 1.0 / (1.0 + error)

division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The source code excerpt in listing 3.1 shows how division by zero and occurrences of the contagious NaN value4 result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Another very common yet potentially problematic arithmetic operation

4. The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type. INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number". NaN arises from operations which have undefined value in common arithmetic; examples include cos(INF) or sin(INF) and INF/INF. These special values, INF and NaN, are contagious in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN/2.0 = NaN. Of these two special values, NaN is the most "virulent" since, in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.


is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit-strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low order approximation of the target function, for example. In this arrangement, a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.


Figure 3.8: Genetic programming as indirect search for use in "grey-box" system identification

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation for individuals that exceed the ceiling.

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.
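The two size-limiting options above can be contrasted in a small sketch (the function names and parameter values are arbitrary illustrations, not the thesis settings):

```python
def ceiling_fitness(error, size, ceiling=200):
    """Hard ceiling, as used in this project: zero fitness for oversized
    programs, otherwise the plain error-based score."""
    if size > ceiling:
        return 0.0
    return 1.0 / (1.0 + error)

def parsimony_fitness(error, size, pressure=0.005):
    """Parsimony pressure: every extra node costs a little fitness.  A steep
    schedule risks overwhelming the test-suite performance factor."""
    return (1.0 / (1.0 + error)) / (1.0 + pressure * size)
```

With the ceiling, two programs of sizes 10 and 190 with equal error score identically; under parsimony pressure, the smaller of two equally accurate programs always wins.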


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g., memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. The strength of this approach arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources is, in general, perhaps more naturalistic. Biological organisms which have very extensive material needs will on occasion find those needs unmet and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become “stuck” for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree swapping crossover can produce offspring differing in size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme case of a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. make the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a “tag” (which could be a simple integer) such that structurally equivalent individuals share a tag value, while tag values differ for structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).
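The recursive equivalence test described above can be sketched as follows. The tuple representation of program trees is an illustrative stand-in for actual tree objects; constant values are deliberately not represented, since a difference in value is parametric, not structural.

```python
# Minimal sketch of the recursive structural-equivalence test. Trees are
# (type, children) tuples; this representation is an assumption for the
# example, not the genetic programming system's own node objects.
def equivalent(a, b):
    a_type, a_children = a
    b_type, b_children = b
    if a_type != b_type or len(a_children) != len(b_children):
        return False
    # 'all' short-circuits, so recursion stops at the first mismatch
    return all(equivalent(x, y) for x, y in zip(a_children, b_children))

# Two additions of constants share a tag regardless of the constant values;
# an addition involving a variable is structurally different.
add_consts_a = ('add', [('const', []), ('const', [])])
add_consts_b = ('add', [('const', []), ('const', [])])
add_var = ('add', [('var', []), ('const', [])])
```

Structurally equivalent individuals can then be grouped under one tag, e.g. by keeping a list of representative trees and assigning each new individual the index of the first representative it is equivalent to.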

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short circuiting. Short circuit evaluation is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short circuiting. A LISP or Scheme implementation would benefit as well, since modern optimizing LISP compilers perform very well with tail-recursion.


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation code and the related symbolic algebra code are divided into four modules:

Bondgraph.py The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py The Components module contains a collection of compound bond graph elements.

Algebra.py The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz “DOT” markup or LaTeX. It implements algorithms for common algebraic reductions and recipes for specialized actions, such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations, in particular for reducing coupled sets of algebraic and first order differential equations to a set of differential equations involving a minimal set of state variables, plus a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into four modules as well:

Genetics.py The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree structured representations.

Evolution.py The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualization of bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma separated value (CSV) text files for portability. For particularly long or detailed (small time-step) simulations there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact non-text format might run quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.

There are several graph definition markup languages in use already, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a “#” is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in .bg format

    title single mass spring damper chain

    nodes

    # Spring, damper, and mass
    begin node
        id spring
        type C
        parameter 1.96
    end node

    begin node
        id damper
        type R
        parameter 0.02
    end node

    begin node
        id mass
        type I
        parameter 0.002
    end node

    # Common flow point joining s, d, m
    begin node
        id point1
        type cfj
    end node

    # Input force acting on spring and damper
    begin node
        id force
        type Se
        parameter Y
    end node

    bonds

    # Spring and damper joined to mass
    begin bond
        tail point1
        head spring
    end bond

    begin bond
        tail point1
        head damper
    end bond

    begin bond
        tail point1
        head mass
    end bond

    begin bond
        tail force
        head point1
    end bond

    EOF

Listing 4.2: A simple bond graph model constructed in Python code

    from Bondgraph import *

    g = Bondgraph('single mass spring damper chain')
    c = g.addNode(Capacitor(1.96, 'spring'))
    r = g.addNode(Resistor(0.02, 'damper'))
    i = g.addNode(Inertia(0.002, 'mass'))
    j = g.addNode(CommonFlowJunction())
    s = g.addNode(EffortSource('E'))

    g.bondNodes(j, c)
    g.bondNodes(j, r)
    g.bondNodes(j, i)
    g.bondNodes(s, j)

    g.assignCausality()
    g.numberElements()

    print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62] and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the “Computer Programming for Everyone” philosophy [82], which strives to show that powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo code. Take for example the following two functions, which implement the universal and existential logical quantifiers respectively.

Listing 4.3: Two simple functions in Python

    def forAll(sequence, condition):
        for item in sequence:
            if not condition(item):
                return False
        return True

    def thereExists(sequence, condition):
        for item in sequence:
            if condition(item):
                return True
        return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python’s object system [49], they operate on any iterable object “sequence” (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object “condition”.
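As a small self-contained demonstration of this flexibility, the two functions from listing 4.3 are restated here and applied to a list, a dictionary, and an empty sequence:

```python
# Listing 4.3's functions, restated so this example is self-contained.
def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

# Name-based polymorphism: any iterable works. Iterating a dictionary
# yields its keys; an empty sequence makes forAll vacuously true.
all_even = forAll([2, 4, 6], lambda x: x % 2 == 0)
has_key = thereExists({'a': 1, 'b': 2}, lambda k: k == 'b')
vacuous = forAll([], lambda x: False)
```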

Since this research addresses automated dynamic system identification, topics which are primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4 processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work SciPy is used for fast matrix operations and to access a numerical solver. Both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

    digraph "single mass spring damper chain" {
        rankdir=LR
        node [shape=plaintext]

        C1 [label="C spring"]
        R2 [label="R damper"]
        I3 [label="I mass"]
        11 [label="1"]
        Se4 [label="Se force"]

        edge [fontsize=10]
        11 -> C1 [arrowhead=half arrowtail=tee label="1"]
        11 -> R2 [arrowhead=half arrowtail=tee label="2"]
        11 -> I3 [arrowhead=teehalf label="3"]
        Se4 -> 11 [arrowhead=teehalf label="4"]
    }

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds. Activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.


4.3.2 Adding elements and bonds, traversing a graph

The graph class has an addNode and a removeNode method, and a bondNodes and a removeBond method. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.
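The reciprocal references described above can be sketched as follows. The class and attribute names here are simplified stand-ins, not the actual Bondgraph.py interface.

```python
# Simplified stand-ins for the mutual element/bond references described
# above; names are illustrative, not the actual Bondgraph.py API.
class Element:
    def __init__(self, name):
        self.name = name
        self.bonds = []          # every attached bond

class Bond:
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        tail.bonds.append(self)  # each end keeps a reference back
        head.bonds.append(self)

    def other_end(self, element):
        return self.head if element is self.tail else self.tail

def reachable(start):
    # Walk the graph from any starting element via the mutual references.
    seen, stack = set(), [start]
    while stack:
        element = stack.pop()
        if element.name in seen:
            continue
        seen.add(element.name)
        for bond in element.bonds:
            stack.append(bond.other_end(element))
    return seen

j = Element('point1')
bonds = [Bond(j, Element(n)) for n in ('spring', 'damper', 'mass')]
```

Starting the walk from any element or bond end reaches the same connected set, which is the property the library's iteration methods rely on.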

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP) described in Section 2.3.2 is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC)

2. one-port elements with a preferred causality (storage elements C and I)

3. one-port elements with arbitrary causality (sink element R)

4. others (0, 1, GY, TF, signal blocks)

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.
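The segregation step can be sketched as follows. The type labels mirror the four categories listed above; the dispatch on type names is an illustrative stand-in for the library's element classes.

```python
# Sketch of segregating elements by causality preference prior to SCAP.
# The string type labels stand in for the library's element object types.
REQUIRED = {'Se', 'Sf', 'CC'}
PREFERRED = {'C', 'I'}
ARBITRARY = {'R'}

def segregate(element_types):
    lists = {'required': [], 'preferred': [], 'arbitrary': [], 'other': []}
    for t in element_types:          # insertion order is preserved
        if t in REQUIRED:
            lists['required'].append(t)
        elif t in PREFERRED:
            lists['preferred'].append(t)
        elif t in ARBITRARY:
            lists['arbitrary'].append(t)
        else:
            lists['other'].append(t)
    return lists

groups = segregate(['Se', '1', 'C', 'R', 'I', '0', 'TF'])
```

SCAP then works through the lists in order: required causalities first, then preferred, then arbitrary, propagating consequences after each assignment.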


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality,

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality,

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality,

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality,

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required causality one-port elements have been handled, the preferred causality list is examined. Any element whose attached bond has already been assigned a causality, preferred or otherwise, as a consequence of earlier required or preferred causality assignments is skipped over. Others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements) and the consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined and any instances of differential causality are noted in a new list attached to the bond graph object.

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since the reductions are computationally inexpensive to apply. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all the junction’s neighbours, that cost is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of the procedure for which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with fewer than 3 bonds.

2. Merge adjacent like junctions.

3. Merge resistors bonded to a common junction.

4. Merge capacitors bonded to a common effort junction.

5. Merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5. In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond is put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1, there are two cases, depending on the junction type. (Handling of non-linear elements is not automated by the library.) If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is req, where

req = 1 / (1/r1 + 1/r2 + ... + 1/rn)    (4.1)


These are the familiar rules for electrical resistances in series and parallel. The fourth and fifth manipulations handle the case shown in 2.12 and an even simpler case than is shown in 2.13, respectively. The same electrical and mechanical analogies apply. Capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances. Rigidly joined inertias are effectively one mass.
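The parameter arithmetic behind reduction step 3 (equation 4.1) can be sketched as follows; the function is an illustration of the rule, not the library's implementation.

```python
# Sketch of the parameter arithmetic for merging R elements (equation 4.1):
# resistances at a 1-junction add; at a 0-junction they combine reciprocally,
# mirroring electrical resistors in series and in parallel.
def merged_resistance(junction_type, parameters):
    if junction_type == '1':
        return sum(parameters)
    elif junction_type == '0':
        return 1.0 / sum(1.0 / r for r in parameters)
    raise ValueError('junction_type must be "0" or "1"')

series = merged_resistance('1', [10.0, 20.0, 30.0])
parallel = merged_resistance('0', [10.0, 10.0])
```

The capacitor and inertia merges of steps 4 and 5 are the simpler case: the replacement parameter is just the sum of the replaced parameters.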

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with fewer than 3 bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8, and 4.9 show the application of the first 3 steps in the automated model reduction method. The 4th and 5th are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6) and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9) and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions step 1: remove trivial junctions


Figure 4.2: Model reductions step 2: merge adjacent like junctions


Figure 4.3: Model reductions step 3: merge resistors bonded to a common junction


Figure 4.4: Model reductions step 4: merge capacitors bonded to a common effort junction

Figure 4.5: Model reductions step 5: merge inertias bonded to a common flow junction


Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction


Figure 4.7: The model from figure 4.6 with trivial junctions removed


Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged


Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged


Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5


Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6


Listing 4.5: Equations from the model in figure 4.6

    e1 = (e2 + (-1.0*e15))
    e2 = e10
    e3 = e8
    e4 = i0
    e5 = (e4 + (-1.0*e6))
    e6 = (f6*32.0)
    e7 = e5
    e8 = e7
    e9 = e3
    e10 = e9
    e11 = e3
    e12 = e3
    e13 = e3
    e14 = e10
    e15 = (f15*45.0)
    f1 = (p1 - 780)
    f2 = f1
    f3 = (f9 + f11 + f12 + f13)
    f4 = f5
    f5 = f7
    f6 = f5
    f7 = f8
    f8 = f3
    f9 = f10
    f10 = (f2 + f14)
    f11 = (e11/45.0)
    f12 = (p12/0.0429619641442)
    f13 = (e13/7.13788689084)
    f14 = (e14/93.0)
    f15 = f1
    p1_dot = e1
    p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

    e1 = (e7 + (-1.0*e5))
    e2 = i0
    e3 = (f3*32.0)
    e4 = e6
    e5 = (f5*45.0)
    e6 = (e2 + (-1.0*e3))
    e7 = e6
    e8 = e6
    f1 = (p1 - 780)
    f2 = f6
    f3 = f6
    f4 = (p4/0.0429619641442)
    f5 = f1
    f6 = (f4 + f7 + f8)
    f7 = f1
    f8 = (e8/3.0197788376)
    p1_dot = e1
    p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form, even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look for the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant, and operator is an expression in itself), replace any one sub-expression with another, and other more specialized algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression, but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater, and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.

Equation objects are composed of two expression objects (a left-hand side and a right), and have methods to list all variables, expressions, and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram

Equation sets are collections of equations that contain no duplicates. Equation set objects support similar methods to equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.) as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation, provided the original set is in a certain “declarative” form, detection of this form, detection of algebraic loops, and resolution of algebraic loops. Algebraic loops, easily the most complex manipulation automated by the library, are discussed in Section 4.4.3.
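A minimal sketch of recursive backsubstitution into a declarative equation set follows. The representation here (callables plus dependency lists) is an illustrative assumption, not the library's own equation objects, which carry full expression trees.

```python
# Sketch of recursive backsubstitution into a "declarative" equation set:
# each variable is defined exactly once, in terms of other variables, and
# there are no algebraic loops. The (callable, dependency-list) encoding
# is an illustrative stand-in for the library's equation objects.
def resolve(var, definitions, cache=None):
    cache = {} if cache is None else cache
    if var in cache:
        return cache[var]
    expr, depends_on = definitions[var]
    # Recursively substitute every dependency before evaluating.
    values = [resolve(d, definitions, cache) for d in depends_on]
    cache[var] = expr(*values)
    return cache[var]

# e6 = e2 - e3, with e2 = 5.0 and e3 = 2.0, resolves to 3.0.
defs = {
    'e2': (lambda: 5.0, []),
    'e3': (lambda: 2.0, []),
    'e6': (lambda a, b: a - b, ['e2', 'e3']),
}
```

If a variable's dependencies eventually lead back to itself, the recursion would not terminate, which is precisely why algebraic loops need separate detection and resolution.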

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called “t” is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.
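As a concrete illustration, "aggregate constants" might be sketched as follows. The nested-tuple representation (rather than the thesis's Expression classes) is an assumption made to keep the example short:

```python
# Sketch of "aggregate constants": expressions are nested tuples where
# ('add', ...) and ('mul', ...) are commutative operators and bare numbers
# are constants.  Not the thesis implementation.

def aggregate_constants(expr):
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [aggregate_constants(a) for a in args]   # recurse first
    if op in ('add', 'mul'):
        consts = [a for a in args if isinstance(a, (int, float))]
        rest = [a for a in args if not isinstance(a, (int, float))]
        if len(consts) > 1:
            # Fold all direct constant operands into one equivalent constant.
            folded = 0.0 if op == 'add' else 1.0
            for c in consts:
                folded = folded + c if op == 'add' else folded * c
            return (op, folded, *rest)
    return (op, *args)

print(aggregate_constants(('add', 2.0, 'x', 3.0)))  # ('add', 5.0, 'x')
```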

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul, and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator. It joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null or "do nothing" operations: a single operand Add is assumed to add zero to its operand, and a single operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands to this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object as well as all of the operands of the original Mul object except for the original Add object.
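The same tuple representation gives a short sketch of this distribution step (an illustration, not the thesis code):

```python
# Sketch of "expand distributive operators": replace mul(a, add(b, c), d)
# with add(mul(b, a, d), mul(c, a, d)), distributing over the first Add
# found among the Mul's operands.

def expand_distributive(expr):
    if not isinstance(expr, tuple) or expr[0] != 'mul':
        return expr
    args = list(expr[1:])
    for i, a in enumerate(args):
        if isinstance(a, tuple) and a[0] == 'add':
            others = args[:i] + args[i + 1:]       # all operands except the Add
            return ('add', *[('mul', term, *others) for term in a[1:]])
    return expr

print(expand_distributive(('mul', 'a', ('add', 'b', 'c'))))
# ('add', ('mul', 'b', 'a'), ('mul', 'c', 'a'))
```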

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable, or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

"Reduce equation to polynomial form" invokes towardsPolynomial on the left and right-hand side expressions of the equation repeatedly, until isPolynomial gives a true result for each. "Collect like terms" examines the variables in the operands of the Add objects on either side of a polynomial form equation and collects them such that there is only one operand between those Add objects that contains any one variable. The result may not be in the polynomial form defined above, but can be driven there again by invoking the equation's towardsPolynomial method. Equation objects have a solveFor method that, given a variable appearing in the equation, attempts to put the equation in polynomial form, then collects like terms, isolates the term containing that variable on one side of the equation, and divides through by any constant factor. Of course this works only for equations that can be put in the polynomial form defined above. The solveFor method attempts to put the equation in polynomial form by invoking towardsPolynomial repeatedly, stopping either when the isPolynomial method gives a true result or after a number of iterations proportional to the size of the equation (number of Expression objects it contains). If the iteration limit is reached, then it is presumed not possible to put the equation in polynomial form and the solveFor method fails.

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive backsubstitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.12). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2) it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graphs. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back substitution into the declarative equation for the first loop variable.

2. Expanding distributive operators, aggregating constants and commutative operators.

3. Collecting like terms.

4. Isolating the first loop variable.

The result of the first step is a new declarative equation for the first loop variable which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to the form of a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C


and

D = E1 + E2 + · · · + En

where C is a constant and any of E1 . . . En may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and onto the right-hand side, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once before substituting the collapsed and refactored equations back into the set often results in an equal number of different loops in the now changed equation set.
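The identification step can be sketched as a depth-first search over the dependency graph. The dictionary representation below is an assumption; the graph is the loop from listing 4.7 (the traversal may report the loop in a different order than the thesis tool does):

```python
# Sketch of algebraic-loop detection by depth-first search, where deps maps
# each declared variable to the variables on the right-hand side of its
# declarative equation.

def find_loop(deps, start):
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for d in deps.get(node, []):
            if d == start:
                return path              # found a cycle back to the start
            if d not in path:
                stack.append((d, path + [d]))
    return None

# The loop resolved in listing 4.7: e6 -> e3 -> f3 -> f6 -> f8 -> e8 -> e6
deps = {'e6': ['e2', 'e3'], 'e3': ['f3'], 'f3': ['f6'],
        'f6': ['f4', 'f7', 'f8'], 'f8': ['e8'], 'e8': ['e6']}
print(find_loop(deps, 'e6'))  # ['e6', 'e3', 'f3', 'f6', 'f8', 'e8']
```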

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8 / 30.197788376)
e3 = (f3 * 32.0)
f3 = f6
f6 = (f4 + f7 + f8)
e6 = (e2 + (-1.0 * e3))

collapsing
e6 = (e2 + (-1.0 * ((f4 + f7 + (e6 / 30.197788376)) * 32.0)))

expanding and aggregating
iterations: 3 True True
e6 = (e2 + (f4 * -32.0) + (f7 * -32.0) + (e6 * -1.05968025213))

collecting


(e6 + (-1.0 * (e6 * -1.05968025213))) = (e2 + (f4 * -32.0) + (f7 * -32.0))

expanding and aggregating
iterations: 2 True True
(e6 + (e6 * 1.05968025213)) = (e2 + (f4 * -32.0) + (f7 * -32.0))

isolating
e6 = (((e2 * 1.0) + (f4 * -32.0) + (f7 * -32.0)) / 2.05968025213)
done

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7 + (-1.0 * e5))
e2 = i0
e3 = (f3 * 32.0)
e4 = e6
e5 = (f5 * 45.0)
e6 = (((e2 * 1.0) + (f4 * -32.0) + (f7 * -32.0)) / 2.05968025213)
e7 = e6
e8 = e6
f1 = (p1 - 78.0)
f2 = f6
f3 = f6
f4 = (p4 / 0.0429619641442)
f5 = f1
f6 = (f4 + f7 + f8)
f7 = f1
f8 = (e8 / 30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more than one equation.

Figure 4.13: Algebraic dependencies within the reduced model with loops resolved. (The graph relates i0, e1–e8, f1–f8, p1, and p4 and shows no remaining algebraic loops.)

These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not exist (for example, because the bond graph model has differential causality for some elements) then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input, and other variables.

2. Back-substitute until only state and input variables remain on the right-hand side.

3. Split state and readout equation sets.

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The value of these variables at any point in time encodes the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are parameters of the system that can later be solved for in terms of the state variables.

In the second step, the right-hand side of non-differential equations is substituted into the right-hand sides of all equations wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.
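The recursive back-substitution can be sketched as follows, again with a nested-tuple expression representation assumed for brevity (not the thesis code):

```python
# Sketch of recursive back-substitution.  `algebraic` maps each variable
# declared by a purely algebraic equation to its right-hand side expression.
# Assumes algebraic loops have already been resolved, so recursion terminates.

def substitute(expr, algebraic):
    if isinstance(expr, str) and expr in algebraic:
        return substitute(algebraic[expr], algebraic)   # expand the definition
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, algebraic) for a in expr[1:])
    return expr                                         # state/input variable or number

# e1 = e2 - e3, e3 = 32*f3, f3 = f6: expanding e1 leaves only e2 and f6.
algebraic = {'e1': ('add', 'e2', ('mul', -1.0, 'e3')),
             'e3': ('mul', 32.0, 'f3'),
             'f3': 'f6'}
print(substitute('e1', algebraic))
# ('add', 'e2', ('mul', -1.0, ('mul', 32.0, 'f6')))
```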

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.

4.4.5 Simulation

Once a state space equation set has been constructed, it can be solved over any range of time for which a sequence of input values is available. Autonomous systems with no inputs can be solved for arbitrary time ranges, of course. The differential equations from a state space equation set are solved by numerical integration using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.
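A minimal sketch of this simulation step uses SciPy's odeint front end to LSODA. The state equations below are written out by hand for the spring-mass-damper parameters of section 5.1.1 rather than generated from a bond graph, and the unforced, non-equilibrium-start case is assumed:

```python
# Sketch: integrate a two-state equation set with SciPy's LSODA-based odeint.
# Element parameters follow section 5.1.1; the hand-written equations are an
# illustrative assumption, not output of the thesis library.
import numpy as np
from scipy.integrate import odeint

C, I, R = 1.96, 0.002, 0.02
E = 0.0                        # no applied effort: autonomous case

def state_derivative(x, t):
    p, q = x                   # inertia momentum, spring displacement
    f = p / I                  # flow from the inertia's momentum
    return [E - q / C - R * f, # dp/dt: net effort on the inertia
            f]                 # dq/dt: flow into the spring

t = np.linspace(0.0, 1.0, 101)
trajectory = odeint(state_derivative, [0.0, 0.1], t)  # non-equilibrium start
print(trajectory.shape)  # (101, 2)
```

The same trajectory array, together with any interpolated inputs, is what the readout equations would then be evaluated against.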

Inputs, if any, are read from a file, then interpolated on-the-fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth order (constant) and first order (linear) interpolators are implemented. The zeroth order hold function was used for all simulation results presented in chapter 5.

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations across the same range of simulated time for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression) a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. That kernel is a prototype solution to the problem, incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel, in the form of a bond graph model.


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree structured representations proceeds in three steps:

1. Choose a node at random from the parent.

2. Generate a new random tree whose root has the same type as the chosen node.

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree.

In the first step, a choice is made uniformly among all nodes. It is not weighted by the position of the node within the tree or by the type of the node. In the second step, it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the function and terminal set of which the replacement tree will be built. In the third step, the parent tree is copied, with replacement of the chosen subtree.

An attempt is made also to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or depth of the tree to be replaced, then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4 this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; these are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.
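A simplified sketch of type-constrained tree creation follows. It is depth-first rather than breadth-first, omits the function-to-terminal preference ratio, and uses a toy grammar; all of these are assumptions made for brevity:

```python
# Sketch of type-constrained random tree creation.  Each entry maps a node
# name to (conforming type, child types); FUNCTIONS have children, TERMINALS
# do not.  Toy grammar, not the thesis grammars.
import random

FUNCTIONS = {'Add': ('Expression', ['Expression', 'Expression']),
             'Sin': ('Expression', ['Expression'])}
TERMINALS = {'Constant': ('Expression', [])}

def random_tree(required_type, depth_limit, rng):
    funcs = [n for n, (t, _) in FUNCTIONS.items() if t == required_type]
    terms = [n for n, (t, _) in TERMINALS.items() if t == required_type]
    # Past the depth limit, use a terminal whenever one conforms to the type.
    if depth_limit <= 0 and terms:
        name = rng.choice(terms)
    else:
        name = rng.choice(funcs or terms)
    child_types = (FUNCTIONS.get(name) or TERMINALS.get(name))[1]
    return (name, [random_tree(t, depth_limit - 1, rng) for t in child_types])

tree = random_tree('Expression', 3, random.Random(0))
print(tree)
```

Every node placed this way conforms to the type its parent requires, so the result is always a valid program in the toy grammar.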

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case, the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents.

2. List all nodes from the first parent conforming to one of the common types.

3. Choose at random from the list.

4. List all nodes from the second parent conforming to the same type as the chosen node.

5. Choose at random from the list.

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second.

If the set of common types is empty, crossover cannot proceed and the genetic algorithm selects a new pair of parents.
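The six steps can be sketched as follows, with program trees as (type, children) tuples. This mirrors the procedure above but is an illustration, not the thesis code:

```python
# Sketch of strongly typed crossover; node "types" here double as node labels.
import random

def nodes_of(tree):
    yield tree
    for child in tree[1]:
        yield from nodes_of(child)

def crossover(parent1, parent2, rng):
    common = {n[0] for n in nodes_of(parent1)} & {n[0] for n in nodes_of(parent2)}
    if not common:
        return None                       # signal: select a new pair of parents
    donor_site = rng.choice([n for n in nodes_of(parent1) if n[0] in common])
    graft = rng.choice([n for n in nodes_of(parent2) if n[0] == donor_site[0]])
    def copy_replacing(node):
        if node is donor_site:
            return graft                  # splice in the same-typed subtree
        return (node[0], [copy_replacing(c) for c in node[1]])
    return copy_replacing(parent1)

p1 = ('add', [('const', []), ('mul', [('const', []), ('x', [])])])
p2 = ('mul', [('x', []), ('x', [])])
child = crossover(p1, p2, random.Random(1))
print(child)
```

Because the graft always has the same type as the subtree it replaces, the child is a valid program whenever both parents are.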

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).


This grammar is typeless or "mono-typed". The result of symbolic regression is a tree structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and obvious, perhaps) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations; the differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire to not pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program. Simply making the new state available for later integration into the model is enough. Given a tree structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sit above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.

3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equations are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.
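The reference-resolution rule can be sketched as follows (the search direction and interval values are illustrative assumptions):

```python
# Sketch of the interval-index scheme: state variables and references to
# them each carry an index in [0, 1); a reference resolves to the first
# state index found searching upward from the reference index, wrapping
# around at the end of the interval.

def resolve(reference_index, state_indices):
    above = [s for s in state_indices if s >= reference_index]
    return min(above) if above else min(state_indices)  # wrap around

states = [0.2, 0.6, 0.9]
print(resolve(0.5, states))   # 0.6
print(resolve(0.95, states))  # wraps around to 0.2
```

Every reference index resolves to some state variable no matter how many states exist, which is the property the bullet points below rely on.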

When a new random state equation is introduced (e.g. through mutation), it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover), it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, the duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number half way between its original index and the next state equation index (with wrap around at the interval boundaries). Overall, this design has some desirable properties, including:

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable, no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so crossover that simply adds a state is not destructive to any advantages the program already has.

• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important, since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This new scheme allows both small incremental changes (such as the introduction of a new state equation by copying and later modifying in some small way an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set built with the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation and their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.


Node                Type                Child types
DynamicSystem       DynamicSystem       StateEquationSet, OutputEquationSet
StateEquationSet    StateEquationSet    StateEquation, StateEquationSet
StateEquation       StateEquation       StateVariableLHS, Expression
OutputEquationSet   OutputEquationSet   OutputEquation, OutputEquationSet
OutputEquation      OutputEquation      OutputVariable, Expression
StateVariableLHS    StateVariableLHS
StateVariableRHS    Expression
OutputVariable      OutputVariable
InputVariable       Expression
Constant            Expression
Add                 Expression          Expression, Expression
Mul                 Expression          Expression, Expression
Div                 Expression          Expression, Expression
Cos                 Expression          Expression
Sin                 Expression          Expression
Greater             Expression          Expression, Expression
Lesser              Expression          Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems


4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this does combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements or even connected subgraphs to be replaced.

If a constant parameter of some element (for example the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently, they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, with preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models), then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type        Child types
Kernel                      kernel
InsertCommonEffortJunction  bond        bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond        bond, bond, effort-in, effort-out
AttachInertia               effort-out  Expression, bond
AttachCapacitor             effort-in   Expression, bond
AttachResistorEffortIn      effort-in   Expression, bond
AttachResistorEffortOut     effort-out  Expression, bond
Constant                    Expression
Add                         Expression  Expression, Expression
Mul                         Expression  Expression, Expression
Div                         Expression  Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs
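The table can be read as a mapping from node names to a pair of (node type, child opening types). A minimal illustrative sketch in Python (a hypothetical representation, not the thesis software; the Kernel node is omitted because its openings depend on which elements of the embedded model were marked replaceable):

```python
# Illustrative encoding of table 4.2: node name -> (node type, child opening types).
# Hypothetical representation, not the thesis software.
GRAMMAR = {
    "InsertCommonEffortJunction": ("bond", ["bond", "bond", "effort-in", "effort-out"]),
    "InsertCommonFlowJunction":   ("bond", ["bond", "bond", "effort-in", "effort-out"]),
    "AttachInertia":              ("effort-out", ["Expression", "bond"]),
    "AttachCapacitor":            ("effort-in",  ["Expression", "bond"]),
    "AttachResistorEffortIn":     ("effort-in",  ["Expression", "bond"]),
    "AttachResistorEffortOut":    ("effort-out", ["Expression", "bond"]),
    "Constant": ("Expression", []),
    "Add": ("Expression", ["Expression", "Expression"]),
    "Mul": ("Expression", ["Expression", "Expression"]),
    "Div": ("Expression", ["Expression", "Expression"]),
}

def candidates(opening_type):
    """Names of nodes whose type conforms to a given opening."""
    return sorted(name for name, (t, _) in GRAMMAR.items() if t == opening_type)
```

A generator would call `candidates` for each open slot; for a bond opening, only the two junction-inserting nodes conform.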


Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex, and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p, and q variables measure effort, flow, momentum, and displacement on the like-numbered bonds. The applied effort from the source element is marked E.


dp3/dt = e3    (5.1)

dq1/dt = f1    (5.2)

e1 = (q1/1.96)    (5.3)

e2 = (f2 * 0.02)    (5.4)

e3 = (e4 + (-1.0 * e1) + (-1.0 * e2))    (5.5)

e4 = E    (5.6)

f1 = f3    (5.7)

f2 = f3    (5.8)

f3 = (p3/0.002)    (5.9)

f4 = f3    (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in Section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that variable E does not appear on the left-hand side of any state or readout equation. It is an input to the system.

dp3/dt = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.11)

dq1/dt = (p3/0.002)    (5.12)

e1 = (q1/1.96)    (5.13)

e2 = ((p3/0.002) * 0.02)    (5.14)

e3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.15)

e4 = E    (5.16)

f1 = (p3/0.002)    (5.17)

f2 = (p3/0.002)    (5.18)

f3 = (p3/0.002)    (5.19)

f4 = (p3/0.002)    (5.20)

The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand sides of the readout equations as needed.
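This integrate-then-read-out procedure can be sketched in Python for the model above; the fixed-step RK4 integrator and step size are assumptions, since the text does not name the solver actually used:

```python
# Sketch: integrate state equations 5.11 and 5.12 (C = 1.96, I = 0.002,
# R = 0.02). The RK4 integrator and step size are assumptions; the thesis
# does not specify its solver.

def f(state, E):
    p3, q1 = state
    dp3 = E - q1 / 1.96 - (p3 / 0.002) * 0.02   # equation 5.11
    dq1 = p3 / 0.002                            # equation 5.12
    return (dp3, dq1)

def rk4_step(state, E, h):
    k1 = f(state, E)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), E)
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), E)
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)), E)
    return tuple(s + (h / 6.0) * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Unit step input E = 1.0, integrated to t = 2.0 as in figure 5.3.
state, h = (0.0, 0.0), 0.0005
for _ in range(4000):
    state = rk4_step(state, 1.0, h)
p3, q1 = state
# At steady state the spring balances the applied effort, so q1 -> C * E = 1.96.
```

The readout equations (5.13 to 5.20) would then be evaluated from the stored p3 and q1 values at each time point of interest.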

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass, and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit-magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in Section 5.3. A broad-spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.


Figure 5.1: Schematic diagram of a simple spring-mass-damper system


Figure 5.2: Bond graph model of the system from figure 5.1



Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input


Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit-magnitude uniform random noise input


5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring–dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state-space form, the differential state equations are as given below.

dp2/dt = (E + (-1.0 * ((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02)
         + (q6/1.96))))

dp8/dt = (((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96))
         + (-1.0 * ((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02)
         + (q12/1.96))))

dp14/dt = (((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96))
          + (-1.0 * ((p14/0.002) * 0.02)) + (-1.0 * (q16/1.96)))

dq6/dt = ((p2/0.002) + (-1.0 * (p8/0.002)))

dq12/dt = ((p8/0.002) + (-1.0 * (p14/0.002)))

dq16/dt = (p14/0.002)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12, and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit-magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the system acts as a sort of mechanical filter. The input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system


Figure 5.6: Bond graph model of the system from figure 5.5



Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E


Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit-magnitude uniform random noise in the applied force E


5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant −9.81 times the inertia (force of gravity on the ball).
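Section 2.4.3, which defines the CC element, is not reproduced here, so the following switched constitutive law is an illustrative assumption consistent with the parameters above (compliance engaged only while the ball centre is within one radius of the surface), not the thesis definition:

```python
# Illustrative sketch only: an assumed switched contact-compliance ("CC") law.
# The element exerts a restoring effort only while the ball centre height q
# is below the switching threshold r, i.e. while the ball is compressed.
def cc_effort(q, C=0.25, r=1.0):
    compression = r - q               # positive only while the ball is squashed
    return compression / C if compression > 0.0 else 0.0
```

In flight (q > r) the element contributes no effort; on contact the effort grows linearly with compression, which is what produces the successively gentler bounces described below.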

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last, as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface



Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface


Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface


5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum-body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical, and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom. The horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical, and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical, and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical, and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of rotational flow signal a3):

mxA = 10.0 * cos(θ)

myA = 10.0 * sin(θ)

mxB = -10.0 * cos(θ)

myB = -10.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state-space form, the differential state equations are:

dna1/dt = (p1/10.0)

dna2/dt = (p2/10.0)

dna3/dt = (p4/1.10)

dp1/dt = (((-1.0 * (q21/0.01)) + (-1.0 * (((p1/10.0)
         + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

dp2/dt = (-98.1 + ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/10.0)
         + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

dp4/dt = (((10.0 * cos(na3)) * ((-1.0 * (q21/0.01))
         + (-1.0 * (((p1/10.0) + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8))))
         + ((10.0 * sin(na3)) * ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/10.0)
         + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8))))
         + ((-10.0 * cos(na3)) * 0.0) + ((-10.0 * sin(na3)) * 0.0)
         + (-1.0 * ((p4/1.10) * 0.5)))

dq21/dt = ((p1/10.0) + ((10.0 * cos(na3)) * (p4/1.10)))

dq24/dt = ((p2/10.0) + ((10.0 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low-frequency vibration in the spring that can be seen in the variable lower extremum on the pendulum centre position and spring extension near the start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.


Figure 5.12: Bond graph model of the system from figure 2.19



Figure 5.13: Response of the pendulum model from figure 5.12, released from an initial horizontal position

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left, and an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state-space form, there are 24 differential state equations, given below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.


dVphi_b0/dt = ((0.0 + (((xj0/0.001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0
              * sin(phi_b0))))) / 0.001)) * 10.0 * sin(phi_b0)) + (-1.0 * ((yj0/0.001)
              + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))
              / 0.001)) * 10.0 * cos(phi_b0)) + (-1.0 * 0.0) + (((xj1/0.001)
              + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0
              * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1))))) / 0.001))
              * 10.0 * sin(phi_b0)) + (-1.0 * ((yj1/0.001) + (((VyC_b0 + (10.0
              * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0
              * Vphi_b1 * cos(phi_b1))))) / 0.001)) * 10.0 * cos(phi_b0))) / 5.00)

dVphi_b1/dt = ((0.0 + (((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0
              * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1
              * sin(phi_b1))))) / 0.001)) * 10.0 * sin(phi_b1)) + (-1.0
              * ((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
              + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))
              / 0.001)) * 10.0 * cos(phi_b1)) + (-1.0 * 0.0) + (((xj2/0.001)
              + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0
              * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2))))) / 0.001)) * 10.0
              * sin(phi_b1)) + (-1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0
              * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0
              * Vphi_b2 * cos(phi_b2))))) / 0.001)) * 10.0 * cos(phi_b1))) / 5.00)

dVphi_b2/dt = ((0.0 + (((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1
              * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2
              * sin(phi_b2))))) / 0.001)) * 10.0 * sin(phi_b2)) + (-1.0
              * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
              + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2))))) / 0.001))
              * 10.0 * cos(phi_b2)) + (-1.0 * 0.0) + (0.0 * 10.0 * sin(phi_b2))
              + (-1.0 * 0.0 * 10.0 * cos(phi_b2))) / 5.00)

dVxC_b0/dt = ((((xj0/0.001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0
             * sin(phi_b0))))) / 0.001)) + (-1.0 * ((xj1/0.001)
             + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
             + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))
             / 0.001)))) / 10.0)

dVxC_b1/dt = ((((xj1/0.001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0
             * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1
             * sin(phi_b1))))) / 0.001)) + (-1.0 * ((xj2/0.001)
             + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
             + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))
             / 0.001)))) / 10.0)

dVxC_b2/dt = ((((xj2/0.001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
             + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2))))) / 0.001))
             + (-1.0 * 0.0)) / 10.0)

dVyC_b0/dt = ((((yj0/0.001) + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0
             * cos(phi_b0))))) / 0.001)) + (-1.0 * ((yj1/0.001) + (((VyC_b0
             + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0
             * Vphi_b1 * cos(phi_b1))))) / 0.001))) + (-1.0 * 10.0 * 9.81)) / 10.0)

dVyC_b1/dt = ((((yj1/0.001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
             + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1))))) / 0.001))
             + (-1.0 * ((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
             + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2))))) / 0.001)))
             + (-1.0 * 10.0 * 9.81)) / 10.0)

dVyC_b2/dt = ((((yj2/0.001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
             + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2))))) / 0.001))
             + (-1.0 * 0.0) + (-1.0 * 10.0 * 9.81)) / 10.0)

dphi_b0/dt = Vphi_b0

dphi_b1/dt = Vphi_b1

dphi_b2/dt = Vphi_b2

dxC_b0/dt = VxC_b0

dxC_b1/dt = VxC_b1

dxC_b2/dt = VxC_b2

dxj0/dt = (0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))

dxj1/dt = ((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
          + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))

dxj2/dt = ((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
          + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))

dyC_b0/dt = VyC_b0

dyC_b1/dt = VyC_b1

dyC_b2/dt = VyC_b2

dyj0/dt = (0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))

dyj1/dt = ((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
          + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))

dyj2/dt = ((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
          + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))


Figure 5.14: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – vertical displacement of pendulum body centres



Figure 5.15: Response of the triple pendulum model from figure 5.17, released from an initial horizontal arrangement – horizontal displacement of pendulum body centres


Figure 5.16: Response of the triple pendulum model from figure 5.17, released from an initial horizontal position – angular position of pendulum bodies



Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum


5.1.6 Linked elastic collisions

Figure 5.18 shows a rigid bar with damped-elastic bumpers at each end, in schematic form. The system consists of a rigid link element L, as used above in section 5.1.4, representing the bar; two switched capacitor elements, CCa and CCb, representing elastic collision between the tips and a rigid horizontal surface; and unswitched vertical damping at each end and on rotation (not shown in the schematic), representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface; int_a3 is the angular displacement of the bar from vertical; q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21, and q24 (height of the centre, left tip, and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2 before the bar tips strike the ground at height 1, equal to the unsprung length q of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end



Figure 5.19: Bond graph model of the system from figure 5.18


Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean, and maximum program fitness each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as minimum, mean, and maximum number of nodes in a program)
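The diversity measure, for instance, amounts to counting distinct tree structures in the population. A minimal sketch, assuming programs are represented as nested tuples (an illustrative representation, not the thesis software):

```python
# Sketch: population diversity as the number of structurally unique programs.
# Programs are assumed to be nested tuples such as ("Add", "x", ("Const", 2.0)).
def diversity(population):
    return len({repr(program) for program in population})
```

With duplicated individuals present, only the distinct structures contribute to the count.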

The particular symbolic regression problem chosen consisted of 100 samples of the function x^2 + 2 at integer values of x between −50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2, and the number of generations limited to 500, but with a variety of population sizes.
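The exact normalization used is not stated; a common mapping of total absolute error onto (0, 1] that awards 1 for zero error is f = 1/(1 + error), used here as an assumption:

```python
import math

# Sketch of the fitness test. The thesis states 100 samples of x^2 + 2 at
# integer x between -50 and +50; the exact sample set and the 1/(1 + error)
# normalization used here are assumptions.
TARGET = [(float(x), float(x * x + 2)) for x in range(-50, 51)]

def fitness(candidate):
    try:
        err = sum(abs(candidate(x) - y) for x, y in TARGET)
    except ZeroDivisionError:
        return 0.0               # division by zero is awarded zero fitness
    if math.isnan(err):
        return 0.0               # as is the special value NaN
    return 1.0 / (1.0 + err)     # zero error maps to a perfect fitness of 1
```

Under this mapping an exact candidate scores 1.0, while any candidate that divides by zero on the sample set scores 0.0.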

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean, and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress. Runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation, the beginning of a sharp decline in population diversity can be seen. From this, one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this, one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of these newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness), and largest (maximum size) individuals are of course not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300-generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in the neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation, but were barred from participating in crossover, mutation, or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends

Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes)

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program

Figure 5.25: Fitness of run B – progress stalled by population convergence

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress

Figure 5.27: Program size in run B – early domination by smaller than average schema

Figure 5.28: Program age in run B – low population turnover mid-run between crises

The non-parsimonious tendencies of typical genetic programming output are well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal. Specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression the exploitable structural equivalences are simply the rules of algebra. Even a very small set of algebraic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation).

(- x x) replaced by (0)

(- x 0) replaced by (x)

(+ x 0) replaced by (x)

(+ 0 x) replaced by (x)

(/ x x) replaced by (1)

(/ 0 x) replaced by (0) for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant-value node equal to zero or one; and second, are the children equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant-value terminal nodes was replaced by a single constant-value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
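A minimal sketch of these reductions over s-expressions written as nested tuples (the representation is illustrative, not the thesis software; for brevity, constant subtrees are folded by direct arithmetic here rather than by evaluation at an arbitrary value of the independent variable):

```python
# Sketch of the reduction rules over s-expressions as nested Python tuples,
# e.g. ("+", "x", 0.0). Binary operators are assumed throughout.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def reduce_tree(t):
    if not isinstance(t, tuple):
        return t
    op, a, b = t[0], reduce_tree(t[1]), reduce_tree(t[2])
    if op == "-" and a == b:   return 0.0   # (- x x) -> 0
    if op == "-" and b == 0.0: return a     # (- x 0) -> x
    if op == "+" and b == 0.0: return a     # (+ x 0) -> x
    if op == "+" and a == 0.0: return b     # (+ 0 x) -> x
    if op == "/" and a == b:   return 1.0   # (/ x x) -> 1
    if op == "/" and a == 0.0: return 0.0   # (/ 0 x) -> 0
    if isinstance(a, float) and isinstance(b, float):
        return OPS[op](a, b)                # constant folding
    return (op, a, b)
```

Applied bottom-up, a subtree such as (- 3.0 1.0) collapses to the constant 2.0, which is exactly the kind of collapse that makes the reduced expression in figure 5.30 legible.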

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x^2 + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables. It evaluates to very nearly 2.


Figure 5.29: A best approximation of x^2 + 2.0 after 41 generations


Figure 5.30: The reduced expression


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, of the model described in Section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the largest subtree is used. This metric is symmetric (the distance between two trees is independent of the order they are given in). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted or replaced to transform one program into the other.
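A minimal sketch of this metric might read as follows (using a hypothetical lightweight tree class; the software's genetic program nodes carry more information than a single label):

```python
from collections import deque

# Sketch of the tree-distance metric: the size of the larger subtree rooted
# at the first mismatching node in a breadth-first traversal of both trees.

class T:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

def size(node):
    return 1 + sum(size(c) for c in node.children)

def tree_distance(a, b):
    """Distance 0 for identical trees; otherwise the size of the larger
    subtree rooted at the first breadth-first mismatch."""
    queue = deque([(a, b)])
    while queue:
        x, y = queue.popleft()
        if x.label != y.label or len(x.children) != len(y.children):
            return max(size(x), size(y))
        queue.extend(zip(x.children, y.children))
    return 0

# x + 1 versus x + (2 * x): the first mismatch is at the right child, where
# the larger subtree has three nodes, so the distance is 3.
ideal = T('+', T('x'), T('1'))
mutant = T('+', T('x'), T('*', T('2'), T('x')))
```

Taking the maximum of the two mismatching subtree sizes is what makes the metric symmetric.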

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive". The heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in


Distance   Mean fitness   Count
       1       0.000469    2212
       3       0.000282     503
       5       0.000346      66
       7       0.000363      36
       9       0.000231      25
      11       0.000164      16
      13       0.000143      10
      15       0.000071      10
      17       0.000102       9
      19       0.000221       1
      23       0.0            2
      25       0.0            2
      27       0.0            1
      29       0.000286       5
      35       0.0            1
      51       0.000715       1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2 and 5.7


that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases fitness-guided search, such as genetic programming, can do no better than a random search.

[Figure 5.31 plot area: x-axis "Distance", 0 to 18; y-axis mean fitness, 0 to 5 x 1e-4]

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction


[Figure 5.32 diagram: a tree rooted at DynamicSystem, branching to StateEquationSet and OutputEquation; interior nodes include StateEquation, StateEquationSet, Add, Mul and Div; leaves include StateVariableLHS, StateVariableRHS, InputVariable, OutputVariable and Constant]

Figure 5.32: The hand crafted "ideal" symbolic regression tree


has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. There are signal flow graphs and sets of differential equations, to give two examples, for which no physical interpretation exists in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true. In both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, nonlinear, and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out


small, simple and linear, but grow large, complex and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide or discover which aspects of the system are important to the exercise, and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, and lower its efficiency by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system


as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid body mechanical applications would have the context on hand to be able to insert a single degree of freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must


be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm

and ant colony algorithms, and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search within a search algorithm. A simulated annealing with genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run, according to some schedule.
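For illustration only, such an annealing-style operator schedule might look like the following; the operator names and both cooling curves are hypothetical, not taken from the thesis software:

```python
import random

# Hypothetical simulated-annealing-style schedule: mutation-heavy exploration
# early in the run, crossover-heavy exploitation late in the run.

def operator_schedule(generation, max_generations):
    t = generation / float(max_generations)  # progress through the run, 0 to 1
    p_mutate = 0.5 * (1.0 - t)               # cools from 0.5 down to 0.0
    p_crossover = 0.4 + 0.4 * t              # warms from 0.4 up to 0.8
    return {'mutate': p_mutate,
            'crossover': p_crossover,
            'reproduce': 1.0 - p_mutate - p_crossover}

def choose_operator(generation, max_generations, rng=random):
    probs = operator_schedule(generation, max_generations)
    names, weights = zip(*probs.items())
    return rng.choices(names, weights=weights)[0]
```

Any monotone schedule whose probabilities stay non-negative and sum to one would serve the same purpose.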

Genetic programming is easily parallelizable for distribution across a local or wide area network, in one of several ways. One particular scheme (island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has in effect been stated already in this section, but it is worth repeating, while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations), but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated at some expense in processing time


and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer

no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. They may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems with any number of inputs and outputs is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification, to allow this information to be specified in a way that is natural to the task, and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and


symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program modelling a particular dynamic system is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation, on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time for the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much


larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant valued expressions for element parameters, is expected to be faster than black box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it bond graph, symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression

2. Structural and parametric identification of a linear dynamic system using bond graphs

3. Parametric identification of a non-linear dynamic system using bond graphs

4. Structural and parametric identification of a non-linear dynamic system using bond graphs

At each stage the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressive stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).

If they were to find any more use as a modelling tool in their own right, outside of the system identification tasks that were the goal of this project, then the bond graph library would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20-sim, or domain specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
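A minimal sketch of what such a serialization could look like, using only the Python standard library (the element labels are hypothetical, and a real exchange format would also need GraphML key/data entries for element types, parameters and port ordering):

```python
import xml.etree.ElementTree as ET

# Sketch: serialize a bond graph's topology as GraphML, with bond graph
# elements as GraphML nodes and bonds as undirected edges.

GRAPHML_NS = 'http://graphml.graphdrawing.org/xmlns'

def graphml_from_bonds(elements, bonds):
    root = ET.Element('{%s}graphml' % GRAPHML_NS)
    graph = ET.SubElement(root, '{%s}graph' % GRAPHML_NS,
                          edgedefault='undirected')
    for name in elements:
        ET.SubElement(graph, '{%s}node' % GRAPHML_NS, id=name)
    for i, (tail, head) in enumerate(bonds):
        ET.SubElement(graph, '{%s}edge' % GRAPHML_NS,
                      id='bond%d' % i, source=tail, target=head)
    return ET.tostring(root, encoding='unicode')

# A one-bond model: an effort source attached to a 1-junction.
xml_text = graphml_from_bonds(['Se', 'J1'], [('Se', 'J1')])
```

Because GraphML is plain XML, the result can be read back by any conforming graph tool without knowledge of the bond graph library.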

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example: given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation solvers (DAE solvers) are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
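The finite least-recently-used table just described could be sketched as follows (a hypothetical interface: the real fitness test takes a genetic program and recorded signals, not a string key):

```python
from collections import OrderedDict

# Sketch of a bounded LRU memo table mapping a genetic program's key (some
# short unique serialization) to its cached fitness. `evaluate` stands in
# for the expensive simulate-and-compare fitness test.

class FitnessCache:
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.table = OrderedDict()

    def fitness(self, program_key, evaluate):
        if program_key in self.table:
            self.table.move_to_end(program_key)   # mark as recently used
            return self.table[program_key]
        value = evaluate(program_key)             # the expensive evaluation
        self.table[program_key] = value
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)        # evict least recently used
        return value
```

Duplicate programs in the population then cost one dictionary lookup instead of one simulation.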

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as they do to the nodes below them. The standard Python interpreter (called cPython when it is necessary to distinguish it from alternates) is implemented in C, and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated, but currently unused, memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
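For example, a node-counting method can be moved from the interpreter's call stack to an explicit heap-allocated stack (illustrative only; the `N` class is a stand-in for the software's tree node types):

```python
# The same tree method written recursively and iteratively. `N` is a
# hypothetical stand-in for the software's tree node classes (expressions,
# equations, genetic programs).

class N:
    def __init__(self, *children):
        self.children = list(children)

def count_nodes_recursive(node):
    # Each call consumes a cPython C-stack frame; very deep trees can
    # exhaust the interpreter's recursion limit.
    return 1 + sum(count_nodes_recursive(c) for c in node.children)

def count_nodes_iterative(node):
    # Same computation with an explicit stack kept on the heap, so the
    # depth handled is limited only by available memory.
    total, stack = 0, [node]
    while stack:
        n = stack.pop()
        total += 1
        stack.extend(n.children)
    return total
```

The iterative form is less concise, but it handles trees far deeper than the default recursion limit.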


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org

[2] Dynasim AB. Dymola - dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - I. Algebraic theory. Journal of the Franklin Institute, 326:329–350, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - II. Duality. Journal of the Franklin Institute, 326:691–708, 1989.


[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - III. Matroid theory. Journal of the Franklin Institute, 327:87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs - IV. Matrix representation and causality. Journal of the Franklin Institute, 327:109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. InCINC'94 - the first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by bond graphs: an object-oriented approach to modelling and simulation. Springer-Verlag, Berlin, Heidelberg, 2003.


[20] B. Danielson, J. Foster, and D. Frincke. GAbSys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz - graph visualization software. http://www.graphviz.org

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. BEAGLE - a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4 - the "Genetic Algorithm Optimized for Portability and Parallelism System". http://garage.cse.msu.edu/software/galopps, August 25, 2002.


[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.

[36] John Hunter. Pylab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System dynamics: a unified approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.


[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann Publishers Inc., 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

169

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multi-body systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.

  • Introduction
    • System identification
    • Black box and grey box problems
    • Static and dynamic models
    • Parametric and non-parametric models
    • Linear and nonlinear models
    • Optimization, search, machine learning
    • Some model types and identification techniques
      • Locally linear models
      • Nonparametric modelling
      • Volterra function series models
      • Two special models in terms of Volterra series
      • Least squares identification of Volterra series models
      • Nonparametric modelling as function approximation
      • Neural networks
      • Parametric modelling
      • Differential and difference equations
      • Information criteria–heuristics for choosing model order
      • Fuzzy relational models
      • Graph-based models
    • Contributions of this thesis
  • Bond graphs
    • Energy based lumped parameter models
    • Standard bond graph elements
      • Bonds
      • Storage elements
      • Source elements
      • Sink elements
      • Junctions
    • Augmented bond graphs
      • Integral and derivative causality
      • Sequential causality assignment procedure
    • Additional bond graph elements
      • Activated bonds and signal blocks
      • Modulated elements
      • Complex elements
      • Compound elements
    • Simulation
      • State space models
      • Mixed causality models
  • Genetic programming
    • Models, programs, and machine learning
    • History of genetic programming
    • Genetic operators
      • The crossover operator
      • The mutation operator
      • The replication operator
      • Fitness proportional selection
    • Generational genetic algorithms
    • Building blocks and schemata
    • Tree structured programs
      • The crossover operator for tree structures
      • The mutation operator for tree structures
      • Other operators on tree structures
    • Symbolic regression
    • Closure or strong typing
    • Genetic programming as indirect search
    • Search tuning
      • Controlling program size
      • Maintaining population diversity
  • Implementation
    • Program structure
      • Formats and representations
    • External tools
      • The Python programming language
      • The SciPy numerical libraries
      • Graph layout and visualization with Graphviz
    • A bond graph modelling library
      • Basic data structures
      • Adding elements and bonds, traversing a graph
      • Assigning causality
      • Extracting equations
      • Model reductions
    • An object-oriented symbolic algebra library
      • Basic data structures
      • Basic reductions and manipulations
      • Algebraic loop detection and resolution
      • Reducing equations to state-space form
      • Simulation
    • A typed genetic programming system
      • Strongly typed mutation
      • Strongly typed crossover
      • A grammar for symbolic regression on dynamic systems
      • A grammar for genetic programming bond graphs
  • Results and discussion
    • Bond graph modelling and simulation
      • Simple spring-mass-damper system
      • Multiple spring-mass-damper system
      • Elastic collision with a horizontal surface
      • Suspended planar pendulum
      • Triple planar pendulum
      • Linked elastic collisions
    • Symbolic regression of a static function
      • Population dynamics
      • Algebraic reductions
    • Exploring genetic neighbourhoods by perturbation of an ideal
    • Discussion
      • On bond graphs
      • On genetic programming
  • Summary and conclusions
    • Extensions and future work
      • Additional experiments
      • Software improvements


List of Figures

1.1 Using output prediction error to evaluate a model
1.2 The black box system identification problem
1.3 The grey box system identification problem
1.4 Block diagram of a Volterra function series model
1.5 The Hammerstein filter chain model structure
1.6 Block diagram of the linear squarer cuber model
1.7 Block diagram for a nonlinear moving average operation

2.1 A bond denotes power continuity
2.2 A capacitive storage element: the 1-port capacitor, C
2.3 An inductive storage element: the 1-port inertia, I
2.4 An ideal source element: the 1-port effort source, Se
2.5 An ideal source element: the 1-port flow source, Sf
2.6 A dissipative element: the 1-port resistor, R
2.7 A power-conserving 2-port element: the transformer, TF
2.8 A power-conserving 2-port element: the gyrator, GY
2.9 A power-conserving multi-port element: the 0-junction
2.10 A power-conserving multi-port element: the 1-junction
2.11 A fully augmented bond includes a causal stroke
2.12 Derivative causality: electrical capacitors in parallel
2.13 Derivative causality: rigidly joined masses
2.14 A power-conserving 2-port: the modulated transformer, MTF
2.15 A power-conserving 2-port: the modulated gyrator, MGY
2.16 A non-linear storage element: the contact compliance, CC
2.17 Mechanical contact compliance: an unfixed spring
2.18 Effort-displacement plot for the CC element
2.19 A suspended pendulum
2.20 Sub-models of the simple pendulum
2.21 Model of a simple pendulum using compound elements

3.1 A computer program is like a system model
3.2 The crossover operation for bit string genotypes
3.3 The mutation operation for bit string genotypes
3.4 The replication operation for bit string genotypes
3.5 Simplified GA flowchart
3.6 Genetic programming produces a program
3.7 The crossover operation for tree structured genotypes
3.8 Genetic programming as indirect search

4.1 Model reductions step 1: remove trivial junctions
4.2 Model reductions step 2: merge adjacent like junctions
4.3 Model reductions step 3: merge resistors
4.4 Model reductions step 4: merge capacitors
4.5 Model reductions step 5: merge inertias
4.6 A randomly generated model
4.7 A reduced model, step 1
4.8 A reduced model, step 2
4.9 A reduced model, step 3
4.10 Algebraic dependencies from the original model
4.11 Algebraic dependencies from the reduced model
4.12 Algebraic object inheritance diagram
4.13 Algebraic dependencies with loops resolved

5.1 Schematic diagram of a simple spring-mass-damper system
5.2 Bond graph model of the system from figure 5.1
5.3 Step response of a simple spring-mass-damper model
5.4 Noise response of a simple spring-mass-damper model
5.5 Schematic diagram of a multiple spring-mass-damper system
5.6 Bond graph model of the system from figure 5.5
5.7 Step response of a multiple spring-mass-damper model
5.8 Noise response of a multiple spring-mass-damper model
5.9 A bouncing ball
5.10 Bond graph model of a bouncing ball with switched C-element
5.11 Response of a bouncing ball model
5.12 Bond graph model of the system from figure 2.19
5.13 Response of a simple pendulum model
5.14 Vertical response of a triple pendulum model
5.15 Horizontal response of a triple pendulum model
5.16 Angular response of a triple pendulum model
5.17 A triple planar pendulum
5.18 Schematic diagram of a rigid bar with bumpers
5.19 Bond graph model of the system from figure 5.18
5.20 Response of a bar-with-bumpers model
5.21 Fitness in run A
5.22 Diversity in run A
5.23 Program size in run A
5.24 Program age in run A
5.25 Fitness of run B
5.26 Diversity of run B
5.27 Program size in run B
5.28 Program age in run B
5.29 A best approximation of x^2 + 2.0 after 41 generations
5.30 The reduced expression
5.31 Mean fitness versus tree-distance from the ideal
5.32 Symbolic regression "ideal"

List of Tables

1.1 Modelling tasks in order of increasing difficulty

2.1 Units of effort and flow in several physical domains
2.2 Causality options by element type: 1- and 2-port elements
2.3 Causality options by element type: junction elements

4.1 Grammar for symbolic regression on dynamic systems
4.2 Grammar for genetic programming bond graphs

5.1 Neighbourhood of a symbolic regression "ideal"

Listings

3.1 Protecting the fitness test code against division by zero

4.1 A simple bond graph model in bg format
4.2 A simple bond graph model constructed in Python code
4.3 Two simple functions in Python
4.4 Graphviz DOT markup for the model defined in listing 4.4
4.5 Equations from the model in figure 4.6
4.6 Equations from the reduced model in figure 4.9
4.7 Resolving a loop in the reduced model
4.8 Equations from the reduced model with loops resolved

Chapter 1

Introduction

1.1 System identification

System identification refers to the problem of finding models that describe and predict the behaviour of physical systems. These are mathematical models and, because they are abstractions of the system under test, they are necessarily specialized and approximate.

There are many uses for an identified model. It may be used in simulation to design systems that interoperate with the modelled system, a control system for example. In an iterative design process it is often desirable to use a model in place of the actual system if it is expensive or dangerous to operate the original system. An identified model may also be used online, in parallel with the original system, for monitoring and diagnosis or model predictive control. A model may be sought that describes the internal structure of a closed system in a way that plausibly explains the behaviour of the system, giving insight into its working. It is not even required that the system exist. An experimenter may be engaged in a design exercise and have only functional (behavioural) requirements for the system, but no prototype or even structural details of how it should be built. Some system identification techniques are able to run against such functional requirements and generate a model for the final design even before it is realized. If the identification technique can be additionally constrained to only generate models that can be realized in a usable form (akin to finding not just a proof but a generative proof in mathematics), then the technique is also an automated design method [45].


In any system identification exercise the experimenter must specialize the model in advance by deciding what aspects of the system behaviour will be represented in the model. For any system that might be modelled, from a rocket to a stone, there are an inconceivably large number of environments that it might be placed in and interactions that it might have with the environment. In modelling the stone, one experimenter may be interested to describe and predict its material strength, while another is interested in its specular qualities under different lighting conditions. Neither will (or should) concern themselves with the other's interests when building a model of the stone. Indeed, there is no obvious point at which to stop if either experimenter chose to start including other aspects of a stone's interactions with its environment. One could imagine modelling a stone's structural resonance, chemical reactivity, and aerodynamics, only to realize one has not modelled its interaction with the solar wind. Every practical model is specialized, addressing itself to only some aspects of the system behaviour. Like a 1:1 scale map, the only "complete" or "exact" model is a replica of the original system. It is assumed that any experimenter engaged in system identification has reason to not use the system directly at all times.

Identified models can only ever be approximate, since they are based on necessarily approximate and noisy measurements of the system behaviour. In order to build a model of the system behaviour, it must first be observed. There are sources of noise and inaccuracy at many levels of any practical measurement system. Precision may be lost and errors accumulated by manipulations of the measured data during the process of building a model. Finally, the model representation itself will always be finite in scope and precision. In practice, a model's accuracy should be evaluated in terms of the purpose for which the model is built. A model is adequate if it describes and predicts the behaviour of the original system in a way that is satisfactory or useful to the experimenter.

In one sense the term "system identification" is nearly a misnomer, since the result is at best an abstraction of some aspects of the system and not at all a unique identifier for it. There can be many different systems which are equally well described in one aspect or another by a single model, and there can be many equally valid models for different aspects of a single system. Often there are additional equivalent representations for a model, such as a differential equation and its Laplace transform, and tools exist that assist in moving from one representation to another [28]. Ultimately it is perhaps best to think of an identified model as a second system that, for some purposes,


can be used in place of the system under test. They are identical in some useful aspect. This view is most clear when using analog computers to solve some set of differential equations, or when using software on a digital computer to simulate another digital system ([59] contains many examples that use MATLAB to simulate digital control systems).

1.2 Black box and grey box problems

The black box system identification problem is specified as: after observing a time sequence of input and measured output values, find a model that as closely as possible reproduces the output sequence given only the input sequence. In some cases the system inputs may be a controlled test signal chosen by the experimenter. A typical experimental set-up is shown schematically in figure 1.2. It is common, and perhaps intuitive, to evaluate models according to their output prediction error, as shown schematically in figure 1.1. Other formulations are possible but less often used. Input prediction error, for example, uses an inverse model to predict test signal values from the measured system output. Generalized prediction error uses a two part model consisting of a forward model of input processes and an inverse model of output processes; their meeting corresponds to some point internal to the system under test [85].
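The output-prediction-error evaluation described above can be sketched in a few lines of Python (the implementation language used elsewhere in this work). The first-order test system, the candidate models, and the noise level here are hypothetical, chosen only to illustrate the evaluation loop, not taken from the experiments in this thesis:

```python
import math
import random

def prediction_error(u, y_measured, model):
    """Mean squared output prediction error of a candidate model.

    u          -- input test signal (list of floats)
    y_measured -- noisy measured output of the system under test
    model      -- function mapping the input sequence to a predicted output
    """
    y_predicted = model(u)
    return sum((ym - yp) ** 2 for ym, yp in zip(y_measured, y_predicted)) / len(u)

def true_system(u):
    """Hypothetical system under test: y[k] = 0.5*y[k-1] + u[k]."""
    y, state = [], 0.0
    for uk in u:
        state = 0.5 * state + uk
        y.append(state)
    return y

def candidate(a):
    """A family of candidate models differing in one parameter a."""
    def model(u):
        y, state = [], 0.0
        for uk in u:
            state = a * state + uk
            y.append(state)
        return y
    return model

random.seed(0)
u = [math.sin(0.1 * k) for k in range(200)]
# Measured output = actual output + zero-mean Gaussian measurement noise
noise = [random.gauss(0.0, 0.01) for _ in range(200)]
y_measured = [yk + nk for yk, nk in zip(true_system(u), noise)]

# The candidate matching the true structure scores a lower prediction error
err_good = prediction_error(u, y_measured, candidate(0.5))
err_bad = prediction_error(u, y_measured, candidate(0.3))
```

Note that even the exactly matching candidate cannot reach zero error; its residual is the variance of the added measurement noise, as discussed below.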

Importantly, the system is considered to be opaque (black), exposing only a finite set of observable input and output ports. No assumptions are made about the internal structure of the system. Once a test signal enters the system at an input port it cannot be observed any further. The origin of output signals cannot be traced back within the system. This inscrutable conception of the system under test is a simplifying abstraction. Any two systems which produce identical output signals for a given input signal are considered equivalent in a black box system identification. By making no assumptions about the internal structure or physical principles of the system, an experimenter can apply familiar system identification techniques to a wide variety of systems and use the resulting models interchangeably.

It is not often true, however, that nothing is known of the system internals. Knowledge of the system internals may come from a variety of sources, including:

• Inspection or disassembly of a non-operational system

• Theoretical principles applicable to that class of systems

• Knowledge from the design and construction of the system

• Results of another system identification exercise

• Past experience with similar systems

In these cases it may be desirable to use an identification technique that is able to use and profit from a preliminary model incorporating any prior knowledge of the structure of the system under test. Such a technique is loosely called "grey box" system identification, to indicate that the system under test is neither wholly opaque nor wholly transparent to the experimenter. The grey box system identification problem is shown schematically in figure 1.3.

Note that in all the schematic depictions above, the measured signal available to the identification algorithm and to the output prediction error calculation is a sum of the actual system output and some added noise. This added noise represents random inaccuracies in the sensing and measurement apparatus, as are unavoidable when modelling real physical processes. A zero mean Gaussian distribution is usually assumed for the added noise signal. This is representative of many physical processes and a common simplifying assumption for others.

Figure 1.1: Using output prediction error to evaluate a model


Figure 1.2: The black box system identification problem

Figure 1.3: The grey box system identification problem


1.3 Static and dynamic models

Models can be classified as either static or dynamic. The output of a static model at any instant depends only on the value of the inputs at that instant, and in no way on any past values of the input. Put another way, the input–output relationships of a static model are independent of the order and rate at which the inputs are presented to it. This prohibits static models from exhibiting many useful and widely evident behaviours, such as hysteresis and resonance. As such, static models are not of much interest on their own, but often occur as components of a larger compound model that is dynamic overall. The system identification procedure for static models can be phrased in visual terms as: fit a line (plane, hyper-plane), curve (sheet), or some discontinuous function to a plot of system input–output data. The data may be collected in any number of experiments and is plotted without regard to the order in which it was collected. Because there is no way for past inputs to have a persistent effect within static models, these are also sometimes called "stateless" or "zero-memory" models.
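Phrased as code, the static fitting exercise above is ordinary curve fitting. A minimal NumPy sketch, with an arbitrary quadratic standing in for a hypothetical static system; shuffling the data before fitting changes nothing, reflecting the order-independence just described:

```python
import numpy as np

# Input-output data from a hypothetical static system (noiseless quadratic)
x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * x**2 + 1.0

# Present the data in a random order: a static model's fit is
# independent of the order in which data was collected
rng = np.random.default_rng(0)
order = rng.permutation(len(x))

# Fit a curve to the scatter plot of input-output pairs
coeffs = np.polyfit(x[order], y[order], deg=2)
model = np.poly1d(coeffs)
```

With noiseless data the recovered leading coefficient matches the generating value of 3.0 to numerical precision.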

Dynamic models have "memory" in some form. The observable effects at their outputs of changes at their inputs may persist beyond subsequent changes at their inputs. A very common formulation for dynamic models is called "state space form". In this formulation a dynamic system is represented by a set of input, state, and output variables that are related by a set of algebraic and first order ordinary differential equations. The input variables are all free variables in these equations; their values are determined outside of the model. The first derivative of each state variable appears alone on the left-hand side of a differential equation, the right-hand side of which is an algebraic expression containing input and state variables, but not output variables. The output variables are defined algebraically in terms of the state and input variables. For a system with n input variables, m state variables, and p output variables, the model equations would be

\dot{u}_1 = f_1(x_1, \ldots, x_n, u_1, \ldots, u_m)
\vdots
\dot{u}_m = f_m(x_1, \ldots, x_n, u_1, \ldots, u_m)

y_1 = g_1(x_1, \ldots, x_n, u_1, \ldots, u_m)
\vdots
y_p = g_p(x_1, \ldots, x_n, u_1, \ldots, u_m)

where x_1, \ldots, x_n are the input variables, u_1, \ldots, u_m are the state variables, and y_1, \ldots, y_p are the output variables.
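A state space model in this form maps directly onto a numerical integrator. A minimal sketch using SciPy (one of the tools adopted later in this work), with a spring-mass-damper chosen as an illustrative system: one input (an applied force), two states (position and velocity), and one output (position); the parameter values are arbitrary:

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative system: m*x'' + c*x' + k*x = F(t)
m_, c_, k_ = 1.0, 0.5, 4.0

def f(u, t):
    """Right-hand sides f_1..f_m: each du_i/dt in terms of inputs and states."""
    x1 = 1.0                              # input variable: a unit step force
    u1, u2 = u                            # state variables: position, velocity
    return [u2, (x1 - c_ * u2 - k_ * u1) / m_]

def g(u):
    """Output equation y_1 = g_1(...): observe position only."""
    return u[0]

t = np.linspace(0.0, 20.0, 2001)
states = odeint(f, [0.0, 0.0], t)         # integrate from rest
y = [g(u) for u in states]                # output signal
```

After the transient decays the output settles near the static gain 1/k = 0.25, which is one quick sanity check on such a simulation.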

The main difficulties in identifying dynamic models (over static ones) are, first, that because the order of arrival of data within the input and output signals matters, typically much more data must be collected, and second, that the system may have unobservable state variables. Identification of dynamic models typically requires more data to be collected since, to adequately sample the behaviour of a stateful system, not just every interesting combination of the inputs but every path through the interesting combinations of the inputs must be exercised. Here "interesting" means simply "of interest to the experimenter".

Unobservable state variables are state variables that do not coincide exactly with an output variable; that is, a u_i for which there is no y_j = g_j(x_1, \ldots, x_n, u_1, \ldots, u_m) = u_i. The difficulty in identifying systems with unobservable state variables is twofold. First, there may be an unknown number of unobservable state variables concealed within the system. This leaves the experimenter to estimate an appropriate model order. If this estimate is too low then the model will be incapable of reproducing the full range of system behaviour. If this estimate is too high then extra computation costs will be incurred during identification and use of the model, and there is an increased risk of over-fitting (see section 1.6). Second, for any system with an unknown number of unobservable state variables it is impossible to know for certain the initial state of the system. This is a difficulty to experimenters since, for an extreme example, it may be futile to exercise any path through the input space during a period when the outputs are overwhelmingly determined by persisting effects of past inputs. Experimenters may attempt to work around this by allowing the system to settle into a consistent, if unknown, state before beginning to collect data. In repeated experiments, the maximum time between application of a stationary input signal and the arrival of the system outputs at a stationary signal is recorded. Then, while collecting data for model evaluation, the experimenter will discard the output signal for at least as much time between start of excitation and start of data collection.


1.4 Parametric and non-parametric models

Models can be divided into those described by a small number of parameters at distinct points in the model structure, called parametric models, and those described by an often quite large number of parameters, all of which have similar significance in the model structure, called non-parametric models.

System identification with a parametric model entails choosing a specific model structure and then estimating the model parameters to best fit model behaviour to system behaviour. Symbolic regression (discussed in Section 3.7) is unusual in that it is able to identify the model structure and the corresponding parameters at the same time.

Parametric models are typically more concise (they contain fewer parameters and are shorter to describe) than similarly accurate non-parametric models. Parameters in a parametric model often have individual significance in the system behaviour. For example, the roots of the numerator and denominator of a transfer function model indicate poles and zeros that are related to the settling time, stability, and other behavioural properties of the model. Continuous or discrete transfer functions (in the Laplace or Z domain) and their time domain representations (differential and difference equations) are all parametric models. Examples of linear structures in each of these forms are given in equations 1.1 (continuous transfer function), 1.2 (discrete transfer function), 1.3 (continuous time-domain function), and 1.4 (discrete time-domain function). In each case the experimenter must choose m and n to fully specify the model structure. For non-linear models there are additional structural decisions to be made, as described in Section 1.5.

$$Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_1 s + b_0}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} X(s) \quad (1.1)$$

$$Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} X(z) \quad (1.2)$$

$$\frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1 \frac{dy}{dt} + a_0 y = b_m \frac{d^m x}{dt^m} + \cdots + b_1 \frac{dx}{dt} + b_0 x \quad (1.3)$$

$$y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 x_{k-1} + \cdots + b_m x_{k-m} \quad (1.4)$$
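A discrete time-domain model of this form can be simulated directly by recursion. The following sketch is illustrative only; the coefficients and input signal are arbitrary choices, not taken from any example in this thesis.

```python
# Simulate the discrete time-domain model of equation 1.4:
#   y_k + a1*y_{k-1} + ... + an*y_{k-n} = b1*x_{k-1} + ... + bm*x_{k-m}

def simulate_difference_equation(a, b, x):
    """a = [a1..an], b = [b1..bm]; returns y computed recursively from x."""
    n, m = len(a), len(b)
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i in range(1, n + 1):      # feedback terms: -a_i * y_{k-i}
            if k - i >= 0:
                acc -= a[i - 1] * y[k - i]
        for j in range(1, m + 1):      # feed-forward terms: b_j * x_{k-j}
            if k - j >= 0:
                acc += b[j - 1] * x[k - j]
        y.append(acc)
    return y

# A first-order example: y_k = 0.5*y_{k-1} + 0.5*x_{k-1}, driven by a unit step.
y = simulate_difference_equation(a=[-0.5], b=[0.5], x=[1.0] * 10)
```

With this choice of coefficients the step response approaches 1 geometrically, as expected of a stable first-order model.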

Because they typically contain fewer identified parameters, and because the range of possibilities is already reduced by the choice of a specific model structure,


parametric models are typically easier to identify, provided the chosen specific model structure is appropriate. However, system identification with an inappropriate parametric model structure, particularly a model that is too low order and therefore insufficiently flexible, is sure to be frustrating. It is an advantage of non-parametric models that the experimenter need not make this difficult model structure decision.

Examples of non-parametric model structures include impulse response models (equation 1.5, continuous, and 1.6, discrete), frequency response models (equation 1.7), and Volterra kernels. Note that equation 1.7 is simply the Fourier transform of equation 1.5, and that equation 1.6 is the special case of a Volterra kernel with no higher order terms.

$$y(t) = \int_0^t u(t-\tau) \, x(\tau) \, d\tau \quad (1.5)$$

$$y(k) = \sum_{m=0}^{k} u(k-m) \, x(m) \quad (1.6)$$

$$Y(j\omega) = U(j\omega) X(j\omega) \quad (1.7)$$

The lumped-parameter graph-based models discussed in chapter 2 are compositions of sub-models that may be either parametric or non-parametric. The genetic programming based identification technique discussed in chapter 3 relies on symbolic regression to find numerical constants and algebraic relations not specified in the prototype model or available graph components. Any non-parametric sub-models that appear in the final compound model must have been identified by other techniques and included in the prototype model using a grey box approach to identification.

1.5 Linear and nonlinear models

There is a natural ordering of linear parametric models from low-order models to high-order models. That is, linear models of a given structure can be enumerated from less complex and flexible to more complex and flexible. For example, models structured as a transfer function (continuous or discrete) can be ordered by the number of poles first, and by the number of zeros for models with the same number of poles. Models having a greater number of poles and zeros are more flexible; that is, they are able to reproduce more


complex system behaviours than lower order models. Two or more models having the same number of poles and zeros would be considered equally flexible.

Linear here means linear-in-the-parameters, assuming a parametric model. A parametric model is linear-in-the-parameters if, for input vector u, output vector y, and parameters to be identified $\beta_1, \ldots, \beta_n$, the model can be written as a polynomial containing non-linear functions of the inputs but not of the parameters. For example, equation 1.8 is linear in the parameters $\beta_i$, while equation 1.9 is not. Both contain non-linear functions of the inputs (namely $u_2^2$ and $\sin(u_3)$), but equation 1.8 forms a linear sum of the parameters, each multiplied by some factor that is unrelated in any way to the parameters. This structure permits the following direct solution to the identification problem.

$$y = \beta_1 u_1 + \beta_2 u_2^2 + \beta_3 \sin(u_3) \quad (1.8)$$

$$y = \beta_1 \beta_2 u_1 + u_2^2 + \sin(\beta_3 u_3) \quad (1.9)$$

For models that are linear-in-the-parameters, the output prediction error is also a linear function of the parameters. A best fit model can be found directly (in one step) by taking the partial derivative of the error function with respect to the parameters and testing each extremum to solve for the parameter vector that minimizes the error function. When the error function is a sum of squared error terms this technique is called linear least squares parameter estimation. In general there are no such direct solutions to the identification problem for non-linear model types. Identification techniques for non-linear models all resort to an iterative, multi-step process of some sort. These iterations can be viewed as a search through the model parameter space and, in some cases, the model structure space. There is no guarantee of finding a globally optimal model in terms of output prediction error.
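The one-step least squares solution can be sketched for the model of equation 1.8. The "true" parameter values and the random inputs below are arbitrary, used only to synthesize data; the point is that no iterative search is needed.

```python
import numpy as np

# One-step least squares estimation for the linear-in-the-parameters model
# of equation 1.8: y = b1*u1 + b2*u2^2 + b3*sin(u3).
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=(200, 3))      # 200 samples of (u1, u2, u3)
beta_true = np.array([2.0, -1.5, 0.7])         # arbitrary "true" parameters

# Regressor matrix: each column is a (possibly nonlinear) function of the
# inputs, but the model is a linear sum over the unknown parameters.
Phi = np.column_stack([u[:, 0], u[:, 1] ** 2, np.sin(u[:, 2])])
y = Phi @ beta_true

# Direct (non-iterative) solution: no search process is required.
beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

On this noise-free data the estimate recovers the generating parameters to machine precision.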

There is a fundamental trade-off between model size or order and the model's expressive power or flexibility. In general it is desirable to identify the most compact model that adequately represents the system behaviour. Larger models incur greater storage and computational costs during identification and use. Excessively large models are also more vulnerable to overfitting (discussed in Section 1.6), a case of misapplied flexibility.

One approach to identifying minimal-order linear models is to start withan arbitrarily low order model structure and repeatedly identify the best


model using the linear least squares technique discussed below, while incrementing the model order between attempts. The search is terminated as soon as an adequate representation of system behaviour is achieved. Unfortunately, this approach, and any like it, is not applicable to general non-linear models, since any number of qualitatively distinct yet equally complex structures are possible.
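The incremental order search can be sketched as follows. The data-generating system (a third-order FIR response) and the error threshold are arbitrary illustrative choices.

```python
import numpy as np

# Minimal-order search: repeatedly identify the best linear (FIR) model by
# least squares, incrementing the model order until the fit is adequate.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
true_w = np.array([0.5, -0.3, 0.2])            # true impulse response (order 3)
y = np.convolve(x, true_w)[: len(x)]

def fit_fir(x, y, order):
    """Least squares fit of an FIR model of the given order; returns (w, rms)."""
    U = np.column_stack([np.concatenate([np.zeros(j), x[: len(x) - j]])
                         for j in range(order)])
    w, *_ = np.linalg.lstsq(U, y, rcond=None)
    rms = np.sqrt(np.mean((U @ w - y) ** 2))
    return w, rms

order = 1
while True:
    w, rms = fit_fir(x, y, order)
    if rms < 1e-8 or order >= 20:              # adequate fit, or a sanity bound
        break
    order += 1
```

The loop terminates at the true order because the residual collapses only once the structure is flexible enough.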

One final complication to the identification of non-linear models is that superposition of separately measured responses applies to linear models only. One cannot measure the output of a non-linear system in response to distinct test signals on different occasions and superimpose the measurements later with any accuracy. This prohibits conveniences such as adding a test signal to the normal operating inputs and subtracting the normal operating outputs from the measured output to estimate the system response to the test signal alone.

1.6 Optimization, search, machine learning

System identification can be viewed as a type of optimization or machine learning problem. In fact, most system identification techniques draw heavily from the mathematics of optimization and, in as much as the "best" model structure and parameter values are sought, system identification is an optimization problem. When automatic methods are desired, system identification is a machine learning problem. Finally, system identification can also be viewed as a search problem, where a "good" or "best" model is sought within the space of all possible models.

Overfitting and premature optimization are two pitfalls common to many machine learning, search, and global optimization problems. They might also be phrased as the challenge of choosing a rate at which to progress and of deciding when to stop. The goal in all cases is to generalize on the training data, find a global optimum, and not give too much weight to unimportant local details.

Premature optimization occurs when a search algorithm narrows in on a neighbourhood that is locally but not globally optimal. Many system identification and machine learning algorithms incorporate tunable parameters that adjust how aggressive the algorithm is in pursuing a solution. In machine learning this is called a learning rate, and controls how fast the algorithm will accept new knowledge and discard old. In heuristic search algorithms it


is a factor that trades off exploration of new territory with exploitation of local gains made in a neighbourhood. Back propagation learning for neural networks encodes the learning rate in a single constant factor. In fuzzy logic systems it is implicit in the fuzzy rule adjustment procedures. Learning rate corresponds to step size in a hill climbing or simulated annealing search, and in (tree structured) genetic algorithms the representation (function and terminal set) determines the topography of the search space while the operator design (crossover and mutation functions) determines the allowable step sizes and directions.

Ideally the identified model will respond to important trends in the data and not to random noise or aspects of the system that are uninteresting to the experimenter. Successful machine learning algorithms generalize on the training data, and good identified models give good results on unseen test signals as well as those used during identification. A model which does not generalize well may be "overfitting". To check for overfitting, a model is validated against unseen data. This data must be held aside strictly for validation purposes; if the validation results are used to select a best model, then the validation data implicitly becomes training data. Stopping criteria typically include a quality measure (a minimum fitness or maximum error threshold) and computational measures (a maximum number of iterations or amount of computer time). Introductory descriptions of genetic algorithms and back propagation learning in neural networks mention both criteria and a "first met" arrangement. The Bayesian and Akaike information criteria are two heuristic stopping criteria mentioned in every machine learning textbook. They both incorporate a measure of parsimony for the identified model and weight that against the fitness or error measure. They are therefore applicable to techniques that generate a variable length model, the complexity or parsimony of which is easily quantifiable.
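The parsimony-versus-fit weighting of these criteria can be made concrete. The sketch below uses the common least-squares (Gaussian error) forms of the two criteria; the sample counts and residuals are arbitrary illustrative numbers.

```python
import math

# Akaike and Bayesian information criteria in their common least-squares forms:
#   AIC = n*ln(RSS/n) + 2*k      BIC = n*ln(RSS/n) + k*ln(n)
# where n = number of samples, k = number of identified parameters, and
# RSS = residual sum of squares. Lower values are better; both penalize
# extra parameters against improved fit.

def aic(n, k, rss):
    return n * math.log(rss / n) + 2 * k

def bic(n, k, rss):
    return n * math.log(rss / n) + k * math.log(n)

# A larger model must reduce the residual enough to justify its extra
# parameters; here the 5-parameter model's small improvement does not.
small = aic(n=100, k=2, rss=10.0)
large = aic(n=100, k=5, rss=9.9)
```

Note that BIC's $\ln(n)$ penalty exceeds AIC's factor of 2 for all but the smallest data sets, so BIC favours even more parsimonious models.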

1.7 Some model types and identification techniques

Table 1.1 divides all models into four classes, labelled in order of increasing difficulty of their identification problem. The linear problems each have well known solution methods that achieve an optimal model in one step. For linear static models (class 1 in Table 1.1) the problem is to fit a line, plane,


or hyper-plane that is "best" by some measure (such as minimum squared or absolute error) to the input-output data. The usual solution method is to form the partial derivative of the error measure with respect to the model parameters (slope and offset for a simple line) and solve for the model parameter values at the minimum error extremum. The solution method is identical for linear-in-the-parameters dynamic models (class 2 in Table 1.1). Non-linear static modelling (class 3 in Table 1.1) is the general function approximation problem (also known as curve fitting or regression analysis). The fourth class of system identification problems, identifying models that are both non-linear and dynamic, is the most challenging and the most interesting. Non-linear dynamic models are the most difficult to identify because, in general, it is a combined structure and parameter identification problem. This thesis proposes a technique, bond graph models created by genetic programming, to address black- and grey-box problems of this type.

Table 1.1: Modelling tasks in order of increasing difficulty

              linear    non-linear
    static      (1)        (3)
    dynamic     (2)        (4)

(1) and (2) are both linear regression problems and (3) is the general function approximation problem; (4) is the most general class of system identification problems.

Two models using local linear approximations of nonlinear systems are discussed first, followed by several nonlinear nonparametric and parametric models.

1.7.1 Locally linear models

Since nonlinear system identification is difficult, two categories of techniques have arisen attempting to apply linear models to nonlinear systems. Both rely on local linear approximations of nonlinear systems, which are accurate for small changes about a fixed operating point. In the first category, several linear model approximations are identified, each at different operating points of the nonlinear system. The complete model interpolates between these local linearizations. An example of this technique is the Linear Parameter-Varying


(LPV) model. In the second category, a single linear model approximation is identified at the current operating point and then updated as the operating point changes. The Extended Kalman Filter (EKF) is an example of this technique.

Both these techniques share the advantage of using linear model identification, which is fast and well understood. LPV models can be very effective and are popular for industrial control applications [12] [7]. For a given model structure, the extended Kalman filter estimates state as well as parameters. It is often used to estimate parameters of a known model structure given noisy and incomplete measurements (where not all states are measurable) [47] [48]. The basic identification technique for LPV models consists of three steps:

1. Choose the set of base operating points.

2. Perform a linear system identification at each of the base operating points.

3. Choose an interpolation scheme (e.g. linear, polynomial, or b-spline).

In step 2 it is important to use low amplitude test signals such that the system will not be excited outside a small neighbourhood of the chosen operating point, for which the system is assumed linear. The corollary to this is that with any of the globally nonlinear models discussed later, it is important to use a large test signal which excites the nonlinear dynamics of the system; otherwise only the local, approximately linear behaviour will be modelled. [5] presents a one-shot procedure that combines all three steps above.

Both of these techniques are able to identify the parameters, but not the structure, of a model. For EKF the model structure must be given in a state space form. For LPV the model structure is defined by the number and spacing of the base operating points and the interpolation method. Recursive or iterative forms of linear system identification techniques can track (update the model online) mildly nonlinear systems that simply drift over time.
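The three-step LPV procedure can be sketched for a toy scalar system whose local gain varies with the operating point p, here $y = (1 + p^2)\,u$. All values are illustrative; a real application would run a full linear system identification at each base point.

```python
import numpy as np

# Step 1: base operating points, chosen a priori.
base_points = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

# Step 2: identify a local linear gain at each base point using a
# low-amplitude test signal (so the system stays near the operating point).
rng = np.random.default_rng(2)
local_gains = []
for p in base_points:
    u = rng.uniform(-0.05, 0.05, 50)           # low-amplitude excitation
    y = (1 + p ** 2) * u                       # toy system, locally linear
    gain, *_ = np.linalg.lstsq(u[:, None], y, rcond=None)
    local_gains.append(gain[0])
local_gains = np.array(local_gains)

# Step 3: interpolation scheme (linear, in this sketch).
def lpv_gain(p):
    return np.interp(p, base_points, local_gains)
```

Between base points the interpolated gain only approximates the true gain curve; the approximation improves with more, or better-placed, base operating points.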

1.7.2 Nonparametric modelling

Volterra function series models and artificial neural networks are two general nonparametric nonlinear models about which quite a lot has been written. Volterra models are linear in the parameters, but they suffer from an extremely large number of parameters, which leads to computational difficulties and overfitting. Dynamic system identification can be posed as a


function approximation problem, and neural networks are a popular choice of approximator. Neural networks can also be formed in so-called recurrent configurations. Recurrent neural networks are dynamic systems themselves.

1.7.3 Volterra function series models

Volterra series models are an extension of linear impulse response models (first order time-domain kernels) to nonlinear systems. The linear impulse response model is given by

$$y(t) = y_1(t) = \int_{-T_S}^{0} g(t-\tau) \, u(\tau) \, d\tau \quad (1.10)$$

Its discrete-time version is

$$y(t_k) = y_1(t_k) = \sum_{j=0}^{N_S} W_j \, u_{k-j} \quad (1.11)$$
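Equation 1.11 is a finite convolution sum and can be evaluated directly. A minimal numpy sketch, with an arbitrary illustrative weighting sequence and a unit impulse input:

```python
import numpy as np

# Discrete-time linear impulse response model of equation 1.11:
#   y(t_k) = sum_{j=0}^{Ns} W_j * u_{k-j}
W = np.array([1.0, 0.5, 0.25])        # weighting sequence, Ns = 2 (arbitrary)
u = np.array([1.0, 0.0, 0.0, 0.0])    # unit impulse input

y = np.array([sum(W[j] * u[k - j] for j in range(len(W)) if k - j >= 0)
              for k in range(len(u))])
```

For an impulse input the output simply reads back the weighting sequence, which is why W is called the impulse response.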

The limits of integration and summation are finite since practical stable systems are assumed to have negligible dependence on the very distant past. That is, only finite memory systems are considered. The Volterra series model extends the above into an infinite sum of functions, the first of which is the linear model $y_1(t)$ above.

$$y(t) = y_1(t) + y_2(t) + y_3(t) + \cdots \quad (1.12)$$

The continuous-time form of the additional terms (called the bilinear, trilinear, etc. functions) is

$$y_2(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} g_2(t-\tau_1, t-\tau_2) \, u(\tau_1) \, u(\tau_2) \, d\tau_1 \, d\tau_2$$

$$y_3(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \int_{-T_3}^{0} g_3(t-\tau_1, t-\tau_2, t-\tau_3) \, u(\tau_1) \, u(\tau_2) \, u(\tau_3) \, d\tau_1 \, d\tau_2 \, d\tau_3$$

$$y_p(t) = \int_{-T_1}^{0} \int_{-T_2}^{0} \cdots \int_{-T_p}^{0} g_p(t-\tau_1, t-\tau_2, \ldots, t-\tau_p) \, u(\tau_1) \, u(\tau_2) \cdots u(\tau_p) \, d\tau_1 \, d\tau_2 \cdots d\tau_p$$


where $g_p$ is called the p'th order continuous time kernel. The rest of this discussion will use the discrete-time representation:

$$y_2(t_n) = \sum_{i=0}^{M_2} \sum_{j=0}^{M_2} W^{(2)}_{ij} \, u_{n-i} \, u_{n-j}$$

$$y_3(t_n) = \sum_{i=0}^{M_3} \sum_{j=0}^{M_3} \sum_{k=0}^{M_3} W^{(3)}_{ijk} \, u_{n-i} \, u_{n-j} \, u_{n-k}$$

$$y_p(t_n) = \sum_{i_1=0}^{M_p} \sum_{i_2=0}^{M_p} \cdots \sum_{i_p=0}^{M_p} W^{(p)}_{i_1 i_2 \ldots i_p} \, u_{n-i_1} \, u_{n-i_2} \cdots u_{n-i_p}$$

$W^{(p)}$ is called the p'th order discrete time kernel: $W^{(2)}$ is an $M_2$ by $M_2$ square matrix, $W^{(3)}$ is an $M_3$ by $M_3$ by $M_3$ cubic matrix, and so on. It is not required that the memory lengths of each kernel ($M_2$, $M_3$, etc.) are the same, but they are typically chosen equal for consistent accounting. If the memory length is $M$, then the p'th order kernel $W^{(p)}$ is a p-dimensional matrix with $M^p$ elements. The overall Volterra series model has the form of a sum of products of delayed values of the input with constant coefficients coming from the kernel elements. The model parameters are the kernel elements, and the model is linear in the parameters. The number of parameters is $M_1 + M_2^2 + M_3^3 + \cdots + M_p^p + \cdots$, or infinite. Clearly the full Volterra series with an infinite number of parameters cannot be used in practice; rather, it is truncated after the first few terms. In preparing this thesis no references were found to applications using more than 3 terms in the Volterra series. The number of parameters is still large even after truncating at the third order term. A block diagram of the full Volterra function series model is shown in Figure 1.4.
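Evaluating a Volterra model truncated after the bilinear term can be sketched as below. The kernel values are arbitrary illustrative examples; note that the second-order kernel already contributes $M^2$ parameters.

```python
import numpy as np

# Volterra series model truncated after the second-order (bilinear) term.
M = 3
W1 = np.array([0.5, 0.2, 0.1])        # first-order kernel: M parameters
W2 = 0.01 * np.ones((M, M))           # second-order kernel: M*M parameters

def volterra_predict(u, n):
    """y(t_n) = sum_i W1[i]*u[n-i] + sum_{i,j} W2[i,j]*u[n-i]*u[n-j]."""
    lag = np.array([u[n - i] if n - i >= 0 else 0.0 for i in range(M)])
    return W1 @ lag + lag @ W2 @ lag

u = np.ones(10)
y5 = volterra_predict(u, 5)
```

Each additional series term multiplies the parameter count by another factor of M, which is the computational difficulty referred to above.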

1.7.4 Two special models in terms of Volterra series

Some nonlinear systems can be separated into a static nonlinear function followed by a linear time-invariant (LTI) dynamic system. This form is called a Hammerstein model and is shown in Figure 1.5. The opposite, an LTI dynamic system followed by a static nonlinear function, is called a Wiener model.


Figure 1.4: Block diagram of a Volterra function series model

This section discusses the Hammerstein model in terms of Volterra series and presents another simplified Volterra-like model recommended in [6].

Figure 1.5: The Hammerstein filter chain model structure

Assuming the static nonlinear part is a polynomial in u, and given the linear dynamic part as an impulse response weighting sequence W of length N:

$$v = a_0 + a_1 u + a_2 u^2 + \cdots + a_q u^q \quad (1.13)$$

$$y(t_k) = \sum_{j=0}^{N} W_j \, v_{k-j} \quad (1.14)$$

Combining these gives

$$y(t_k) = \sum_{j=0}^{N} W_j \sum_{i=0}^{q} a_i u_{k-j}^i$$

$$= \sum_{i=0}^{q} a_i \sum_{j=0}^{N} W_j u_{k-j}^i$$

$$= a_0 \sum_{j=0}^{N} W_j + a_1 \sum_{j=0}^{N} W_j u_{k-j} + a_2 \sum_{j=0}^{N} W_j u_{k-j}^2 + \cdots + a_q \sum_{j=0}^{N} W_j u_{k-j}^q$$


Comparing with the discrete time Volterra function series shows that the Hammerstein filter chain model is a discrete Volterra model with zeros for all the off-diagonal elements in the higher order kernels. Elements on the main diagonal of the higher order kernels can be interpreted as weights applied to powers of the input signal.
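The Hammerstein structure of equations 1.13 and 1.14 can be simulated directly: a static polynomial followed by an FIR filter. The coefficients and input below are arbitrary illustrative values.

```python
import numpy as np

# Hammerstein model: static polynomial nonlinearity, then an LTI (FIR) filter.
a = np.array([0.0, 1.0, 0.5])         # v = a0 + a1*u + a2*u^2  (eq. 1.13)
W = np.array([0.6, 0.3, 0.1])         # impulse response weighting sequence

def hammerstein(u):
    v = sum(a[i] * u ** i for i in range(len(a)))   # static nonlinearity
    return np.convolve(v, W)[: len(u)]              # LTI dynamic part (eq. 1.14)

u = np.array([1.0, -1.0, 0.0, 0.0])
y = hammerstein(u)
```

Because the nonlinearity acts sample-by-sample before the filter, only powers of individual delayed inputs appear in the output, matching the diagonal-kernel interpretation above.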

Bendat [6] advocates using a simplification of the Volterra series model that is similar to the special Hammerstein model described above. It is suggested to truncate the series at 3 terms and to replace the bilinear and trilinear operations with square- and cube-law static nonlinear maps followed by LTI dynamic systems. This is shown in Figure 1.6. Obviously it is less general than the system in Figure 1.4, not only because it is truncated at the third order term, but also because, like the special Hammerstein model above, it has no counterparts to the off-diagonal elements in the bilinear and trilinear Volterra kernels. In [6] it is argued that the model in Figure 1.6 is still adequate for many engineering systems, and examples are given in automotive, medical, and naval applications, including a 6-DOF model for a moored ship. The great advantage of this model form is that the signals $u_1 = u$, $u_2 = u^2$, and $u_3 = u^3$ can be calculated, which reduces the identification of the dynamic parts to a multi-input multi-output (MIMO) linear problem.

Figure 1.6: Block diagram of the linear, squarer, cuber model

1.7.5 Least squares identification of Volterra series models

To illustrate the ordinary least squares formulation of system identification with Volterra series models, just the first 2 terms (linear and bilinear) will


be used

$$y(t_n) = y_1(t_n) + y_2(t_n) + e(t_n) \quad (1.15)$$

$$y = \sum_{i=0}^{N_1} W^{(1)}_i \, u_{n-i} + \sum_{i=0}^{N_2} \sum_{j=0}^{N_2} W^{(2)}_{ij} \, u_{n-i} \, u_{n-j} + e(t_n) \quad (1.16)$$

$$= (u_1)^T W^{(1)} + (u_1)^T W^{(2)} u_2 + e \quad (1.17)$$

where the vector and matrix quantities $u_1$, $u_2$, $W^{(1)}$, and $W^{(2)}$ are

$$u_1 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_1} \end{bmatrix} \qquad u_2 = \begin{bmatrix} u_{n-1} \\ u_{n-2} \\ \vdots \\ u_{n-N_2} \end{bmatrix}$$

$$W^{(1)} = \begin{bmatrix} W^{(1)}_1 \\ W^{(1)}_2 \\ \vdots \\ W^{(1)}_{N_1} \end{bmatrix} \qquad W^{(2)} = \begin{bmatrix} W^{(2)}_{11} & \cdots & W^{(2)}_{1 N_2} \\ \vdots & \ddots & \vdots \\ W^{(2)}_{N_1 1} & \cdots & W^{(2)}_{N_1 N_2} \end{bmatrix}$$

Equation 1.17 can be written in a form (equation 1.18) amenable to solution by the method of least squares if the input vector U and the parameter vector β are constructed as follows. The first $N_1$ elements in U are the elements of $u_1$, and this is followed by elements having the value of the products of $u_2$ appearing in the second term of equation 1.16. The parameter vector β is constructed from the elements of $W^{(1)}$ followed by the elements of $W^{(2)}$ as they appear in the second term of equation 1.16.

$$y = U\beta + e \quad (1.18)$$

Given a sequence of N measurements, U becomes a rectangular matrix (one row per measurement) and y and e become N by 1 vectors. The ordinary least squares solution is found by forming the pseudo-inverse of U:

$$\beta = (U^T U)^{-1} U^T y \quad (1.19)$$

The ordinary least squares formulation has the advantage of not requiring a search process. However, the number of parameters is large, $O(N^P)$ for a model using the first P Volterra series terms and a memory length of N.
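The construction of U and β for the linear-plus-bilinear case can be sketched as follows. The "true" kernels are arbitrary values used only to synthesize data; each regressor row holds the $N_1$ delayed inputs followed by all pairwise products of delayed inputs, exactly as in equations 1.16 and 1.18.

```python
import numpy as np

# Ordinary least squares identification of a first- plus second-order
# Volterra model (equations 1.16-1.19).
N1 = N2 = 3
rng = np.random.default_rng(3)
u = rng.standard_normal(300)
W1_true = np.array([0.5, -0.2, 0.1])
W2_true = np.diag([0.05, 0.02, 0.01])          # diagonal bilinear kernel

def regressor_row(u, n):
    lag = u[n - N1:n][::-1]                    # u_{n-1}, u_{n-2}, ..., u_{n-N1}
    return np.concatenate([lag, np.outer(lag, lag).ravel()])

rows = range(N1, len(u))
U = np.vstack([regressor_row(u, n) for n in rows])
y = np.array([W1_true @ u[n - N1:n][::-1]
              + u[n - N1:n][::-1] @ W2_true @ u[n - N1:n][::-1] for n in rows])

# One-step solution; lstsq handles the rank deficiency caused by the
# indistinguishable symmetric product columns (u_{n-i}u_{n-j} = u_{n-j}u_{n-i}).
beta, *_ = np.linalg.lstsq(U, y, rcond=None)
W1_hat = beta[:N1]
W2_hat = beta[N1:].reshape(N1, N1)
```

With $N_1 = N_2 = 3$ there are already $3 + 9 = 12$ parameters, illustrating how quickly the kernel sizes grow with memory length.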


A slight reduction in the number of parameters is available by recognizing that not all elements in the higher order kernels are distinguishable. In the bilinear kernel, for example, there are terms of the form $W_{ij} u_{n-i} u_{n-j}$ and $W_{ji} u_{n-j} u_{n-i}$. However, since $u_{n-i} u_{n-j} = u_{n-j} u_{n-i}$, the parameters $W_{ij}$ and $W_{ji}$ are identical. Only $N(N+1)/2$ unique parameters need to be identified in the bilinear kernel, but this reduction by approximately half leaves a kernel size that is still of the same order, $O(N^P)$.

One approach to reducing the number of parameters in a Volterra model is to approximate the kernels with a series of lower order functions, transforming the problem from estimating kernel parameters to estimating a smaller number of parameters of the basis functions. The kernel parameters can then be read out by evaluating the basis functions at N equally spaced intervals. Treichl et al. [81] use a weighted sum of distorted sinus functions as orthonormal basis functions (OBF) to approximate Volterra kernels. The parameters to this formula are ζ, a form factor affecting the shape of the sinus curve; j, which iterates over the component sinus curves; and i, the read-out index. There are $m_r$ sinus curves in the weighted sum. The values of ζ and $m_r$ are design choices. The read-out index varies from 1 to N, the kernel memory length. In the new identification problem only the weights $w_j$ applied to each OBF in the sum must be estimated. This problem is also linear in the parameters, and the number of parameters has been reduced to $m_r$. For example, if N is 40 and $m_r$ is 10, then there is a 4 times compression of the number of parameters in the linear kernel.

Another kernel compression technique is based on the discrete wavelet transform (DWT). Nikolaou and Mantha [53] show the advantages of a wavelet based compression of the Volterra series kernels while modelling a chemical process in simulation. Using the first two terms of the Volterra series model with $N_1 = N_2 = 32$ gives the uncompressed model a total of $N_1 + N_2(N_2+1)/2 = 32 + 528 = 560$ parameters. In the wavelet domain identification only 18 parameters were needed in the linear kernel and 28 in the bilinear kernel, for a total of 46 parameters. The 1- and 2-D DWT used to compress the linear and bilinear kernels, respectively, are not expensive to calculate, so the compression ratio of 560:46 results in a much faster identification. Additionally, it is reported that the wavelet compressed model shows much less prediction error on validation data not used in the identification. This is attributed to overfitting in the uncompressed model.
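The idea behind wavelet compression of a kernel can be illustrated with a hand-rolled Haar transform (not the wavelet family used in [53]; this is a simplified sketch with an arbitrary smooth kernel). Because a smooth kernel has mostly near-zero detail coefficients, thresholding them leaves far fewer parameters while reconstructing the kernel accurately.

```python
import numpy as np

def haar_fwd(v):
    """Full multi-level Haar transform of a length-2^k vector."""
    v = v.copy()
    n = len(v)
    while n > 1:
        half = n // 2
        avg = (v[0:n:2] + v[1:n:2]) / np.sqrt(2)   # coarse averages
        dif = (v[0:n:2] - v[1:n:2]) / np.sqrt(2)   # detail coefficients
        v[:half], v[half:n] = avg, dif
        n = half
    return v

def haar_inv(c):
    """Inverse of haar_fwd."""
    c = c.copy()
    n = 1
    while n < len(c):
        avg, dif = c[:n].copy(), c[n:2 * n].copy()
        c[0:2 * n:2] = (avg + dif) / np.sqrt(2)
        c[1:2 * n:2] = (avg - dif) / np.sqrt(2)
        n *= 2
    return c

kernel = np.exp(-np.arange(32) / 8.0)          # smooth decaying impulse response
coeffs = haar_fwd(kernel)
kept = np.where(np.abs(coeffs) > 0.05, coeffs, 0.0)   # threshold small details
reconstructed = haar_inv(kept)
n_params = int(np.count_nonzero(kept))         # parameters actually retained
```

The identification would then estimate only the retained wavelet coefficients, reading the kernel back out through the inverse transform.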


1.7.6 Nonparametric modelling as function approximation

Black box models identified from input-output data can be seen in very general terms as mappings from inputs and delayed values of the inputs to outputs. Boyd and Chua [13] showed that "any time invariant operation with fading memory can be approximated by a nonlinear moving average operator". This is depicted in Figure 1.7, where $P(\cdot)$ is a polynomial in u and M is the model memory length (the number of $Z^{-1}$ delay operators). Therefore, one way to look at nonlinear system identification is as the problem of finding the polynomial P. The arguments to P, namely $uZ^{-1}, uZ^{-2}, \ldots, uZ^{-M}$, can be calculated. What is needed is a general and parsimonious function approximator. The Volterra function series is general but not parsimonious unless the kernels are compressed.

Figure 1.7: Block diagram for a nonlinear moving average operation

Juditsky et al. argue in [38] that there is an important difference between uniformly smooth and spatially adaptive function approximations. Kernels work well to approximate functions that are uniformly smooth. A function is not uniformly smooth if it has important detail at very different scales. For example, if a Volterra kernel is smooth except for a few spikes or steps, then the kernel order must be increased to capture these details and the "curse of dimensionality" (computational expense) is incurred. Also, the resulting extra flexibility of the kernel in areas where the target function is smooth may lead to overfitting. In these areas the coefficients of the kernel approximation should be nearly uniform for a good fit, but the kernel encodes more information than is required to represent the system.


Orthonormal basis functions are a valid compression of the model parameters that form smooth low order representations. However, since common choices such as distorted sinus functions are combined by weighted sums and are not offset or scaled, they are not able to adapt locally to the smoothness of the function being approximated. Wavelets, which are used in many other multi-resolution applications, are particularly well suited to general function approximation, and the analytical results are very complete [38]. Sjoberg et al. [73] distinguish between local basis functions, whose gradient vanishes rapidly at infinity, and global basis functions, whose variations are spread across the entire domain. The Fourier series is given as an example of a global basis function; wavelets are a local basis function. Local basis functions are key to building a spatially adaptive function approximator. Since they are essentially constant valued outside of some interval, their effect can be localized.

1.7.7 Neural networks

Multilayer feed-forward neural networks have been called a "universal approximator" in that they can approximate any measurable function [35]. Here at the University of Waterloo there is a project to predict the position of a polishing robot using delayed values of the position set point from 3 consecutive time steps as inputs to a neural network. In the context of Figure 1.7, this is a 3rd order non-linear moving average with a neural network taking the place of P.

Most neural network applications use local basis functions in the neuron activations. Hofmann et al. [33] identify a Hammerstein-type model using a radial basis function network followed by a linear dynamic system. Scott and Mulgrew [70] develop a multilayer feed-forward neural network using orthonormal basis functions and show its advantages over radial basis functions (whose basis centres must be fixed) and multilayer perceptron networks. These advantages appear to be related to the concept of spatial adaptability discussed above.

The classical feed-forward multilayer perceptron (MLP) networks are trained using a gradient descent search algorithm called back-propagation learning (BPL). For radial basis networks this is preceded by a clustering algorithm to choose the basis centres, or they may be chosen manually. One other network configuration is the so-called recurrent or dynamic neural network, in which the network output or some intermediate values are used also


as inputs, creating a feedback loop. There are any number of ways to configure feedback loops in neural networks. Design choices include the number and source of feedback signals and where (if at all) to incorporate delay elements. Recurrent neural networks are themselves dynamic systems, not just function approximators. Training of recurrent neural networks involves unrolling the network over several time steps and applying the back propagation learning algorithm to the unrolled network. This is even more computationally intensive than the notoriously intensive training of simple feed-forward networks.

1.7.8 Parametric modelling

The most familiar parametric model is a differential equation, since this is usually the form of model created when a system is modelled from first principles. Identification methods for differential and difference equation models always require a search process. The process is called equation discovery or, sometimes, symbolic regression. In general, the more qualitative or heuristic information that can be brought to bear on reducing the search space, the better. Some search techniques are in fact elaborate induction algorithms; others incorporate some generic heuristics called information criteria. Fuzzy relational models can conditionally be considered parametric. Several graph-based models are discussed, including bond graphs, which in some cases are simply graphical transcriptions of differential equations but can incorporate arbitrary functions.

1.7.9 Differential and difference equations

Differential equations are very general and are the standard representation for models derived from physical principles rather than identified. "Equation discovery" is the general term for techniques which automatically identify an equation or equations that fit some observed data. Other model forms, such as bond graphs, are converted into a set of differential and algebraic equations prior to simulation. If no other form is needed, then an identification technique that produces such equations immediately is perhaps more convenient. If a system identification experiment is to be bootstrapped (grey-box fashion) from an existing model, then the existing model usually needs to be in a form compatible with what the identification technique will produce.


Existing identification methods for differential equations are called "symbolic regression" and "equation discovery". The first usually refers to methods that rely on simple fitness guided search algorithms, such as genetic programming, "evolution strategies", or simulated annealing. The second term has been used to describe more constraint-based algorithms. These use qualitative constraints to reduce the search space by ruling out whole classes of models by their structure or the acceptable range of their parameters. There are two possible sources for these constraints:

1. They may be given to the identification system from prior knowledge.

2. They may be generated online by an inference system.

LAGRAMGE ([23], [24], [78], [79]) is an equation discovery program that searches for equations which conform to a context-free grammar specified by the experimenter. This grammar implicitly constrains the search to permit only a (possibly very small) class of structures. The challenge in this approach is to write meaningful grammars for differential equations, grammars that express a priori knowledge about the system being modelled. One approach is to start with a grammar that is capable of producing only one unique equation and then modify it interactively to experiment with different structures. Todorovski and Dzeroski [80] describe a "minimal change" heuristic for limiting model size. It starts from an existing model and favours variations which both reduce the prediction error and are structurally similar to the original. This is a grey-box identification technique.
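As a concrete illustration of grammar-constrained search, the sketch below enumerates every expression derivable, to a bounded depth, from a toy context-free grammar over candidate model expressions. The grammar, its dictionary encoding, and the expansion routine are invented for this example; they are not LAGRAMGE's actual input format.

```python
import itertools

# A toy context-free grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],          # sums of terms
    "T": [["const", "*", "V"], ["V"]],      # optionally scaled variables
    "V": [["x"], ["t"]],                    # the allowed variables
}

def expand(symbol, depth):
    """Yield every token sequence derivable from `symbol` within `depth`."""
    if symbol not in GRAMMAR:               # terminal: yields itself
        yield [symbol]
        return
    if depth == 0:                          # bound the recursion
        return
    for production in GRAMMAR[symbol]:
        # expand each symbol of the production, then take all combinations
        parts = [list(expand(s, depth - 1)) for s in production]
        for combo in itertools.product(*parts):
            yield [tok for part in combo for tok in part]

candidates = sorted({" ".join(toks) for toks in expand("E", 4)})
```

In a realistic system each sentence would be parsed into an evaluable model and scored against data; the enumeration above only shows how the grammar bounds the search space to a small class of structures.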

PRET ([74], [75], [14]) is another equation discovery program that incorporates expert knowledge in several domains, collected during its design. It is able to rule out whole classes of equations based on qualitative observations of the system under test. For example, if the phase portrait has certain geometrical properties, then the system must be chaotic and therefore at least a second order differential equation. Qualitative observations can also suggest components which are likely to be part of the model. For example, output signal components at twice the input signal frequency suggest there may be a squared term in the equation. PRET performs as much high level work as possible, eliminating candidate model classes before resorting to numerical simulations.

Ultimately, if the constraints do not specify a unique model, all equation discovery methods resort to a fitness-guided local search. The first book in the Genetic Programming series [43] discusses symbolic regression and empirical discovery (yet another term for equation discovery), giving examples from econometrics and mechanical dynamics. The choice of function and terminal sets for genetic programming can be seen as externally provided constraints. Additionally, although Koza uses a typeless genetic programming system, so-called "strongly typed" genetic programming provides constraints on the crossover and mutation operations.

Saito et al. [69] present two unusual algorithms. The RF5 equation discovery algorithm transforms polynomial equations into neural networks, trains the network with the standard back-propagation learning algorithm, then converts the trained network back into a polynomial equation. The RF6 discovery algorithm augments RF5 using a clustering technique and decision-tree induction.

One last qualitative reasoning system is called General Systems Problem Solving [42]. It describes a taxonomy of systems and an approach to identifying them. It is not clear if general systems problem solving has been applied, and no references beyond the original author were found.

1.7.10 Information criteria – heuristics for choosing model order

Information criteria are heuristics that weigh model fit (reduction in error residuals) against model size to evaluate a model. They try to guess whether a given model is of sufficiently high order to represent the system under test, but not so flexible as to be likely to overfit. These heuristics are not a replacement for the practice of building a model with one set of data and validating it with a second. The Akaike and Bayesian information criteria are two examples. The Akaike Information Criterion (AIC) is

AIC = −2 ln(L) + 2k (1.20)

where L is the likelihood of the model and k is the order of the model (number of free parameters). The likelihood of a model is the conditional probability of a given output u, given the model M and model parameter values U:

L = P(u | M, U) (1.21)

The Bayesian Information Criterion (BIC) considers the size of the identification data set, reasoning that models with a large number of parameters, but which were identified from a small number of observations, are very likely to be overfit and have spurious parameters. The Bayes information criterion is

BIC = −2 ln(L) + k ln(n) (1.22)

where n is the number of observations (i.e., sample size). The likelihood of a model is not always known, however, and so must be estimated. One suggestion is to use a modified version based on the root mean square error (RMSE) instead of the model likelihood [3]:

AIC ≅ −2n ln(RMSE/n) + 2k (1.23)

Cinar [17] reports on building nonlinear polynomial models by increasing the number of terms until the Akaike information criterion (AIC) is minimized. This works because there is a natural ordering of the polynomial terms from lowest to highest order. Sjoberg et al. [73] point out that not all nonlinear models have a natural ordering of their components or parameters.
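A term-count selection loop of this kind can be sketched as follows. The synthetic data, the normal-equations least-squares routine, and the common RSS-based approximation AIC = n ln(RSS/n) + 2k are illustrative assumptions, not details taken from the cited work.

```python
import math
import random

# Synthetic data from a quadratic with small Gaussian noise (invented).
random.seed(0)
xs = [i / 10.0 for i in range(50)]
ys = [1.0 + 2.0 * x - 0.5 * x * x + random.gauss(0.0, 0.05) for x in xs]

def fit_rss(order):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    n, k = len(xs), order + 1
    A = [[x ** j for j in range(k)] for x in xs]
    # normal equations (A^T A) c = A^T y, solved by Gauss-Jordan elimination
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(k)]
         + [sum(A[i][r] * ys[i] for i in range(n))] for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                fac = M[r][col] / M[col][col]
                M[r] = [a - fac * b for a, b in zip(M[r], M[col])]
    coeffs = [M[r][k] / M[r][r] for r in range(k)]
    return sum((y - sum(c * x ** j for j, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def aic(order):
    """RSS-based AIC approximation with k = order + 1 free parameters."""
    n = len(xs)
    return n * math.log(fit_rss(order) / n) + 2 * (order + 1)

aics = [aic(order) for order in range(5)]
best_order = min(range(5), key=lambda o: aics[o])
```

Because the terms are naturally ordered, stepping the order upward and keeping the minimum-AIC model is a well-defined procedure; the large AIC drop occurs when the true quadratic term enters, and further terms cost more in the 2k penalty than they recover in fit.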

1.7.11 Fuzzy relational models

According to Sjoberg et al. [73], fuzzy set membership functions and inference rules can form a local basis function approximation (the same general class of approximations as wavelets and splines). A fuzzy relational model consists of two parts:

1. input and output fuzzy set membership functions;

2. a fuzzy relation mapping inputs to outputs.

Fuzzy set membership functions themselves may be arbitrary nonlinearities (triangular, trapezoidal, and Gaussian are common choices) and are sometimes referred to as "linguistic variables" if the degree of membership in a set has some qualitative meaning that can be named. This is the sense in which fuzzy relational models are parametric: the fuzzy set membership functions describe the degrees to which specific qualities of the system are true. For example, in a speed control application there may be linguistic variables named "fast" and "slow" whose membership functions need not be mutually exclusive. However, during the course of the identification method described below, the centres of the membership functions will be adjusted. They may be adjusted so far that their original linguistic values are not appropriate (for example, if "fast" and "slow" became coincident). In that case, or when the membership functions or the relations are initially generated by a random process, fuzzy relational models should be considered non-parametric by the definition given in section 1.4 above. In the best case, however, fuzzy relational models have excellent explanatory power, since the linguistic variables and relations can be read out in qualitative, English-like statements such as "when the speed is fast, decrease the power output" or even "when the speed is somewhat fast, decrease the power output a little".

Fuzzy relational models are evaluated to make a prediction in 4 steps:

1. The raw input signals are applied to the input membership functions to determine the degree of compatibility with the linguistic variables.

2. The input membership values are composed with the fuzzy relation to determine the output membership values.

3. The output membership values are applied to the output variable membership functions to determine a degree of activation for each.

4. The activations of the output variables are aggregated to produce a crisp result.

Composing fuzzy membership values with a fuzzy relation is often written in notation that makes it look like matrix multiplication, when in fact it is something quite different. Whereas in matrix multiplication corresponding elements from one row and one column are multiplied, and then those products are added together to form one element in the result, composition of fuzzy relations uses other operations in place of multiplication and addition. Common choices include minimum and maximum.
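The four evaluation steps and the max-min composition just described can be sketched in a few lines. The triangular membership functions, the two linguistic variables, and the relation matrix below are all invented purely for illustration.

```python
def tri(x, centre, width):
    """Triangular fuzzy membership function."""
    return max(0.0, 1.0 - abs(x - centre) / width)

input_sets = [("slow", 0.0, 5.0), ("fast", 10.0, 5.0)]
output_centres = [0.0, 100.0]        # crisp centres of the output sets

relation = [[1.0, 0.0],              # "slow" maps to the low output set
            [0.0, 1.0]]              # "fast" maps to the high output set

def predict(x):
    # step 1: fuzzify the raw input against the input membership functions
    mu_in = [tri(x, c, w) for _, c, w in input_sets]
    # step 2: max-min composition -- min replaces multiplication and max
    # replaces addition in the matrix-product-like formula
    mu_out = [max(min(mu_in[i], relation[i][j]) for i in range(len(mu_in)))
              for j in range(len(output_centres))]
    # steps 3 and 4: activate the output sets and aggregate to a crisp
    # value (centre-of-gravity defuzzification, one common choice)
    return sum(m * c for m, c in zip(mu_out, output_centres)) / sum(mu_out)
```

An input of 0 activates only "slow" and defuzzifies to the low output centre; an input of 10 activates only "fast" and defuzzifies to the high one. Note that min and max here are memoryless, which is what makes the whole model static-nonlinear.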

An identification method for fuzzy relational models is provided by Branco and Dente [15], along with its application to prediction of a motor drive system. An initial shape is assumed for the membership functions, as are initial values for the fuzzy relation. Then the first input datum is applied to the input membership functions, mapped through the fuzzy relation and output membership functions to generate an output value, from which an error is measured (distance from the training output datum). The centres of the membership functions are adjusted in proportion to the error signal, integrated across the immediate (pre-aggregation) value of that membership function. This apportioning of error, or "assignment of blame", resembles a similar step in the back-propagation learning algorithm for feed-forward neural networks. The constant of proportionality in this adjustment is a learning rate: too large and the system may oscillate and not converge; too small and identification takes a long time. All elements of the relation are updated using the adjusted fuzzy input and output variables and the relation from the previous time step.

It is tempting to view the membership functions as nonlinear input and output maps, similar to a combined Hammerstein-Wiener model, but composition with the fuzzy relation in the middle differs in that max-min composition is a memoryless nonlinear operation. The entire model, therefore, is static-nonlinear.

1.7.12 Graph-based models

Models developed from first principles are often communicated as graph structures. Familiar examples include electrical circuit diagrams, and block diagrams or signal flow graphs for control and signal processing. Bond graphs ([66], [39], [40]) are a graph-based model form where the edges (called bonds) denote energy flow between components. The nodes (components) may be sources, dissipative elements, storage elements, or translating elements (transformers and gyrators, optionally modulated by a second input). Since these graph-based model forms are each already used in engineering communication, it would be valuable to have an identification technique that reports results in any one of these forms.

Graph-based models can be divided into 2 categories: direct transcriptions of differential equations, and those that contain more arbitrary components. The first includes signal flow graphs and basic bond graphs. The second includes bond graphs with complex elements that can only be described numerically or programmatically. Bond graphs are described in great detail in chapter 2.

The basic bond graph elements each have a constitutive equation that is expressed in terms of an effort variable (e) and a flow variable (f), which together quantify the energy transfer along connecting edges. The constitutive equations of the basic elements have a form that will be familiar from linear circuit theory, but they can also be used to build models in other domains. When all the constitutive equations for each node in a graph are written out, with nodes linked by a bond sharing their effort and flow variables, the result is a coupled set of differential and algebraic equations. The constitutive equations of bond graph nodes need not be simple integral formulae. Arbitrary functions may be used, so long as they are amenable to whatever analysis or simulation the model is intended for. In these cases the bond graph model is no longer simply a graphical transcription of differential equations.

There are several commercial software packages for bond graph simulation, and bond graphs have been used to model some large and complex systems, including a 10 degree of freedom planar human gait model developed at the University of Waterloo [61]. A new software package for bond graph simulation is described in chapter 4, and several example simulations are presented in section 5.1.

Actual system identification using bond graphs requires a search through candidate models. This is combined structure and parameter identification. The structure is embodied in the types of elements (nodes) that are included in the graph and the interconnections between them (arrangement of the edges, or bonds). The genetic algorithm and genetic programming have both been applied to search for bond graph models. Danielson et al. [20] use a genetic algorithm to perform parameter optimization on an internal combustion engine model. The model is a bond graph with fixed structure; only the parameters of the constitutive equations are varied. Genetic programming is considerably more flexible, since whereas the genetic algorithm uses a fixed length string, genetic programming breeds computer programs (that in turn output bond graphs of variable structure when executed). See [30] or [71] for recent applications of genetic programming and bond graphs to identifying models for engineering systems.

Another type of graph-based model used with genetic programming for system identification could be called a virtual analog computer. Streeter, Keane and Koza [76] used genetic programming to discover new designs for circuits that perform as well as or better than patented designs, given a performance metric but no details about the structure or components of the patented design. Circuit designs could be translated into models in other domains (e.g. through bond graphs, or by directly comparing differential equations from each domain). This is expanded on in book form [46], with an emphasis on the ability of genetic programming to produce parametrized topologies (structure identification) from a black-box specification of the behavioural requirements.

Many of the design parameters of a genetic programming implementation can also be viewed as placing constraints on the model space. For example, if a particular component is not part of the genetic programming kernel, cannot arise through mutation, and is not inserted by any member of the function set, then models containing this component are excluded. In a strongly typed genetic programming system, the choice of types and their compatibilities constrains the process further. Fitness evaluation is another point where constraints may be introduced. It is easy to introduce a "parsimony pressure" by which excessively large models (or those that violate any other a priori constraint) are penalized and discouraged from recurring.
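A parsimony pressure of this kind can be sketched as a size-penalized fitness function (lower fitness being better, as in minimization). The penalty weight and the hard size limit below are arbitrary choices for illustration.

```python
def penalized_fitness(error, model_size, size_limit=50, weight=0.01):
    """Raw error plus a size penalty; oversized models are rejected."""
    if model_size > size_limit:            # hard a priori constraint violated
        return float("inf")
    return error + weight * model_size     # soft pressure toward small models

compact = penalized_fitness(error=1.00, model_size=10)
bloated = penalized_fitness(error=0.99, model_size=40)
# the compact model wins despite its slightly larger raw error
```

Selection then favours the compact individual, so bloat is discouraged without being forbidden outright unless the hard limit is crossed.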

1.8 Contributions of this thesis

This thesis contributes primarily a bond graph modelling and simulation library capable of modelling linear and non-linear (even non-analytic) systems and producing symbolic equations where possible. In certain linear cases the resulting equations can be quite compact, and any algebraic loops are automatically eliminated. Genetic programming grammars for bond graph modelling and for direct symbolic regression of sets of differential equations are presented. The planned integration of bond graphs with genetic programming as a system identification technique is incomplete. However, a function identification problem was solved repeatedly to demonstrate and test the genetic programming system, and the direct symbolic regression grammar is shown to have a non-deceptive fitness landscape: perturbations of an exact program have decreasing fitness with increasing distance from the ideal.


Chapter 2

Bond graphs

2.1 Energy based lumped parameter models

Bond graphs are an abstract graphical representation of lumped parameter dynamic models: abstract in that the same few elements are used to model very different systems. The elements of a bond graph model represent common roles played by the parts of any dynamic system, such as providing, storing, or dissipating energy. Generally, each element models a single physical phenomenon occurring in some discrete part of the system. To model a system in this way requires that it is reasonable to "lump" the system into a manageable number of discrete parts. This is the essence of lumped parameter modelling: model elements are discrete; there are no continua (though a continuum may be approximated in a finite element fashion). The standard bond graph elements model just one physical phenomenon each, such as the storage or dissipation of energy. There are no combined dissipative-storage elements, for example, among the standard bond graph elements, although one has been proposed [55].

Henry M. Paynter developed the bond graph modelling notation in the 1950s [57]. Paynter's original formulation has since been extended and refined. A formal definition of the bond graph modelling language [66] has been published, and there are several textbooks [19], [40], [67] on the subject. Linear graph theory is a similar graphical, energy based, lumped parameter modelling framework that has some advantages over bond graphs, including more convenient representation of multi-body mechanical systems. Birkett and Roe [8], [9], [10], [11] explain bond graphs and linear graph theory in terms of a more general framework based on matroids.

2.2 Standard bond graph elements

2.2.1 Bonds

In a bond graph model, system dynamics are expressed in terms of the power transfer along "bonds" that join interacting sub-models. Every bond has two variables associated with it. One is called the "intensive" or "effort" variable. The other is called the "extensive" or "flow" variable. The product of these two is the amount of power transferred by the bond. Figure 2.1 shows the graphical notation for a bond joining two sub-models. The effort and flow variables are named on either side of the bond. The direction of the half arrow indicates a sign convention: for positive effort (e > 0) and positive flow (f > 0), a positive quantity of power is transferred from A to B. The points at which one model may be joined to another by a bond are called "ports". Some bond graph elements have a limited number of ports, all of which must be attached. Others have unlimited ports, any number of which may be attached. In figure 2.1, systems A and B are joined at just one point each, so they are "one-port elements".

Figure 2.1: A bond denotes power continuity between two systems.

The effort and flow variables on a bond may use any units, so long as the resulting units of power are compatible across the whole model. In fact, different parts of a bond graph model may use units from another physical domain entirely. This makes bond graphs particularly useful for multi-domain modelling, such as in mechatronic applications. A single bond graph model can incorporate units from any row of table 2.1, or indeed any other pair of units that are equivalent to physical power. It needs to be emphasized that bond graph models are based firmly on the physical principle of power continuity. The power leaving A in figure 2.1 is exactly equal to the power entering B, and this represents, with all the caveats of a necessarily inexact model, real physical power. When the syntax of the bond graph modelling language is used in domains where physical power does not apply (as in economics) or where lumped parameters are a particularly weak assumption (as in many thermal systems), the result is called a "pseudo bond graph".

Table 2.1: Units of effort and flow in several physical domains

                      Intensive variable   Extensive variable
  Generalized terms   Effort               Flow                  Power
  Linear mechanics    Force (N)            Velocity (m/s)        (W)
  Angular mechanics   Torque (N·m)         Velocity (rad/s)      (W)
  Electrics           Voltage (V)          Current (A)           (W)
  Hydraulics          Pressure (N/m²)      Volume flow (m³/s)    (W)

Two other generalized variables are worth naming. Generalized displacement q is the first integral of flow f, and generalized momentum p is the integral of effort e:

q = ∫₀ᵀ f dt (2.1)

p = ∫₀ᵀ e dt (2.2)
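Equations 2.1 and 2.2 can be illustrated numerically with a simple accumulation loop; the flow signal (one period of a cosine) and constant effort below are arbitrary choices.

```python
import math

dt, T = 1.0e-4, 1.0
q = p = 0.0
for k in range(int(round(T / dt))):
    t = k * dt
    f = math.cos(2.0 * math.pi * t)   # flow signal
    e = 2.0                           # effort signal
    q += f * dt                       # equation 2.1: q accumulates flow
    p += e * dt                       # equation 2.2: p accumulates effort
```

Over one full period the cosine flow accumulates no net displacement, while the constant effort of 2 accumulates a generalized momentum of about 2 over one second.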

2.2.2 Storage elements

There are two basic storage elements: the capacitor and the inertia.

A capacitor, or C element, stores potential energy in the form of generalized displacement proportional to the incident effort. A capacitor in mechanical translation or rotation is a spring; in hydraulics it is an accumulator (a pocket of compressible fluid within a pressure chamber); in electrical circuits it is a charge storage device. The effort and flow variables on the bond attached to a linear one-port capacitor are related by equations 2.1 and 2.3. These are called the "constitutive equations" of the C element. The graphical notation for a capacitor is shown in figure 2.2.

q = Ce (2.3)

An inertia, or I element, stores kinetic energy in the form of generalized momentum proportional to the incident flow. An inertia in mechanical translation or rotation is a mass or flywheel; in hydraulics it is a mass-flow effect; in electrical circuits it is a coil or other magnetic field storage device. The constitutive equations of a linear one-port inertia are 2.2 and 2.4. The graphical notation for an inertia is shown in figure 2.3.

p = If (2.4)
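A minimal sketch of the two linear storage relations (equations 2.3 and 2.4), each rearranged to solve for the bond variable of interest; the parameter values are invented.

```python
C = 0.5   # capacitance: displacement stored per unit effort (invented value)
I = 2.0   # inertance: momentum stored per unit flow (invented value)

def capacitor_effort(q):
    """Effort produced by a stored displacement q (e = q/C, from eq. 2.3)."""
    return q / C

def inertia_flow(p):
    """Flow produced by a stored momentum p (f = p/I, from eq. 2.4)."""
    return p / I
```

A displacement of 1.0 on this capacitor produces an effort of 2.0, and a momentum of 4.0 on this inertia produces a flow of 2.0; the rearranged forms shown here are the ones used later when discussing causality.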

Figure 2.2: A capacitive storage element, the 1-port capacitor C.

Figure 2.3: An inductive storage element, the 1-port inertia I.

2.2.3 Source elements

Ideal sources mandate a particular schedule for either the effort or flow variable of the attached bond, and accept any value for the other variable. Ideal sources will supply unlimited energy as time goes on. Two things save this as a realistic model behaviour: first, models will only be run in finite length simulations, and second, the environment is assumed to be vastly more capacious than the system under test in any respects where they interact.

The two types of ideal source element are called the effort source and the flow source. The effort source (Se, figure 2.4) mandates effort according to a schedule function E and ignores flow (equation 2.5). The flow source (Sf, figure 2.5) mandates flow according to F and ignores effort (equation 2.6).

e = E(t) (2.5)

f = F(t) (2.6)


Figure 2.4: An ideal source element, the 1-port effort source Se.

Figure 2.5: An ideal source element, the 1-port flow source Sf.

Within a bond graph model, ideal source elements represent boundary conditions and rigid constraints imposed on a system. Examples include a constant valued effort source representing a uniform force field, such as gravity near the earth's surface, and a zero valued flow source representing a grounded or otherwise absolutely immobile point. Load dependence and other non-ideal behaviours of real physical power sources can often be modelled by bonding an ideal source to some other standard bond graph elements, so that they together expose a single port to the rest of the model, showing the desired behaviour in simulation.

2.2.4 Sink elements

There is one type of element that dissipates power from a model. The resistor, or R element, is able to sink an infinite amount of energy as time goes on. This energy is assumed to be dissipated into a capacious environment outside the system under test. The constitutive equation of a linear one-port resistor is 2.7, where R is a constant value called the "resistance" associated with a particular resistor. The graphical notation for a resistor is shown in figure 2.6.

e = Rf (2.7)

2.2.5 Junctions

Figure 2.6: A dissipative element, the 1-port resistor R.

There are four basic multi-port elements that do not source, store, or sink power, but are part of the network topology that joins other elements together. They are called the transformer, the gyrator, the 0-junction, and the 1-junction. Power is conserved across each of these elements. The transformer and gyrator are two-port elements. The 0- and 1-junctions permit any number of bonds to be attached (but are redundant in any model where they have fewer than 3 bonds attached).

The graphical notation for a transformer, or TF-element, is shown in figure 2.7. The constitutive equations for a transformer, relating e1, f1 to e2, f2, are given in equations 2.8 and 2.9, where m is a constant value associated with the transformer called the "transformer modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1·f1 = m·e2·f2/m = e2·f2 (outgoing power). In mechanical terms, a TF-element may represent an ideal gear train (with no friction, backlash, or windup) or lever. In electrical terms, it is a common electrical transformer (pair of windings around a single core). In hydraulic terms, it could represent a coupled pair of rams with differing plunger surface areas.

e1 = m·e2 (2.8)

f1 = f2/m (2.9)

The graphical notation for a gyrator, or GY-element, is shown in figure 2.8. The constitutive equations for a gyrator, relating e1, f1 to e2, f2, are given in equations 2.10 and 2.11, where m is a constant value associated with the gyrator called the "gyrator modulus". From these equations it can easily be seen that this ideal linear element conserves power, since (incoming power) e1·f1 = m·f2·e2/m = f2·e2 (outgoing power). Physical interpretations of the gyrator include gyroscopic effects on fast rotating bodies in mechanics, and Hall effects in electro-magnetics. These are less obvious than the TF-element interpretations, but the gyrator is more fundamental in one sense: two GY-elements bonded together are equivalent to a TF-element, whereas two TF-elements bonded together are only equivalent to another transformer. Models from a reduced bond graph language having no TF-elements are called "gyro-bond graphs" [65].

e1 = m·f2 (2.10)

f1 = e2/m (2.11)
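The power-conservation argument for both two-port elements can be checked numerically; the operating point and modulus below are arbitrary values chosen for the check.

```python
def transformer_port1(e2, f2, m):
    """Port-1 variables of an ideal TF (equations 2.8 and 2.9)."""
    return m * e2, f2 / m

def gyrator_port1(e2, f2, m):
    """Port-1 variables of an ideal GY (equations 2.10 and 2.11)."""
    return m * f2, e2 / m

e2, f2, m = 3.0, 4.0, 2.5
for port1 in (transformer_port1, gyrator_port1):
    e1, f1 = port1(e2, f2, m)
    # power entering port 1 equals power leaving port 2
    assert abs(e1 * f1 - e2 * f2) < 1e-12
```

For either element the modulus cancels in the product e1·f1, which is exactly the algebraic fact used in the text to show that both elements are power-conserving.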

Figure 2.7: A power-conserving 2-port element, the transformer TF.

Figure 2.8: A power-conserving 2-port element, the gyrator GY.

The 0- and 1-junctions permit any number of bonds to be attached directed power-in (b_in,1 ... b_in,n) and any number directed power-out (b_out,1 ... b_out,m). These elements "join" either the effort or the flow variable of all attached bonds, setting them equal to each other. To conserve power across the junction, the other bond variable must sum to zero over all attached bonds, with bonds directed power-in making a positive contribution to the sum and those directed power-out making a negative contribution. These constitutive equations are given in equations 2.12 and 2.13 for a 0-junction, and in equations 2.14 and 2.15 for a 1-junction. The graphical notation for a 0-junction is shown in figure 2.9, and for a 1-junction in figure 2.10.

e1 = e2 = · · · = en (2.12)

Σ f_in − Σ f_out = 0 (2.13)

f1 = f2 = · · · = fn (2.14)

Σ e_in − Σ e_out = 0 (2.15)
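In practice the 0-junction flow balance (equation 2.13) is used to solve for the one unknown bond variable, which can be sketched directly; the bond values below are arbitrary.

```python
def missing_out_flow(flows_in, known_flows_out):
    """Solve equation 2.13 for a single unknown power-out flow."""
    # rearranged from: sum(flows_in) - sum(flows_out) = 0
    return sum(flows_in) - sum(known_flows_out)

# two bonds directed power-in, two directed power-out, one of them unknown
f_out = missing_out_flow(flows_in=[1.5, 2.5], known_flows_out=[1.0])

# check: the completed set of flows satisfies the junction constraint
assert sum([1.5, 2.5]) - sum([1.0, f_out]) == 0.0
```

The same rearrangement with efforts in place of flows gives the 1-junction balance of equation 2.15.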

Figure 2.9: A power-conserving multi-port element, the 0-junction.

Figure 2.10: A power-conserving multi-port element, the 1-junction.

The constraints imposed by 0- and 1-junctions on any bond graph model they appear in are analogous to Kirchhoff's voltage and current laws for electric circuits. The 0-junction sums flow to zero across the attached bonds, just like Kirchhoff's current law requires that the net electric current (a flow variable) entering a node is zero. The 1-junction sums effort to zero across the attached bonds, just like Kirchhoff's voltage law requires that the net change in electric potential (an effort variable) around any loop is zero. Naturally, any electric circuit can be modelled with bond graphs.

The 0- and 1-junction elements are sometimes also called f-branch and e-branch elements respectively, since they represent branching points for flow and effort within the model [19].

2.3 Augmented bond graphs

Bond graph models can be augmented with an additional notation on each bond to denote "computational causality". The constitutive equations of the basic bond graph elements in section 2.2 could easily be rewritten to solve for the effort or flow variable on any of the attached bonds in terms of the remaining bond variables. There is no physical reason why they should be written one way or another. However, in the interest of solving the resulting set of equations, it is helpful to choose one variable as the "output" of each element and rearrange to solve for it in terms of the others (the "inputs"). The notation for this choice is a perpendicular stroke at one end of the bond, as shown in figure 2.11. By convention, the effort variable on that bond is considered an input to the element closest to the causal stroke. The location of the causal stroke is independent of the power direction half arrow. In both parts of figure 2.11, the direction of positive power transfer is from A to B. In the first part, effort is an input to system A (output from system B) and flow is an input to system B (output from system A). The equations for this system would take the form

flow = A(effort)
effort = B(flow)

In the second part of figure 2.11, the direction of computational causality is reversed. So although the power sign convention and constitutive equations of each element are unchanged, the global equations for the entire model are rewritten in the form

flow = B(effort)
effort = A(flow)
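Under the first causal arrangement the two maps can be evaluated by substitution. In the sketch below, A is taken to be a linear resistor and B a non-ideal source; both constitutive laws are invented for illustration, and iterating the causal assignment is one simple way to reach the simultaneous solution of the pair.

```python
def A(effort):                  # causal form: flow = A(effort)
    return effort / 4.0         # e.g. a resistor with R = 4 (invented)

def B(flow):                    # causal form: effort = B(flow)
    return 12.0 - 2.0 * flow    # e.g. a non-ideal source (invented)

# Evaluating the two causal assignments in turn is a fixed-point iteration
# that converges to the simultaneous solution of both constitutive equations.
e = 0.0
for _ in range(60):
    f = A(e)
    e = B(f)
```

For these particular laws the iteration contracts toward e = 8, f = 2, the point satisfying both equations at once; choosing causality is what turns the coupled constitutive laws into this explicit evaluation order.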

Figure 2.11: The causal stroke and power direction half-arrow on a fully augmented bond are independent.

The concept discussed here is called computational causality to emphasize the fact that it is an aid to extracting equations from the model in a form that is easily solved in a computer simulation. It is a practical matter of choosing a convention, and does not imply any philosophical commitment to an essential ordering of events in the physical world. Some recent modelling and simulation tools that are based on networks of ported elements, but not exactly bond graphs, do not employ any concept of causality [2], [56]. These systems enjoy a greater flexibility in composing subsystem models into a whole, as they are not burdened by what some argue is an artificial ordering of events.¹ Computational causality, they argue, should be determined automatically by bond graph modelling software, and the software's user not bothered with it [16]. The cost of this modelling flexibility is a greater complexity in formulating a set of equations from the model and solving those in simulation (in particular, very general symbolic manipulations may be required, resulting in a set of implicit algebraic-differential equations to be solved, rather than a set of explicit differential equations in state-space form; see section 2.5.1). The following sections show how a few simple causality conventions permit the efficient formulation of state space equations where possible, and the early identification of cases where it is not possible.

¹ Consider this quote from François E. Cellier at http://www.ece.arizona.edu/~cellier/psml.html: "Physics, however, is essentially acausal (Cellier et al. 1995). It is one of the most flagrant myths of engineering that algebraic loops and structural singularities in models result from neglected fast dynamics. This myth has its roots in the engineers' infatuation with state-space models. Since state-space models are what we know, the physical realities are expected to accommodate our needs. Unfortunately, physics doesn't comply. There is no physical experiment in the world that could distinguish whether I drove my car into a tree or whether it was the tree that drove itself into my car."


2.3.1 Integral and derivative causality

Tables 2.2 and 2.3 list all computational causality options, and the resulting equations, for each of the standard bond graph elements.

Source elements, since they mandate a particular schedule for one variable, have only one causality option: that in which the scheduled variable is an output of the source element. The sink element (R) and all power conserving elements (TF, GY, 0, 1) have two causality options. Either option will do for these elements, since there is no mandate (as for source elements), and since the constitutive equations are purely algebraic and (for linear elements at least) easily inverted.

The storage elements (C, I) have two causality options as well. In this case, however, there are reasons to prefer one form over the other. The preference is always for the form that yields differential expressions on the left-hand side, and not the right-hand side, of any equations. Table 2.2 shows that this is causal effort-out for capacitors and causal effort-in for inertias. Since the variables on the right-hand side of these preferred differential equations are inputs to the element (flow for a capacitor and effort for an inertia), the variable in the differential expression can be evaluated by integration over time. For this reason these preferred forms are called "integral causality" forms. In physical terms, the opposing forms, called "derivative causality", occur when the energy variable of a storage element depends directly, that is without another type of storage element or a resistor between them, on another storage element (storage-storage dependency) or on a source element (storage-source dependency) of the model [51]. Two examples of derivative causality follow.

The electric circuit shown schematically in figure 2.12 contains two capacitors connected in parallel to some larger circuit. A corresponding bond graph model fragment is shown next to it. If one capacitor is first assigned the preferred effort-out causality, that bond is the effort-in to the junction. Since a 0-junction permits only one attached bond to be causal effort-in, the rest must be effort-out, forcing the bond attached to the second capacitor to be effort-in with respect to the capacitor. In electrical terms, the charges stored in these capacitors are not independent. If the charge on one is known, then the voltage across it is known (e = q/C). Since they are connected directly in parallel with one another, it is the same voltage across both capacitors. Therefore the charge on the second capacitor is known as well (q = Ce). They act, in effect, as a single capacitor (with summed capacitance)


Table 2.2: Causality options by element type, 1- and 2-port elements

    Element            Causality options and equations
    Effort source Se   e = E(t)
    Flow source Sf     f = F(t)
    Capacitor C        (effort-in)   q = Ce;      f = dq/dt
                       (effort-out)  e = q/C;     dq/dt = f
    Inertia I          (effort-in)   f = p/I;     dp/dt = e
                       (effort-out)  p = If;      e = dp/dt
    Resistor R         (effort-in)   f = e/R
                       (effort-out)  e = Rf
    Transformer TF     (option 1)    e1 = m e2;   f2 = m f1
                       (option 2)    f1 = m f2;   e2 = m e1
    Gyrator GY         (option 1)    e1 = r f2;   e2 = r f1
                       (option 2)    f1 = e2/r;   f2 = e1/r


Table 2.3: Causality options by element type, junction elements

    0-Junction (b* directed power-in):
        e* ≡ e_in,1;  f* ≡ f_in,1
        e_in,2 = ··· = e_in,n = e*
        e_out,1 = ··· = e_out,m = e*
        f* = Σ_{k=1..m} f_out,k − Σ_{k=2..n} f_in,k

    0-Junction (b* directed power-out):
        e* ≡ e_out,1;  f* ≡ f_out,1
        e_in,1 = ··· = e_in,n = e*
        e_out,2 = ··· = e_out,m = e*
        f* = Σ_{k=1..n} f_in,k − Σ_{k=2..m} f_out,k

    1-Junction (b* directed power-in):
        e* ≡ e_in,1;  f* ≡ f_in,1
        f_in,2 = ··· = f_in,n = f*
        f_out,1 = ··· = f_out,m = f*
        e* = Σ_{k=1..m} e_out,k − Σ_{k=2..n} e_in,k

    1-Junction (b* directed power-out):
        e* ≡ e_out,1;  f* ≡ f_out,1
        f_in,1 = ··· = f_in,n = f*
        f_out,2 = ··· = f_out,m = f*
        e* = Σ_{k=1..n} e_in,k − Σ_{k=2..m} e_out,k

• 0- and 1-junctions permit any number of bonds directed power-in (b_in,1 ... b_in,n) and any number directed power-out (b_out,1 ... b_out,m).

• A 0-junction permits only one connected bond (b*) to be causal effort-in (with the causal stroke adjacent to the junction). The rest must be effort-out.

• A 1-junction permits only one connected bond (b*) to be causal effort-out (with the causal stroke away from the junction). The rest must be effort-in.


and have only one degree of freedom between them.

For a mechanical example, consider two masses rigidly joined, such as the point masses on a rigid lever shown in figure 2.13. The bond graph model shown in that figure is applicable for small rotations of the lever about horizontal. The dependency between storage elements is not as direct here as in the previous example. The dependent elements (two I-elements) are linked through three other elements, not just one. Nevertheless, choosing integral causality for one inertia forces derivative causality on the other, as a consequence of the causality options for the three intervening elements. Here again there are two storage elements in the model, but only one degree of freedom.

Figure 2.12: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: electrical capacitors in parallel, and C elements with common effort.

2.3.2 Sequential causality assignment procedure

There is a systematic method of assigning computational causality to bond graph models so as to minimize computational complexity and consistently arrive at a state space formulation where possible. This method is called the sequential causality assignment procedure (SCAP) [40] [67], and it proceeds in three stages. After each stage, the consequences of currently assigned causality are propagated as far as possible through the graph. For example, if a bond has been assigned effort-in causality to a 0-junction, then the other bonds attached to that junction must all be causal effort-out from that junction, and any of those bonds that do not yet have a causal assignment can be assigned at that time. The three stages of SCAP are:

Figure 2.13: A system having dependent dynamic parts and a corresponding model having storage elements with derivative causality: rigidly joined masses, and I elements with proportionally common flow.

1. Assign the required causality to all bonds attached to sources.

2. Assign the preferred causality to any unassigned bonds attached to storage elements.

3. Assign arbitrary (perhaps random) causality to any unassigned bonds attached to sink elements.

If unassigned bonds attached to nodes of any one type are always handled in a consistent order, then the assigned causalities, and therefore also the collected equations, will be stable (identical for identical graphs).
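The three stages and the propagation rule can be sketched in code. The following is a minimal, illustrative implementation, not taken from any existing bond graph library: the Bond and Element classes and the scap and propagate names are hypothetical, only 1-port sources, storage elements and sinks plus 0- and 1-junctions are handled, and causality is recorded simply as which element receives the effort signal on each bond.

```python
class Bond:
    def __init__(self, a, b):
        self.a, self.b = a, b      # elements at each end of the bond
        self.effort_in = None      # the element that receives effort on this bond
        a.bonds.append(self)
        b.bonds.append(self)

    def other(self, element):
        return self.b if element is self.a else self.a

class Element:
    def __init__(self, kind):
        self.kind = kind           # 'Se', 'Sf', 'C', 'I', 'R', '0' or '1'
        self.bonds = []

def propagate(junction):
    """Push the consequences of assigned causality through a 0- or 1-junction."""
    if junction.kind not in ('0', '1'):
        return
    if junction.kind == '0':
        # only one bond may bring effort in to a 0-junction
        special = [b for b in junction.bonds if b.effort_in is junction]
    else:
        # only one bond may carry effort out of a 1-junction
        special = [b for b in junction.bonds
                   if b.effort_in is not None and b.effort_in is not junction]
    if len(special) == 1:
        for b in junction.bonds:
            if b.effort_in is None:
                # remaining bonds must take the opposite orientation
                b.effort_in = b.other(junction) if junction.kind == '0' else junction
                propagate(b.other(junction))

def scap(elements):
    # Stage 1: required causality at sources (Se outputs effort, Sf outputs flow)
    for el in elements:
        if el.kind in ('Se', 'Sf'):
            (b,) = el.bonds
            b.effort_in = b.other(el) if el.kind == 'Se' else el
            propagate(b.other(el))
    # Stage 2: preferred (integral) causality at storage elements:
    # effort-out for C, effort-in for I
    for el in elements:
        if el.kind in ('C', 'I') and el.bonds[0].effort_in is None:
            b = el.bonds[0]
            b.effort_in = b.other(el) if el.kind == 'C' else el
            propagate(b.other(el))
    # Stage 3: arbitrary causality at any remaining sink bonds
    for el in elements:
        if el.kind == 'R' and el.bonds[0].effort_in is None:
            b = el.bonds[0]
            b.effort_in = el
            propagate(b.other(el))
```

Running this sketch on a fully connected Se-1-C-I-R graph reproduces the behaviour described in the following paragraphs: the R bond is assigned by propagation from the 1-junction, so stage 3 is never reached.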

Consider the fully augmented Se-C-I-R model in figure 5.2. In the first stage of SCAP, bond 4 would be assigned causality effort-out from the source (effort-in to the junction). Since 1-junctions allow any number of attached bonds to be causal effort-in, there are no further consequences of this assignment. In the second stage, storage elements are assigned their preferred causality. If the I-element is chosen first and the attached bond is assigned the preferred causality (effort-in to the inertia, effort-out from the junction), then the remaining bonds attached to the junction must be causal effort-in, and the graph is completely augmented. If instead the C-element is chosen before the I-element, it is still assigned the preferred causality. Consequences do not propagate any further, and the I-element is selected after all. In either case, as is often the case, all bonds attached to sink elements are assigned by propagation of consequences, and stage 3 is never reached.

2.4 Additional bond graph elements

2.4.1 Activated bonds and signal blocks

An activated bond is one on which either the effort or flow variable is considered negligibly small, and the other is called the signal. Activated bonds carry information in the signal but no power, no matter how large a value the signal may take on. An activated bond is drawn with a full arrow at one end, indicating the signal-flow direction.

Any bond graph can be transformed into a signal-flow graph by splitting each bond into a pair of activated bonds carrying the effort and flow signals. The reverse is not true, since bond graphs encode physical structure in ways that a signal graph is not required to conform to. Arbitrary signal-flow graphs may contain odd numbers of signals, or topologies in which the signals cannot be easily or meaningfully paired.

Signal blocks are elements that do not source, store, or sink power but instead perform operations on signals, and can only be joined to other elements by activated bonds. In this way, activated bonds form a bridge between bond graphs and signal-flow graphs, allowing the techniques to be used together in a single model.

2.4.2 Modulated elements

Activated bonds join standard bond graph elements at a junction, where convention or the signal name denotes which of the effort or flow variable is carried, or at modulated elements. If the modulus of a transformer (TF) or gyrator (GY) is not constant but provided instead by the signal on an activated bond, a modulated transformer (MTF) or modulated gyrator (MGY) results.


Figure 2.14: A power-conserving 2-port: the modulated transformer, MTF (effort e1, flow f1 on one bond; effort e2, flow f2 on the other; modulus m supplied by an activated bond).

Figure 2.15: A power-conserving 2-port: the modulated gyrator, MGY (effort e1, flow f1 on one bond; effort e2, flow f2 on the other; modulus m supplied by an activated bond).

2.4.3 Complex elements

Section 2.4.2 showed how certain non-linear effects can be modelled using modulated transformer and gyrator elements. [40] presents a multi-port MTF node with more than two power bonds. Single- and multi-port elements with complex behaviour, such as discontinuous or even non-analytic effort-flow relations, are also possible. In the non-analytic case, element behaviour would be determined empirically and implemented as a lookup table with interpolation for simulation. These effort-flow relations are typically not invertible, and so these elements, like the ideal source elements introduced in section 2.2, have a required causality.

Figure 2.16 shows an example of an element with discontinuous constitutive equations. The element is labelled CC for contact compliance; the mechanical system from which this name derives (perpendicular impact between an elastic body and a rigid surface) is shown in figure 2.17. Constitutive equations for the CC element in the shown causal assignment are given in equations 2.16 and 2.17. The notation (q < r) is used to indicate 0 when q > r and 1 otherwise, where r is called the threshold and is a new parameter particular to this specialized variant of the C element. In the mechanical interpretation of figure 2.17, these equations are such that the force exerted by the spring is proportional to the compression of the spring when it is in contact with the surface, and zero otherwise. The effort-displacement relationship of the CC element is plotted in figure 2.18. Behaviour of the 1-port CC element with effort-in causality is undefined. Section 5.1.3 includes an application of the CC element.

Figure 2.16: A non-linear capacitive storage element: the 1-port contact compliance, CC.

Figure 2.17: Mechanical contact compliance: an unfixed spring exerts no force when q exceeds the unsprung length r.

e = (q < r) (r − q)/C    (2.16)

dq/dt = f    (2.17)
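The discontinuous constitutive law of equation 2.16 translates directly into code. The sketch below is illustrative only; the function name cc_effort is hypothetical.

```python
def cc_effort(q, r, c):
    """Effort output of the 1-port CC element (equation 2.16): proportional
    to the compression (r - q) while in contact (q < r), zero otherwise."""
    return (r - q) / c if q < r else 0.0
```

The companion state update, equation 2.17, is just dq/dt = f: the displacement q integrates the input flow regardless of contact, which is why the element remains a (specialized) C element.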

2.4.4 Compound elements

The above example shows how a new bond graph element with arbitrary properties can be introduced. It is also common to introduce new compound elements that stand in place of a particular combination of other bond graph elements. In this way, complex models representing complex systems can be built in a modular fashion by nesting combinations of simpler models representing simpler subsystems. At simulation time, a preprocessor may replace compound elements by their equivalent set of standard elements, or compound elements may be implemented such that they directly output a set of constitutive equations that is entirely equivalent to those of the standard elements being abstracted.

Figure 2.18: Effort-displacement plot for the CC element shown in figure 2.16 and defined by equations 2.16 and 2.17.

A library of three-dimensional joints for use in modelling multibody mechanics is given in [86]. Other libraries of reusable sub-models are presented in [19] and [4].

For a modelling application in mechanics, consider the planar mechanism shown in figure 2.19. A slender mass is pin jointed at one end. The pin joint itself is suspended in the horizontal and vertical directions by a spring-damper combination. A bond graph model of this system is given in figure 5.12. The graph is reproduced as figure 2.20 and divided into two parts representing the swinging mass ("link") and the suspended pin joint ("joint"). The sub-models are joined at only two ports: bonds 20 and 23, representing the horizontal and vertical coupling of the pin joint to point A on the link. If two new compound bond graph elements were introduced, subsuming the sub-graphs marked in figure 2.20, the entire model could be drawn as in figure 2.21.


Figure 2.19: A pendulum with damped-elastic suspension.


Figure 2.20: Sub-models of the simple pendulum depicted in figure 2.19 and modelled in figure 5.12.



Figure 2.21: Model of a simple pendulum, as in figures 5.12 and 2.20, rewritten using specialized compound elements.

2.5 Simulation

2.5.1 State space models

The instantaneous state of an energy-based model is precisely described by a full accounting of the distribution of energy within the system. For a bond graph built from the standard elements, this is simply the value of the energy variable of each storage element in the model. If these values are assembled into a vector, that vector locates the model in a "state space".

The "state space equations" express the first derivative of the state vector in terms of state and input variables only (called the "state update" equation), and express any non-state, non-input variables in terms of the state variables and the input variables (the "readout" equation). For a system with inputs ~x, state vector ~u, and outputs ~y, the "state update" equation is 2.18 and the "readout" equation is 2.19.

d~u/dt = f(~x, ~u)    (2.18)

~y = g(~x, ~u)    (2.19)

The procedure to form the state space equations of a bond graph model is straightforward, provided the constitutive equations for each element come from table 2.2 or 2.3 and all storage elements have integral causality. Note that every equation in tables 2.2 and 2.3 is written in an assignment statement form, having just one variable (or the first time-derivative of a variable) alone on the left-hand side. If, following the procedure described in section 2.3.2, causality has been assigned to all bonds and the result shows integral causality for all storage elements, then every variable occurring in the constitutive equations of the elements will appear on the left-hand side of one and only one of the collected equations. The state update equation is formed from those constitutive equations having a derivative on the left-hand side. The energy variable of any storage element with integral causality will become a state variable (one element in the state vector). The procedure is to replace each variable on the right-hand side of these equations with its "definition" (the right-hand side of the equation for which the variable appears on the left-hand side), repeatedly back-substituting until only input and state variables appear on the right-hand side. Applying the same procedure to the remaining (non-state update) equations yields the readout equations for every non-state variable.
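As an illustration, this back-substitution can be carried out by simple textual replacement. The equation set below is hand-written for a series Se-1-C-I-R loop with states q and p and input effort E; the variable names and the expand helper are hypothetical, not part of any particular tool. The fixpoint loop terminates only because integral causality guarantees the dependency graph is acyclic (no algebraic loops): no variable ever appears in its own expansion.

```python
import re

# Constitutive and junction equations for a series Se-1-C-I-R loop, each in
# assignment form with one variable on the left (cf. tables 2.2 and 2.3).
# States: q (capacitor charge), p (inertia momentum); input: E (source effort).
equations = {
    "f":  "p/I",          # inertia, integral causality
    "eC": "q/C",          # capacitor, integral causality
    "eR": "R*f",          # resistor
    "eI": "E - eC - eR",  # effort balance at the 1-junction
    "dq": "f",            # state update for q
    "dp": "eI",           # state update for p
}
known = {"q", "p", "E", "C", "I", "R"}  # states, inputs, parameters

def expand(expr):
    """Back-substitute definitions until only known variables remain."""
    changed = True
    while changed:
        changed = False
        for var, rhs in equations.items():
            new = re.sub(rf"\b{var}\b", f"({rhs})", expr)
            if new != expr:
                expr, changed = new, True
    return expr

# The state update equations: dq/dt = p/I and dp/dt = E - q/C - R*(p/I)
state_update = {v: expand(equations[v]) for v in ("dq", "dp")}
```

Expanding the remaining entries (f, eC, eR, eI) in the same way yields the readout equations. A symbolic algebra package would do this more robustly; the regular-expression substitution here is only meant to make the procedure concrete.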

2.5.2 Mixed causality models

When a model contains one or more storage elements with derivative causality, it is not possible to reduce the constitutive equations of that model to state space form. The result instead is a set of implicit differential-algebraic equations, in which either some differential expressions must occur on the right-hand side of some equations, or some variables must occur only on the right-hand sides, and never on the left-hand side, of any equations. This is inconvenient, since there are ready and well known techniques for solving state space equation sets in simulation.

An experimenter faced with such a model has two choices: either reach for a more powerful (and inevitably more computationally expensive) solution method for the simulation, or rewrite the model to eliminate any occurrence of derivative causality. In the first case, the numerical solution techniques are more expensive than those for state space equations, since a set of implicit algebraic equations must be solved iteratively (e.g., by a root finding algorithm) at each time step. In the second case, the approach varies with the system, the purpose of the model, and the preference of the experimenter.

One approach to revising a mixed or derivative causality model is to combine dependent elements. In the case of the parallel electrical capacitors model shown in figure 2.12, it is a simple matter of replacing the two C-elements with a single one having the sum of their capacitances. For the linked masses in figure 2.13, or models having even longer and more complicated linkages, possibly spanning sub-models of different physical domains, the choice of which elements to combine, and how, is not at all obvious.


A second approach is to decouple the dependent storage elements by introducing additional decoupling elements between them, raising the number of degrees of freedom of the model. For example, if a small resistance were added between the capacitors in figure 2.12, then they would no longer be directly linked. The charge stored in each would be independent, and the model would have as many degrees of freedom as storage elements. In the rigidly coupled masses example (figure 2.13), the addition of an R-element would in effect make the lever plastic; a C-element could make the lever elastic. Either way, the inertias would be freed to move independently, and a state-space formulation would then be possible.


Chapter 3

Genetic programming

3.1 Models, programs, and machine learning

Genetic programming is the application of genetic algorithms to the task of creating computer programs. It is a "biologically inspired, domain-independent method that automatically creates a computer program from a high-level statement of a problem's requirements" [45].

A method that automatically creates computer programs will be of interest to anyone doing system identification, since a computer program and a model (model in the sense of chapter 1) are very similar at their most abstract and formal description. A computer program is something that receives informational input and performs computations on that input to produce an informational output. A system model receives input signals and performs transformations on those signals to produce output signals.

Figure 3.1: A computer program is like a system model.

The method popularized by Koza in [43] [44] [45] [46], and used here, is a genetic algorithm applied to computer programs stored as concrete syntax trees in a functional programming language. Genetic algorithms are one form of "evolutionary computing", which includes also evolution strategies and others. All of evolutionary computing is in turn part of the broader class of adaptive heuristic search algorithms inspired by natural processes, which includes also simulated annealing, swarm optimization, ant colony optimization, and others.

The canonical genetic algorithm is similar to some other common machine learning algorithms, such as back-propagation learning in multilayer perceptron networks, in that it performs a local search for minima on an error surface. However, whereas back-propagation learning in neural networks tests one point on the error surface at a time, a genetic algorithm explores many points on the surface in parallel. The objective of a genetic algorithm need not have any particular structure, such as a network or a tree, so long as the candidate solutions can be encoded in a form that is amenable to the operations described in the next section.

3.2 History of genetic programming

Early experiments in evolving computer programs were performed by Fogel, Owens and Walsh under the title Evolutionary Programming [26]. In that work, populations of finite state automata were evolved to solve discrete time series prediction problems. The reproduction and mutation genetic operators (asexual reproduction) were used to search for individual solutions inhabiting minima on an error surface (fitness maxima). Crossover (sexual reproduction) was not used, and a problem-specific encoding of solutions as finite state automata limited the generality of the method.

In 1981 Richard Forsyth published a pattern recognition system using genetic principles to breed tree-structured Boolean proposition statements as classifiers [27]. In this work Forsyth notes the generality of "Naturalistic Selection", in contrast to the many limiting assumptions inherent in more specialized statistical classifiers, and also makes the important observation that the evolved rules are plainly and comprehensibly expressed, which facilitates human-machine collaboration.

Using a tree-structured encoding of solutions allows the size and shape of individuals to vary, so that these need not be known a priori.¹ [18] also used genetic operations to search for tree-structured programs in a special purpose language. Here the application was in symbolic regression.

1. In an interesting early link to other soft computing techniques, under "future work" Forsyth suggests using fuzzy logic in place of crisp Boolean propositions, and references L.A. Zadeh, "Fuzzy Sets", in Information and Control 8, 1965.

A tree-structured encoding of individuals would later prove to be very important. Generating machine code or FORTRAN without regard to syntax fails to yield acceptable results [22], because the distribution of syntactically correct, and therefore executable, programs within the search space of all possible combinations of, say, FORTRAN key words is far too sparse. Rather, it is preferable to limit the search to finding a program which solves the problem from within the (very much smaller) space of all syntactically correct programs.

Program syntax may be checked by generating the parse tree, as would a compiler. Genetic operations may then be chosen which operate on the program's parse or syntax tree directly, and in syntax-preserving ways. For fitness testing, the parse tree is converted into executable code. In 1992 John Koza published a textbook, "Genetic Programming: on the programming of computers by means of natural selection" [43], which very much set the standard for future work in genetic programming. In it he shows an implementation of a program induction system in LISP, a general purpose programming language, using the reproduction, crossover and mutation operators on individual program syntax trees. LISP was a particularly convenient choice for two reasons. First, the syntax or parse tree of LISP programs is very directly expressed in LISP source code. An example of this dual representation is shown in figure 6.

Second, LISP is a highly dynamic language with introspective capabilities. That is, a program written in LISP may read and modify its own source code while it is running. This allowed John Koza to write a genetic programming framework in LISP which operated on other LISP programs (the candidate solutions) and could load and run them as needed for fitness testing. For these reasons most genetic programming implementations since have been written in LISP, although C is also used [31] [63], and many experimenters still use problem-specific languages. Improved performance has been shown by using genetic programming to evolve low level yet general purpose machine code for a stack-based virtual machine [58]. For demonstrating human-competitive artificial intelligence, it will be desirable to use a general purpose, high level programming language popular among human programmers. "Grammatical evolution" is a genetic algorithm that can evolve complete programs in any recursive language, using a variable-length linear chromosome to encode mappings from a Backus-Naur Form (BNF) language definition to individual programs [68]. This technique has been demonstrated using a BNF language definition consisting of a subset of C, where the resulting programs were compiled with a standard C compiler and executed to test their fitness [54].

Today there are several journals and annual conferences dedicated to genetic programming and genetic algorithms in general.

3.3 Genetic operators

In an analogy to natural evolution, genetic algorithms maintain a population of candidate solutions to the problem that are selectively reproduced and recombined from one generation to the next. The initial generation is created by arbitrary, and usually random, means. There are three operations by which the next generation of individual candidate solutions is created from the current population:

1. Sexual reproduction ("crossover")

2. Asexual reproduction ("replication")

3. Mutated asexual reproduction ("mutation")

The words "sexual" and "asexual" do not mean that there is any analogy to gender in the genetic algorithm. They simply refer to operations in which two "parent" candidate solutions come together to create a "child" candidate, and to operations in which only one "parent" candidate is used.

These operations are applied to an encoded version of the candidate solutions. Candidates are maintained in this form from generation to generation, and only converted back to their natural expression in the problem domain when they are evaluated, once per generation. The purpose of this encoding is so that the operations listed above need not be redesigned to apply to the form of the solution of each new type of problem. Instead, they are designed to apply to the encoded form, and an encoding is designed for each new problem type. The usual encoded form is a bit string: a fixed-length sequence of zeros and ones. The essential property that a bit string encoding must have for the genetic operators to apply is that every possible combination of zeros and ones represents a syntactically valid, if hopelessly unfit, solution to the problem.

57

3.3.1 The crossover operator

The crossover operator is inspired by sexual reproduction in nature; specifically genetic recombination, in which parts of the genetic material of both parents are combined to form the genetic material of their offspring. For fixed-length bit string genotypes, "single point" crossover is used. In this operation, a position along either of the equal-length parents is chosen randomly. Both parent bit strings are split at this point (hence "single point") and the latter parts are swapped. Two new bit strings are created that are complementary offspring of the parents. The operation is depicted in figure 3.2.

Parents:  aaaaaa    bbbbbb
          aaaa|aa   bbbb|bb

Children: aaaabb    bbbbaa

Figure 3.2: The crossover operation for bit string genotypes.
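Single-point crossover is short enough to state directly in code. The sketch below is illustrative (the function name crossover is hypothetical); the a/b alphabet of figure 3.2 stands in for the 0/1 bits of an actual genotype, and any characters work because only string slicing is involved.

```python
import random

def crossover(parent_a, parent_b, point=None):
    """Single-point crossover for equal-length string genotypes: split both
    parents at one position and swap the tails, yielding two complementary
    children (cf. figure 3.2)."""
    assert len(parent_a) == len(parent_b)
    if point is None:
        # choose an interior split point, so neither child is a pure copy
        point = random.randint(1, len(parent_a) - 1)
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2
```

With the split point of figure 3.2, crossover("aaaaaa", "bbbbbb", point=4) returns the children ("aaaabb", "bbbbaa").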

3.3.2 The mutation operator

The mutation operator is inspired by random alterations in genetic material that give rise to new, previously unseen traits in nature. The mutation operator applied to bit string genotypes replaces one randomly selected bit with its complement. The operation is depicted in figure 3.3.

Parent: aaaaaa
Child:  aaaaba

Figure 3.3: The mutation operation for bit string genotypes.
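In code, point mutation for a bit string genotype is a one-character flip. This sketch is illustrative (the function name mutate is hypothetical) and operates on 0/1 bits, corresponding to the a/b alphabet used in figure 3.3.

```python
import random

def mutate(genome, position=None):
    """Point mutation for bit string genotypes: replace one randomly chosen
    bit with its complement (cf. figure 3.3)."""
    if position is None:
        position = random.randrange(len(genome))
    flipped = "1" if genome[position] == "0" else "0"
    return genome[:position] + flipped + genome[position + 1:]
```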

3.3.3 The replication operator

In the replication operation, a bit string is copied verbatim into the next generation.

58

Parent: aaaaaa
Child:  aaaaaa

Figure 3.4: The replication operation for bit string genotypes.

3.3.4 Fitness proportional selection

The above three operators (crossover, mutation, replication) apply directly to genetic material, and describe how new genetic material is created from old. Also needed is some way of selecting which old genes to operate on. In an analogy to "survival of the fittest" in nature, individual operands are chosen at random with a bias towards selecting the more fit individuals. Selection in exact proportion to the individual fitness scores is often visualized as a roulette wheel on which each individual has been allocated a wedge whose angular extent is proportional to the individual's portion of the sum of all fitness scores in the population. A uniform random sample from the circumference of the wheel, equivalent to dropping a ball onto this biased wheel, will have a chance of selecting each individual that is exactly proportional to that individual's fitness score.

This operation (fitness proportional selection) is applied before each occurrence of a genetic operator, to choose the operands. In this way it tends to be the more fit individuals which are used in crossover, mutation and replication to create new individuals. No individual in any generation is entirely ruled out of being selected occasionally. However, less fit individuals are less likely to be selected, and sequences within the bit string genome which are present in less fit individuals, but not in any particularly fit individuals, will tend to be less prevalent in subsequent generations.
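The roulette wheel picture maps directly onto code: spin a uniform random number over the total fitness, then walk the wedges until the running sum passes it. This sketch is illustrative (the function name roulette_select is hypothetical) and assumes all fitness scores are positive.

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportional ("roulette wheel") selection: each individual's
    chance of selection equals its share of the total fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0.0, total)   # where the ball lands on the wheel
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin <= running:
            return individual
    return population[-1]               # guard against floating point rounding
```

In modern Python the same sampling is available as random.choices(population, weights=fitnesses); the explicit loop above is kept only to mirror the roulette wheel description.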

3.4 Generational genetic algorithms

Genetic algorithms can be divided into two groups according to the order in which new individuals are added to the population and others removed. In a "steady state" genetic algorithm, new individuals are created one or two at a time by applying one of the genetic operators to a selection from the existing population. Thus the population evolves incrementally, without a clear demarcation of one generation from the next. In a "generational" genetic algorithm, the entire population is replaced at once. A new population of the same size as the old is built by selecting individuals from the old and applying genetic operators to them. Then the old population is discarded. The more popular generational form is described below and used throughout. A comparison of the performance of steady state and generational algorithms is given in [64].

The generational form of a genetic algorithm proceeds as follows [52]:

1. Choose a representation for solutions to the problem as bit strings of some fixed length N.

2. Choose the relative probabilities of crossover PC, mutation PM, and replication PR = 1 − (PC + PM).

3. Choose the population size M.

4. Define termination criteria in terms of best fitness, number of generations, or a length of time.

5. Generate an initial random population of M individuals.

6. Evaluate the fitness of each individual in the population.

7. If any of the termination criteria are met, halt.

8. Make a weighted random selection of a genetic operator.

9. Select one or two (dependent on the operator) individuals at random, weighted in proportion to their fitness.

10. Apply the operator to the selected individuals, creating a new individual.

11. Repeat steps 8-10 until M new individuals have been created.

12. Replace the old population with the newly created individuals.

13. Go to step 6.

This is the form originally presented by Holland [34] and is also called the canonical genetic algorithm [84]. See also figure 3.5 for a simplified flowchart.
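The thirteen steps above fit in a short program. The sketch below is illustrative only: it applies the canonical generational loop to a toy "count the ones" fitness function, the parameter values (N, M, PC, PM, generation limit) are arbitrary choices for the example, and the only termination criterion used is the resource limit of step 4.

```python
import random

N, M, GENERATIONS = 20, 30, 40      # genome length, population size, resource limit
PC, PM = 0.7, 0.1                   # crossover and mutation probabilities; PR = 0.2

def fitness(genome):
    # Toy objective: count of 1-bits; +1 keeps every score positive
    # so that fitness-proportional selection is well defined.
    return genome.count("1") + 1

def select(population):
    """Fitness-proportional (roulette wheel) selection, as in section 3.3.4."""
    total = sum(fitness(g) for g in population)
    spin, running = random.uniform(0.0, total), 0.0
    for g in population:
        running += fitness(g)
        if spin <= running:
            return g
    return population[-1]

def run():
    random.seed(1)                  # fixed seed keeps the example repeatable
    population = ["".join(random.choice("01") for _ in range(N)) for _ in range(M)]
    best = max(population, key=fitness)
    for _ in range(GENERATIONS):
        new_population = []
        while len(new_population) < M:
            r = random.random()     # weighted choice of operator (step 8)
            if r < PC:                                   # crossover
                a, b = select(population), select(population)
                point = random.randint(1, N - 1)
                new_population.append(a[:point] + b[point:])
            elif r < PC + PM:                            # mutation
                g = select(population)
                i = random.randrange(N)
                g = g[:i] + ("1" if g[i] == "0" else "0") + g[i + 1:]
                new_population.append(g)
            else:                                        # replication
                new_population.append(select(population))
        population = new_population                      # step 12: replace all at once
        best = max(best, max(population, key=fitness), key=fitness)
    return best
```

Note that the whole population is swapped in a single assignment, the defining feature of the generational form; a steady-state variant would instead insert each new individual into the existing population as it is created.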

It is customary for the members of the initial population to be randomly generated, but they may just as well be drawn from any configuration that needs to be adapted to requirements encoded in a fitness test. Genetic algorithms are adaptive. They respond to selective pressures encoded in the fitness test at the start of each generation, even if the fitness test has changed between generations. In this way, genetic algorithms can also be used to track and home in on a moving objective.

Genetic algorithms are also open ended searches. They may be stopped at any time to yield their best-of-generation candidate. Typically, the termination criteria are expressed in terms of both a desired solution fitness level (performance criteria) and either a number of generations or a length of time (resource criteria). The fitness level is set to stop the search as soon as a satisfactory result is achieved. The running time or number of generations represent the length of time an experimenter is willing to wait for a solution before restarting the algorithm with different parameters. For example, Koza et al. ran their patent-busting genetic programming experiments for one month each [46].

0. Generate initial population
        ⇓
1. Evaluate fitness
        ⇓
2. Check termination criteria
        ⇓
3. Take fitness-proportional sample and apply genetic operators
        ⇓
   Repeat from step 1

Figure 3.5: Simplified flowchart for the genetic algorithm or genetic programming.

3.5 Building blocks and schemata

The fundamental qualitative argument for why genetic algorithms are successful is expressed in the so-called "building blocks" hypothesis [29] [34]. It says that good solutions to a problem (fit individuals) are assembled from good parts. In particular, good solutions have "good" partial solutions assembled in a "good" way to form a "good" whole. This seems almost tautological at first; of course the whole thing must be "good" if it is a useful solution. However, it implies a more specific quality of the problem that must be true for genetic algorithms to be more successful than a random search. It implies that good building blocks, while not sufficient for a good solution, are necessary. For the right type of problem, genetic algorithms will keep good building blocks in the population until they can later be assembled into a good whole. The right type of problem is one in which a good building block can add value to an otherwise flawed solution.

Consider a function identification problem: it is desired to find an algebraic expression for an unknown function of one variable. Genetic programming applied to this class of problems is known as "symbolic regression". All the requirements can be met for the application of genetic algorithms to this problem; namely, a suitable encoding and fitness measure are possible. The encoding may be as a bit string, but another, more natural encoding is shown in section 4.5.3. The fitness function shall be the reciprocal of the sum mean-square difference between the solution expression and a sampling of the "true" function, in whatever form it is available. There are some sub-expressions, or building blocks, that will bring the solution expression dramatically closer to the true function even if other parts of the solution are incorrect. For the example of an offset and scaled parabola such as 1 + 2(x + 3)², it is clear that solutions containing a positive quadratic term will be more fit than solutions containing only linear expressions, given a sufficiently large range of samples in the fitness test. All linear expressions are exponentially worse fitting near at least one extremity.

A more formal complement to the building blocks hypothesis is schema theory [34] [29]. Schemata for bit string genomes are strings from the alphabet {0, 1, *} that describe sets of bit strings. A 0 or a 1 anywhere in a schema indicates that bit string members of the corresponding set must have a 0 or a 1, respectively, in the given position. The * character is a wild card indicating a degree of freedom: members of the schema set may have either value in that position. The number of bits that are specified exactly (not by wild cards) in a schema is called the "order" of the schema. The distance between the outermost exactly specified bits is called the "defining length" of the schema. For example, 01*1* is a length 5, order 3 schema with a defining length of 3 that describes the following set of bit strings:


01010

01011

01110

01111

Schema theory states that genetic algorithms truly operate in the space of schemata. It predicts that low-order schemata with a short defining length and above-average fitness receive exponentially more trials in future generations. Poli [60] gives an adaptation to tree-structured genetic programming.
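These definitions translate directly into code. The following sketch (illustrative, not from the thesis) computes the order and defining length of a schema and tests bit strings for membership:

```python
def order(schema):
    """Number of exactly specified (non-wildcard) positions."""
    return sum(1 for c in schema if c != '*')

def defining_length(schema):
    """Distance between the outermost exactly specified positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0]

def matches(schema, bits):
    """True if the bit string is a member of the set the schema describes."""
    return len(schema) == len(bits) and all(
        s == '*' or s == b for s, b in zip(schema, bits))
```

Applied to the example above, `order('01*1*')` is 3, `defining_length('01*1*')` is 3, and `matches('01*1*', '01011')` is true.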

3.6 Tree structured programs

The key to applying genetic algorithms to the problem of generating a computer program is to choose a genetic encoding such that even randomly generated individuals decode to a valid, if incorrect, computer program. Alternatively, it is possible to use any encoding at all (the one preferred by human programmers, perhaps) but assign an arbitrarily low (i.e. zero) fitness score to invalid programs. However, in most languages the distribution of valid computer programs among the set of random character strings is so astronomically low that the algorithm will spend virtually all available time constructing, checking, and discarding invalid programs.

The canonical genetic programming system by Koza [43] was written in LISP. The LISP programming language [50] has an extremely minimal syntax and a long history of use in artificial intelligence research. Program source code and data structures are both represented in a fully bracketed prefix notation called s-expressions. Nesting bracket expressions gives a hierarchical or tree structure to both code and data. As an early step in compilation or interpretation, programs written in most languages are fit into a tree structure according to the way they are expressed in the language (called the "concrete parse tree" of the program). The LISP s-expression syntax is a very direct representation of this structure (i.e. parsing LISP programs is near trivial).

This immediate representation, and the ability of LISP code and data to be interchanged and manipulated freely by other LISP code, makes it relatively straightforward to write LISP programs that create and manipulate other LISP programs. Since LISP code and data are both stored as tree structures (s-expressions), any valid data structure is also valid code. Data structures can be created and manipulated according to the genetic operators using standard LISP constructs, and when the time comes they can be evaluated as program code to test their fitness.
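By way of illustration (this thesis uses Python rather than LISP), the same code-as-data idea can be approximated with nested tuples standing in for s-expressions: the tree can be sliced and spliced as ordinary data by the genetic operators, then evaluated as a program during fitness testing. The function set and encoding here are hypothetical:

```python
import operator

# A program's parse tree as nested tuples: (function_name, child, child, ...).
# Leaves are variable names (strings) or numeric constants. Hypothetical encoding.
FUNCTIONS = {'add': operator.add, 'mul': operator.mul}

def evaluate(node, env):
    """Recursively evaluate a tuple-encoded parse tree against variable bindings."""
    if isinstance(node, tuple):
        func = FUNCTIONS[node[0]]
        return func(*(evaluate(child, env) for child in node[1:]))
    if isinstance(node, str):
        return env[node]       # variable lookup
    return node                # numeric constant

# The tree is plain data until it is evaluated.
tree = ('add', ('mul', 'x', 'x'), 2.0)   # represents x*x + 2
print(evaluate(tree, {'x': 3.0}))        # 11.0
```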

Genetic programming systems are now often written in C [63] [46] for performance reasons. For this thesis, Python [83] is used, but the essence of the process is still the creation, manipulation, and evaluation of programs in the form of their concrete parse trees. Often the genetic programs themselves are created in a special-purpose language that is a reduced version of, or completely distinct from, the language in which the genetic programming system is written. For example, in this thesis the genetic programs are created in a statically typed functional language that is interpreted by the genetic programming system, itself written in Python. The language is statically typed in that every element of a program has a particular type that determines where and how it can be used, and the type of an element cannot be changed. It is functional in the same sense as LISP or Haskell, whereas Python, in which the genetic programming system is written, is an imperative language like C or Java.

They are "imperative" in the sense that they consist of a sequence of commands, which are executed strictly one after the other. [...] A functional program is a single expression, which is executed by evaluating the expression.²

The canonical form of genetic programming as given in [43] shares the basic procedure of the generational genetic algorithm: new generations are created from old by fitness-proportional selection and the crossover, mutation, and replication operations. Genetic programming is precisely the genetic algorithm described above, with the following important differences:

1. Whereas a bit string genetic algorithm encodes candidate solutions to a problem as fixed-length bit strings, genetic programming individuals are programs (typically encoded as their parse trees) that solve the problem when executed.

²http://haskell.org/haskellwiki/Introduction#What_is_functional_programming%3F


2. Whereas bit string genetic algorithm chromosomes have no syntax (only bitwise semantics), the genetic programming operators preserve program syntax. Thus it is only the reduced space of syntactically correct programs that is being searched, not the space of all possible sequences of tokens from the programming language.

3. Whereas bit string genetic algorithm solutions have a fixed-length encoding, which entails predetermining to some degree the size and shape of the solution before searching,³ tree-structured genetic programming solutions have variable size and shape. Thus, in a system identification application, genetic programming can perform structural and parametric optimization simultaneously.

4. Whereas bit string genetic algorithm individuals typically decode to a solution directly, genetic programming individuals decode to an executable program. Evaluation of genetic programming solutions involves running these programs and testing their output against some metric.

5. There are some operators on tree-structured programs that have no simple analogy in nature and are either not available or not commonly used in bit string genetic algorithms.

Preparation for a run of genetic programming requires the following 5 decisions from the experimenter:

1. Define the terminal node set. This is the set of all variables, constants, and zero-argument functions in the target language. It is the pool from which leaf nodes (those having no children) may be selected in the construction and manipulation of program trees.

2. Define the function node set. This is the set of operators in the target language. Members are characterized by the number of arguments they take (their arity). It is the pool from which internal (non-leaf) nodes may be selected in the construction and manipulation of program trees. Canonical genetic programming requires that the function set satisfy closure, meaning that every member of the function set must accept as input the output of any other member. This allows crossover points to be chosen anywhere in the program syntax tree without constraint.

³This requirement for a priori knowledge of upper limits on the size and shape of the solution is common to many other machine learning techniques. For example, with neural network applications the number of layers and nodes per layer must be decided by the experimenter.

Figure 3.6: Genetic programming produces a program. Some inputs to the framework are generic, others are specific to the problem domain, but only one, the fitness test, is specific to any particular problem.

3. Define the fitness function. Typically the fitness function executes an individual program with a fixed suite of test inputs and calculates an error metric based on the program output.

4. Choose the evolutionary control parameters M, G, PR, PC, PM:

M population size (number of individuals)

G maximum number of generations to run

PR probability of selecting the replication operator

PC probability of selecting the crossover operator

PM probability of selecting the mutation operator

5. Choose termination criteria. Typically a run of genetic programming is terminated when a sufficiently fit individual is found or a fixed number of generations have elapsed.
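Taken together, these decisions parameterize a generational loop. The following is a minimal sketch, not the thesis implementation: the callbacks `random_program`, `fitness`, `replicate`, `crossover`, and `mutate` are assumed placeholders, and a fitness of 1.0 is treated as "sufficiently fit".

```python
import random

def run_gp(M, G, PR, PC, PM, random_program, fitness,
           replicate, crossover, mutate):
    """Generational GP loop driven by the control parameters M, G, PR, PC, PM.
    All operator callbacks are assumptions standing in for a real system."""
    population = [random_program() for _ in range(M)]
    for _ in range(G):
        scored = [(fitness(p), p) for p in population]
        best_fitness, best = max(scored, key=lambda s: s[0])
        if best_fitness >= 1.0:              # termination: sufficiently fit individual
            return best
        individuals = [p for _, p in scored]
        weights = [f for f, _ in scored]     # fitness-proportional (roulette) selection
        def select():
            return random.choices(individuals, weights=weights)[0]
        next_population = []
        while len(next_population) < M:
            r = random.random()
            if r < PR:
                next_population.append(replicate(select()))
            elif r < PR + PC:
                next_population.append(crossover(select(), select()))
            else:                            # remaining probability PM
                next_population.append(mutate(select()))
        population = next_population
    return max(population, key=fitness)      # best-of-run after G generations
```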

The first two decisions (definition of the function and terminal sets) define the genetic encoding. These need to be done once per computer language in which programs are desired. Specialized languages can be defined to better exploit some prior knowledge of the problem domain, but in general this is an infrequent task. The fitness function is particular to the problem being solved; it embodies the very definition of what the experimenter considers desirable in the sought program, and so changes with each new problem. The evolutionary control parameters are identical to those of the canonical genetic algorithm and are completely generic. They can be tuned to optimize the search, and different values may excel for different programming languages, encodings, or problems. However, genetic programming is so very computationally expensive that they are generally left alone once adequate performance is achieved. The cost of repeated genetic programming runs usually prohibits extensive iterative adjustment of these parameters, except on simple problems for the study of genetic programming itself.

Specialized genetic programming techniques, such as strongly typed genetic programming, will require additional decisions by the experimenter.

3.6.1 The crossover operator for tree structures

The canonical form of genetic programming operates on program parse trees instead of bit strings, and the genetic operations of crossover and mutation differ necessarily from those used in the canonical bit string genetic algorithm. Selection and replication, which operate on whole individuals, are identical.

The crossover operator discussed in section 3.3.1 swaps sub-strings between two parent individuals to create a pair of new offspring individuals. In genetic programming with tree-structured programs, sub-trees are swapped. To do this, a node is selected at random on each parent, then the sub-trees rooted at these nodes are swapped. The process is illustrated in figure 3.7. Closure of the function set ensures that both child programs are syntactically correct. They may, however, have different size and shape from the parent programs, which differs significantly from the fixed-length bit strings discussed previously.
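Sub-tree swapping can be sketched on tuple-encoded parse trees as follows. This is an illustrative implementation, not the code used in this project:

```python
import random

def subtree_indices(tree, path=()):
    """Enumerate paths (tuples of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_indices(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Swap randomly chosen sub-trees between two parents, yielding two children."""
    path_a = rng.choice(list(subtree_indices(parent_a)))
    path_b = rng.choice(list(subtree_indices(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))
```

Because whole sub-trees are exchanged, the total number of nodes across the two children equals that of the two parents, but each child's size and shape may differ from either parent's.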

3.6.2 The mutation operator for tree structures

Two variations on the mutation operator are possible using tree-structured individuals. The first, called "leaf node mutation", selects at random one leaf node (a node at the periphery of the tree, having no children) and replaces it with a new node randomly selected from the terminal set. Like the single-bit mutation discussed in section 3.3.2, this is the smallest possible random change within the overall structure of the individual.

Figure 3.7: The crossover operation for tree-structured genotypes.

The second mutation operator is called "sub-tree mutation". In this variation, a node is selected at random from anywhere within the tree. The sub-tree rooted at this node is discarded, and a new sub-tree of random size and shape is generated in its place. This may create much more dramatic changes in the program. In the extreme, if the root node of the whole tree is selected, then the entire program is replaced by a new, randomly generated one.
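A minimal sketch of sub-tree mutation on tuple-encoded trees, with a hypothetical terminal and function set; if the mutation point falls at the root, the whole program is regenerated:

```python
import random

TERMINAL_SET = ['x', 1.0, 2.0]               # hypothetical terminal set
FUNCTION_SET = [('add', 2), ('mul', 2)]      # hypothetical (name, arity) pairs

def random_tree(depth, rng=random):
    """Grow a random tuple-encoded tree at most `depth` levels deep."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINAL_SET)
    name, arity = rng.choice(FUNCTION_SET)
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(arity))

def subtree_mutation(tree, rng=random, max_depth=3):
    """Replace one randomly chosen node with a freshly grown random sub-tree."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        # mutate here; if this is the root, the entire program is regenerated
        return random_tree(max_depth, rng)
    i = rng.randrange(1, len(tree))          # otherwise descend into a random child
    return tree[:i] + (subtree_mutation(tree[i], rng, max_depth),) + tree[i + 1:]
```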

3.6.3 Other operators on tree structures

Other, more exotic operations such as "hoist" and "automatically defined functions" have been suggested [43]. Not all have counterparts in bit string genetic algorithms or in nature. The hoist operator creates a new individual asexually from a randomly selected sub-tree of the parent, a partial replication. The child will be smaller and differently shaped than the parent tree.

The automatically defined function operator adds a new member to the function set by encapsulating part of the parent tree. An internal (non-leaf) node is selected at random to be the root of the new function. The sub-tree rooted at this node is traversed to some depth. Un-traversed sub-trees below this depth are considered arguments to the function. A new node is added to the function set having the required number of arguments (that is, requiring that number of children). This new function stores a copy of the traversed portion. Within the parent individual, the sub-tree at the root of the new function is replaced with an instance of the automatically defined function, with the untraversed sub-trees attached below. In subsequent generations the automatically defined function is available for use within the descendants of this program, just like any other member of the function set. Immediately prior to fitness evaluation of any program containing automatically defined functions, those functions are replaced by their stored originals. Thus an individual created by the automatically defined function operator is exactly as fit as its parent, but it has a different, and simpler, structure using a new function that was not available to the parent. Automatically defined functions allow a portion of a program, from anywhere within its tree, to be encapsulated as a unit unto itself, and thus protected from the destructive effects of crossover and mutation.

3.7 Symbolic regression

Symbolic regression is an application of genetic programming where the "programming" language in use is restricted to just algebraic expressions. Symbolic regression solves the problem of generating a symbolic expression that is a good fit to some sampled data. The task performed by symbolic regression is sometimes also known as "function identification" or "equation discovery". A variation that generates dimensions (units) along with the expressions is called "empirical discovery" [41]. Traditional regression methods, such as linear, quadratic, or polynomial regression, search only over the coefficients of a preformed expression. They cannot automatically discover the form of expression which best suits the data.

Symbolic regression and its variants are widely reported in the literature. In its simplest form it is easily understood, and example problems are easily generated. Symbolic regression is often used in benchmarking implementations of the genetic programming technique. Symbolic regression on a single-input, single-output (SISO) static nonlinear function is presented in section 5.2. Symbolic regression on multiple-input, multiple-output (MIMO) static functions can be treated in the same way, using vector quantities in place of scalars. Symbolic regression on SISO or MIMO dynamic systems, however, requires a different approach. Modelling of dynamic systems requires some sort of memory operation or differential equations. Section 4.5.3 presents a design for using symbolic regression to discover sets of differential and algebraic equations in state space form.

3.8 Closure or strong typing

Genetic programming systems must be strongly typed or mono-typed with closure. The type system of a functional programming language is said to have closure if the output of every function is suitable input for any function in the language. The canonical form presented by Koza is mono-typed with closure. To enforce closure in symbolic regression, care must be taken with some very common operators that one is almost certain to want to include in the function set.

For example, division must be protected from division by zero. Koza handles this by defining a "protected division" operator "%" which returns the constant value 0.0 whenever its second operand is zero.
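A sketch of such a protected division operator; the return value 0.0 for a zero divisor follows the convention described above:

```python
def protected_div(a, b):
    """Protected division: return 0.0 when the divisor is zero, so that
    every candidate expression evaluates without error (closure)."""
    if b == 0:
        return 0.0
    return a / b
```

Including `protected_div` in the function set in place of ordinary division preserves closure: its output is always an ordinary number, acceptable as input to any other member of the function set.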

Measures such as this ensure that every candidate program will execute without error and produce a result within the domain of interest. The outcomes, then, are always valid programs that produce interpretable arithmetic expressions. They may, however, exploit the exceptions made to ensure closure in surprising ways. A candidate program might exploit the fact that x/0 = 0 for any x in a particular genetic programming system to produce the constant 1.0 through a sub-expression of the form cos(x/0), where x is a sub-expression tree of any size and content. In this case the result is interpretable by replacing the division node with a 0.0 constant value node. A more insidious example arises if one were to take the seemingly more reasonable approach of evaluating x/0 as equal to some large constant (perhaps with the same sign as x); that is, x/0 has a value as close to infinity as is representable within the programming system. This has the disadvantage that sin(x/0) and cos(x/0) evaluate to unpredictable values on [-1, 1], which vary between computer platforms based on the largest representable value and the implementation of the sin and cos functions.

A more robust solution is to assign a very low fitness value to any candidate program which employs division by zero. Using the standard Python division operator will throw a runtime exception when the second operand is zero. That exception is trapped, and the individual in which it occurred is awarded a fitness evaluation of zero. The following source code excerpt shows how division by zero and occurrences of the contagious NaN value⁴ result in a zero fitness evaluation. Comments in Python begin with a # character and continue to the end of the line.

Listing 3.1: Protecting the fitness test code against division by zero

    def fitTest(candidate):
        error = 0.0
        try:
            for x in testPoints:
                error += abs(candidate(x) - target(x))
        except ZeroDivisionError:
            return 0.0
        # NaN (undefined) compares equal to any number
        if error == 0 and error == 1:
            return 0.0
        # High error --> low fitness. Zero error --> perfect fitness (1.0)
        return 1.0 / (1.0 + error)

Another very common yet potentially problematic arithmetic operation is fractional powers. If the square root of a negative value is to be allowed, then a mono-typed GP system with closure must use complex numbers as its one type. If one is not interested in evolving complex-valued expressions, then this also must be trapped. Again, the more robust option is to assign prohibitively low fitness to programs that employ complex values.

⁴The standard Python math library relies on the underlying platform's implementation. For example, in cPython (used in this project) the Python standard library module "math" is a fairly transparent wrapper on the C standard library header "math.h". All operating systems used in this project (Microsoft Windows 2000, GNU/Linux, and SGI IRIX) at least partially implement IEEE standard 754 on binary floating point arithmetic. Of particular interest, IEEE 754 defines two special floating point values called "INF" (-INF is also defined) and "NaN". INF represents any value larger than can be held in the floating point data type; INF arises through such operations as the addition or multiplication of two very large numbers. NaN is the IEEE 754 code for "not a number"; NaN arises from operations which have undefined value in common arithmetic. Examples include cos(INF), sin(INF), and INF/INF. These special values, INF and NaN, are contagious, in that operations involving one of these values and any ordinary number evaluate to the special value. For example, INF - 123456 = INF and NaN * 2.0 = NaN. Of these two special values NaN is the most "virulent", since in addition to the above, operations involving NaN and INF evaluate to NaN. See http://grouper.ieee.org/groups/754/ for further reading on the standard.

GP trees that build graphs from a kernel are strongly typed, since there may be several types of modifiable sites on the graph which can only be modified in certain (possibly non-overlapping) ways. For a simple example, a node and an edge may both be modifiable in the kernel, yet there are not many operations that a GP function node could sensibly perform in exactly the same way on either a node or an edge.

3.9 Genetic programming as indirect search

Genetic programming refers to the use of genetic algorithms to produce computer programs. An indirect use of genetic programming uses the genetic algorithm to find programs that, when run, produce a solution from the search space. These programs are judged not on their own structure but on the quality of their output. This generalizes the applicability of the technique from simple bit strings or tree structures to any possible output of a computer program in the given programming language. There is a computational cost to all of this indirection, however, and a risk that it contorts the fitness landscape in a way that will slow down progress of the genetic algorithm.

A genetic programming system can be arranged that gives each genetic program as input a representation of some prototypical solution (called a kernel) that incorporates any prior knowledge about the problem. If the problem is system identification, then the kernel may be a rough system model arrived at by some other identification technique. If the problem is symbolic regression, then the kernel may be a low-order approximation of the target function, for example. In this arrangement, a program is sought which modifies the kernel and returns a better solution. This makes genetic programming useful in "grey-box" system identification problems, for example, where a preliminary model is already known.


Figure 3.8: Genetic programming as indirect search, for use in "grey-box" system identification.

3.10 Search tuning

3.10.1 Controlling program size

Although it is often listed as an advantage of genetic programming that there is no formal requirement for a priori knowledge of the size or form of an optimal, or even correct, solution, there are practical reasons for setting an upper bound on the size of evolved programs.

Namely, the algorithm must be implemented with finite computational resources. If individuals are allowed to grow without bound, then one or more exceptionally large individuals can quickly consume so much memory and processor time as to virtually stall the search.

Common resource usage limitation options include a fixed population size and a maximum number of generations, which are near ubiquitous in genetic programming. As to program size, a fixed ceiling may be set, either as a constraint on crossover or (as implemented in this project) by assigning zero fitness during program evaluation to individuals that exceed the ceiling.
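The zero-fitness size ceiling described above can be sketched as a wrapper around the raw fitness evaluation. The ceiling value and the tuple tree encoding here are hypothetical, not taken from the thesis code:

```python
MAX_NODES = 100   # hypothetical ceiling on program tree size

def tree_size(node):
    """Count the nodes in a tuple-encoded program tree."""
    if isinstance(node, tuple):
        return 1 + sum(tree_size(child) for child in node[1:])
    return 1

def sized_fitness(candidate_tree, raw_fitness):
    """Award zero fitness to over-sized individuals; otherwise defer
    to the ordinary test-suite-based fitness evaluation."""
    if tree_size(candidate_tree) > MAX_NODES:
        return 0.0
    return raw_fitness(candidate_tree)
```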

Another program size limiting option is to incorporate program size as a negative factor in fitness evaluation, creating a sort of evolutionary "parsimony pressure". Here there is a difficulty in deciding the "pressure schedule" that maps between program size and the penalty factor. In particular, a schedule that is monotonic and steep, tending to penalize large programs and reward small programs to an extent that overwhelms the test-suite-based performance factor, will limit the size and diversity of subprograms available for inclusion in a solution through crossover. This extreme case would obviously have a detrimental effect on the performance of the genetic programming algorithm. Even if the extreme is not reached, the very act of designing a "parsimony pressure schedule" requires us to estimate the size of a correct solution to our problem. Specifying how big a program must be in order to solve a given problem will usually be more difficult than picking some generous value which is obviously adequate.


A "parsimony pressure" measure may also be implemented by limiting the computational resources (e.g. memory or processing time) available to each program during evaluation. Programs which are not able to complete the fitness evaluation test suite within their assigned resource limits are culled from the population. While more complex to implement than, for example, a simple ceiling on program tree size, this approach is stronger. The strength of this approach arises when the programs being evolved may incorporate iteration or recursion for which the bounds are a parameter of the search, or when the programs being evolved are allowed to allocate resources for their own use. None of these aspects apply to the programs of interest in this work, however, and so the simpler approach was taken.

It is also interesting to note that the approach of limiting computational resources in general is perhaps more naturalistic. Biological organisms which have very extensive material needs will, on occasion, find those needs unmet and consequently perish.

3.10.2 Maintaining population diversity

It is a fact of performing a local search on complex error surfaces that local extrema are sometimes encountered. Genetic programming can become "stuck" for considerable time on a non-global fitness maximum if the population comes to consist predominately of individuals that are structurally identical. That is, if one solution occurs which is very much more fit than all others in the current population, then reproduction with reselection will cause the next generation to consist primarily of copies of this one individual. Although subtree-swapping crossover can produce offspring of differing size and shape from both parents (for example, by swapping a large portion of one program for a small portion of another), if both parents are substantially or exactly structurally equivalent then the range of possible offspring is greatly reduced. In the extreme, in a population consisting entirely of individuals having exactly the same structure, a solution containing function nodes not present in that structure is not reachable except through mutation, which is typically employed sparingly.

There are many techniques available for population diversity maintenance, such as tagging or distance-measure-based limits on the total number of individuals per generation sharing any particular structure, demes with migration, and adaptive crossover or mutation rates (e.g. making the mutation rate proportional to the inverse of some population diversity measure).


No active population diversity maintenance techniques were employed, but a measure of population diversity was included in the measured genetic programming run aspects. This measure employed tagging, a sort of structural hashing in which each individual is assigned a "tag" (which could be a simple integer) such that structurally equivalent individuals share a tag value, while tag values differ for structurally different individuals. A test for structural equivalence must be designed for the particular function and terminal sets in use. As an example, in a symbolic regression system two program trees might be considered equivalent if their root nodes are equivalent, where the equivalence of two nodes is defined recursively as follows: two nodes are equivalent if they share the same type, the same number of children, and corresponding pairs of their children are all equivalent. Thus two constant value nodes would be considered structurally equivalent regardless of any difference in their values (a parametric, not structural, difference), since they have the same type (constant value) and the same number of children (zero).

It may seem natural to consider any subtree having only constant value leaf nodes to be structurally equivalent to a single constant value node. A side effect of this would be to bias the search towards exploring structural variations and away from parametric optimization.

Tagging is less computationally expensive and conceptually simpler than numerically valued distance measures. A conceptually simple recursive algorithm for evaluating structural equivalence is described above. This evaluation can be made tail-recursive and short-circuiting. Short-circuit evaluation¹ is possible in this case because the algorithm will stop recursion and return false immediately upon encountering any two corresponding nodes which are not equivalent. The implementation in Python developed for this project achieves this automatically, since all of the Python built-in logical operators are short-circuiting. A LISP or Scheme implementation would benefit, since modern optimizing LISP compilers perform very well with tail recursion.
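The recursive equivalence test described above might look like the following on tuple-encoded trees. This is an illustrative sketch, not the project code; constant leaves are treated as equivalent regardless of value, matching the tagging scheme described:

```python
def structurally_equivalent(a, b):
    """Recursive structural-equivalence test: same node type, same number of
    children, and all corresponding children equivalent. Constant values are
    ignored, being a parametric rather than structural difference."""
    a_is_tree, b_is_tree = isinstance(a, tuple), isinstance(b, tuple)
    if a_is_tree != b_is_tree:
        return False
    if not a_is_tree:
        # leaves: variables must match by name; any two constants are equivalent
        if isinstance(a, str) or isinstance(b, str):
            return a == b
        return True
    # `and` and all() short-circuit, stopping at the first mismatching pair
    return (a[0] == b[0] and len(a) == len(b)
            and all(structurally_equivalent(x, y) for x, y in zip(a[1:], b[1:])))
```

Because Python's `and` and `all()` short-circuit, the recursion abandons a pair of trees as soon as any corresponding nodes disagree, exactly as described above.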


Chapter 4

Implementation

In the course of producing this thesis, software was written for bond graph modelling and simulation, symbolic algebra, and strongly typed genetic programming. This software was written in the Python [83] programming language, with a numerical solver provided by the additional SciPy [37] libraries. Visualization tools were written that use Graphviz [25], Pylab [36], and PyGame [72] to show bond graphs, genetic trees, algebraic structures, numerical results, and animations of multi-body mechanics simulations.

4.1 Program structure

The bond graph modelling and simulation code and the related symbolic algebra code are divided into 4 modules:

Bondgraph.py: The Bondgraph module defines a set of objects representing basic and complex bond graph elements, bonds, and graphs. Graphs can be assembled programmatically or prescriptively (from text files). Automatic causality assignment and equation extraction are implemented here, as are some bond graph reduction routines.

Components.py: The Components module contains a collection of compound bond graph elements.

Algebra.py: The Algebra module provides facilities for constructing and manipulating symbolic expressions and equations. It produces textual representations of these in Graphviz "DOT" markup or LaTeX. It implements algorithms for common algebraic reductions, and recipes for specialized actions such as finding and eliminating algebraic loops from sets of equations where possible, and identifying where it is not possible.

Calculus.py: The Calculus module provides extensions to the Algebra module for constructing and manipulating differential equations, in particular for reducing coupled sets of algebraic and first-order differential equations to a set of differential equations involving a minimal set of state variables, plus a set of algebraic equations defining all the remaining variables in terms of that minimal set.

The genetic programming code is divided into 4 modules as well:

Genetics.py: The Genetics module implements an abstract genetic programming type system from which useful specializations can be derived. It implements the genetic operators for tree-structured representations.

Evolution.py: The Evolution module executes the generational genetic algorithm in a generic way, independent of the particular genetic representation in use.

SymbolicRegression.py: The SymbolicRegression module implements a specialized genetic programming type system for symbolic regression on dynamic systems, on top of the Genetics module.

BondgraphEvolution.py: The BondgraphEvolution module implements a specialized genetic programming type system for evolving bond graphs, on top of the Genetics module. It uses a subset of the symbolic regression module to find constant-valued expressions for bond graph element parameters.

A number of other modules were written. Most of these automate particular experiments or implement small tools for visualization of bond graphs and genetic trees, generating test signals, and visualizing numerical results. A framework for distributing genetic programming across a local or wide area network in the island model was also begun, but is as yet incomplete.

4.1.1 Formats and representations

Simulation results and test signals used as input to simulations are stored in comma-separated value (CSV) text files, for portability. For particularly long or detailed (small time-step) simulations there is a noticeable delay while the plotting and animation visualization tools parse and load the results. A more compact, non-text format might run quicker, but the delay is acceptable in exchange for the ability to inspect the results with a text editor, and the ease of importing CSV data into off-the-shelf tools such as MATLAB, Octave, and even spreadsheet programs.
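Loading such a results file with Python's standard csv module might look like the following sketch (the column names here are illustrative, not the actual output format of the simulation tools):

```python
import csv
from io import StringIO

# Stand-in for an actual results file on disk: time plus one
# effort and one flow variable per bond.
results = StringIO("t,e1,f1\n0.0,0.0,0.0\n0.1,1.96,0.02\n")

reader = csv.DictReader(results)
# Convert every field from text to a floating point number.
rows = [dict((k, float(v)) for k, v in row.items()) for row in reader]
print(rows[1]['e1'])   # -> 1.96
```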

There are several graph definition markup languages in use already, including GraphML (an XML schema) and DOT (used by Graphviz). An expedient custom markup format for storing bond graphs was defined as part of this work. The Bondgraph.py module is able to load models from files written in this format and write them back out, possibly after some manipulation. The distributed computing modules use this format to send bond graph models across the network. It is not the intention that this format become widely used, but an example is included as listing 4.1 for completeness.

Any line beginning with a # is a comment, intended for human eyes and ignored by the software. Nodes and bonds are defined within begin and end lines. Nodes referred to within the definition of a bond must themselves have been defined earlier in the file. The direction of positive power flow along bonds is from the tail to the head (i.e. the half arrow would be drawn at the head). Bond graphs defined in this way are acausal, since Bondgraph.py is able to assign causality automatically when it is needed.
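The parsing rules just described are simple enough to sketch in a few lines. The following is a hypothetical, minimal reader for the format (not the library's actual loader); it skips comments, accumulates key-value pairs between begin/end lines, and enforces the rule that a bond may only reference nodes defined earlier in the file:

```python
def parse_bg(text):
    nodes, bonds, block = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue                       # comments are for human eyes only
        if line.startswith('begin '):
            block = {'_kind': line.split()[1]}
        elif line == 'end node':
            nodes[block['id']] = block
            block = None
        elif line == 'end bond':
            for end in ('tail', 'head'):
                if block[end] not in nodes:
                    raise ValueError('bond references undefined node ' + block[end])
            bonds.append(block)
            block = None
        elif block is not None:
            key, _, value = line.partition(' ')
            block[key] = value
    return nodes, bonds

sample = "begin node\nid spring\ntype C\nparameter 1.96\nend node\n" \
         "begin node\nid point1\ntype cfj\nend node\n" \
         "begin bond\ntail point1\nhead spring\nend bond\n"
nodes, bonds = parse_bg(sample)
print(len(nodes), len(bonds))   # -> 2 1
```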

This format can be criticized as needlessly verbose. For comparison, Python code to create the same model, then extract and print the corresponding state space equations, is included as listing 4.2.

Listing 4.1: A simple bond graph model in .bg format

title single mass spring damper chain

# nodes

# Spring, damper, and mass
begin node
id spring
type C
parameter 1.96
end node

begin node
id damper
type R
parameter 0.02
end node

begin node
id mass
type I
parameter 0.002
end node

# Common flow point joining s, d, m
begin node
id point1
type cfj
end node

# Input force acting on spring and damper
begin node
id force
type Se
parameter Y
end node

# bonds

# Spring and damper joined to mass
begin bond
tail point1
head spring
end bond

begin bond
tail point1
head damper
end bond

begin bond
tail point1
head mass
end bond

begin bond
tail force
head point1
end bond

EOF

Listing 4.2: A simple bond graph model constructed in Python code

from Bondgraph import *

g = Bondgraph('single mass spring damper chain')
c = g.addNode(Capacitor(1.96, 'spring'))
r = g.addNode(Resistor(0.02, 'damper'))
i = g.addNode(Inertia(0.002, 'mass'))
j = g.addNode(CommonFlowJunction())
s = g.addNode(EffortSource('E'))

g.bondNodes(j, c)
g.bondNodes(j, r)
g.bondNodes(j, i)
g.bondNodes(s, j)

g.assignCausality()
g.numberElements()

print StateSpace(g.equations())


4.2 External tools

4.2.1 The Python programming language

Python is a highly dynamic, interpreted, object-oriented programming language originally written by Guido van Rossum. It has been distributed under a series of open source licenses [62] and has thus gained contributions from hundreds of individuals world-wide. Similar to LISP, the dynamic and introspective features of Python facilitate the creation, manipulation, and evaluation of blocks of code in an automated way at runtime, all of which are required steps in genetic programming.

In addition to personal preference and familiarity, Python was chosen because it is more legible to a general audience than LISP, and faster and less error prone than writing programs in C. The historical choice of syntax and keywords for Python has been strongly influenced by the "Computer Programming for Everyone" philosophy [82], which strives to show that powerful programming environments enjoyed by experts need not be inaccessible to others. The language design choices made to date have been successful in this respect, to the extent that executable Python source code from real applications has occasionally been mistaken for pseudo-code. Take, for example, the following two functions, which implement the universal and existential logical quantifiers, respectively.

Listing 4.3: Two simple functions in Python

def forAll(sequence, condition):
    for item in sequence:
        if not condition(item):
            return False
    return True

def thereExists(sequence, condition):
    for item in sequence:
        if condition(item):
            return True
    return False

These are succinct, easily understood by English readers familiar with procedural programming, and, due to the name-based polymorphism inherent in Python's object system [49], these operate on any iterable object "sequence" (which may in fact be a list, tuple, dictionary, etc., or even a custom object-type implementing the iterator interface) and any callable object "condition".
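As an aside, recent versions of Python express the same two quantifiers through the built-in all and any functions combined with a generator expression; the loop-based definitions above remain the more literal illustration:

```python
def forAll(sequence, condition):
    # Equivalent to the explicit loop: true only if every item passes.
    return all(condition(item) for item in sequence)

def thereExists(sequence, condition):
    # True as soon as any item passes; short-circuits like the loop.
    return any(condition(item) for item in sequence)

print(forAll([2, 4, 6], lambda x: x % 2 == 0))    # -> True
print(thereExists([1, 3, 5], lambda x: x > 4))    # -> True
```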

Since this research addresses automated dynamic system identification, topics which are primarily of interest to engineers and not computer scientists, the ease and accuracy with which an audience of non-specialists in the implementation language can understand the software is important.

In addition to its legibility, Python code is portable without change across several important platforms. Development was carried out at times on each of Mac OS X, GNU/Linux, and Microsoft Windows, with no change to the code base. Later, dozens of genetic programming runs were performed with the same code on a 4-processor SGI machine under the IRIX operating system. Python compiles easily for most Unix variants, making Python code portable to some very powerful machines.

4.2.2 The SciPy numerical libraries

SciPy [37] is a collection of libraries and extensions to the Python programming language that are likely to be of use to scientists and engineers. In this work, SciPy is used for fast matrix operations and to access a numerical solver. Both capabilities are made possible by code from netlib [1] included in SciPy. The solver is LSODA, an adaptive solver that automatically switches between stiff and non-stiff routines. It is part of ODEPACK [32], which SciPy exposes as scipy.integrate.odeint. StateSpace objects from the Calculus.py module conform to the interface provided by scipy.integrate.odeint.

4.2.3 Graph layout and visualization with Graphviz

Graphviz [25] is an open source suite of graph layout and visualization software. It was used in this work to visualize many automatically generated graph and tree structures, including tree-structured genetic programs, bond graphs, and tree-structured equation representations. The importance of an automated graph layout tool is keenly felt when handling large automatically generated graphs. If Graphviz or Springgraph [21] (another open source graph layout suite; it reads the Graphviz DOT file format but supports only one layout algorithm) were not available, an automated layout tool would need to have been written. Instead, Bondgraph.py is able to produce a graph definition in the DOT format used by Graphviz. For example, listing 4.4 is the DOT representation produced by Bondgraph.py from the graph defined in listing 4.1. From this, Graphviz produces the hierarchical layout (ordered by direction of positive power flow) shown in figure 5.2.

Graphviz DOT files are also produced for genetic trees, nested algebraic expressions, and algebraic dependency graphs.

Listing 4.4: Graphviz DOT markup for the model defined in listing 4.1

digraph "single mass spring damper chain" {
    rankdir=LR;
    node [shape=plaintext];

    C1 [label="C spring"];
    R2 [label="R damper"];
    I3 [label="I mass"];
    11 [label="1"];
    Se4 [label="Se force"];

    edge [fontsize=10];
    11 -> C1 [arrowhead=half arrowtail=tee label="1"];
    11 -> R2 [arrowhead=half arrowtail=tee label="2"];
    11 -> I3 [arrowhead=teehalf label="3"];
    Se4 -> 11 [arrowhead=teehalf label="4"];
}

4.3 A bond graph modelling library

4.3.1 Basic data structures

Formally, a graph is a set of nodes and a set of edges between the nodes. Bondgraph.py implements a bond graph object that maintains a list of elements and another of the bonds between them. Each element has a list of bonds and counts of the maximum number of causal-in and causal-out bonds allowed. Each bond has a head and a tail (references to elements within the bond graph) and a causality (initially nil, but assigned equal to the head or the tail during sequential causality assignment). Activated bonds are a subclass of bonds. Activated bonds are never assigned a causality. Signal blocks are elements that only accept activated bonds at their ports.
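The structures just described can be sketched as follows. This is a simplified illustration, not the library's actual implementation; the class and method names echo the prose but the bodies are stand-ins:

```python
class Element(object):
    """A bond graph element: its attached bonds plus limits on the
    number of causal-in and causal-out bonds allowed."""
    def __init__(self, name, max_causal_in=1, max_causal_out=1):
        self.name = name
        self.bonds = []
        self.max_causal_in = max_causal_in
        self.max_causal_out = max_causal_out
        self.graph = None                  # set when added to a graph

class Bond(object):
    """A bond between two elements; positive power flows tail -> head.
    Causality starts out unassigned."""
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        self.causality = None

class Bondgraph(object):
    def __init__(self, title):
        self.title = title
        self.elements = []
        self.bonds = []

    def addNode(self, element):
        # Refuse elements already in a graph or carrying bonds.
        if element.graph is not None or element.bonds:
            raise ValueError('element already in use')
        element.graph = self
        self.elements.append(element)
        return element

    def bondNodes(self, tail, head):
        # Both ends must belong to this graph.
        if tail.graph is not self or head.graph is not self:
            raise ValueError('elements not part of this graph')
        bond = Bond(tail, head)
        tail.bonds.append(bond)
        head.bonds.append(bond)
        self.bonds.append(bond)
        return bond

g = Bondgraph('example')
a = g.addNode(Element('spring'))
b = g.addNode(Element('point1'))
bond = g.bondNodes(b, a)
print(bond.causality is None, len(g.bonds))   # -> True 1
```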


4.3.2 Adding elements and bonds, traversing a graph

The graph class has an addNode and a removeNode method, and a bondNodes and a removeBond method. Nodes are created independently, then added to a graph. The addNode graph method checks first that the node is not already part of this or any other graph, and that it has no attached bonds. Bonds are created only within the context of a particular graph, by calling the bondNodes graph method. This method takes two elements (which are to become the tail and the head of the bond, respectively), checks that both elements are part of this graph and can accommodate an additional bond, then joins them and returns a reference to the new bond.

Each element maintains a list of attached bonds. Each bond has a reference to its head and tail elements. In this way it is possible to walk the graph, starting from any element or bond. For convenience, the bond graph object has methods to iterate over all elements, all bonds, or a filtered subset thereof.

4.3.3 Assigning causality

The sequential causality assignment procedure (SCAP), described in Section 2.3.2, is automated by the bond graph code. This procedure is invoked by calling the assignCausality method on a bond graph object. An unassignCausality method exists as well, which strips any assigned causality from all bonds in the graph.

The assignCausality method implementation begins by segregating the graph elements into four lists:

1. one-port elements with a required causality (source elements Se and Sf, switched storage element CC)

2. one-port elements with a preferred causality (storage elements C and I)

3. one-port elements with arbitrary causality (sink element R)

4. others (0, 1, GY, TF, signal blocks)

As a side effect of the way bond graph elements are stored and retrieved, elements appear in these lists in the order that they were added to the graph.


For each element in the source element list, the attached bond is assigned the required causality. Each time a bond is assigned causality, the consequences of that assignment, if any, are propagated through the element at the other end of the bond. Causality propagates when:

• a bond attached to a TF or GY element is assigned either causality,

• a bond attached to a 0 element is assigned effort-in (to the 0 junction) causality,

• the last-but-one bond attached to a 0 element is assigned effort-out (from the 0 junction) causality,

• a bond attached to a 1 element is assigned effort-out (from the 1 junction) causality,

• the last-but-one bond attached to a 1 element is assigned effort-in (to the 1 junction) causality.

The rules for causality propagation are implemented in the extendCausality method of each element object type, so the propagation of causality assignment consequences is a simple matter of looking up the element at the other end of the just-assigned bond and invoking its extendCausality method. That element object itself is then responsible for examining the causality assignment of all attached bonds, making any new assignments that may be implied, and invoking extendCausality on its neighbours after any new assignments.

If, as a consequence of the assignment of a required causality, another causality requirement is violated, then the extendCausality method on the second element will raise an exception and abort the original invocation of assignCausality. This could occur if, for example, two like source elements are bonded directly together.

After all required-causality one-port elements have been handled, the preferred causality list is examined. Any elements whose attached bond has been assigned a causality, preferred or otherwise, as a consequence of earlier required or preferred causality assignments are skipped over. Others are assigned their preferred causality, and the consequences are propagated.

Finally, the list of arbitrary causality elements is examined. Again, any whose attached bond has been assigned a causality are skipped over. The rest are assigned a fixed causality (effort-in to R elements), and consequences are propagated. If any non-activated bonds remain without a causality assignment, these are given an arbitrary assignment (effort in the direction of positive power transfer). Once all bonds have been assigned a causality, the list of preferred causality elements is re-examined, and any instances of differential causality are noted in a new list attached to the bond graph object.
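The phased structure of the procedure, required causalities first, then preferred, then arbitrary, can be sketched as follows. This is a much-simplified illustration for one-port elements only; the 'policy' attribute and element names are hypothetical, and the propagation of consequences through junctions, TF, and GY elements (the extendCausality machinery) is omitted:

```python
class OnePort(object):
    def __init__(self, name, policy, wanted=None):
        self.name = name        # e.g. 'force', 'spring', 'damper'
        self.policy = policy    # 'required', 'preferred', or 'arbitrary'
        self.wanted = wanted    # causality the element asks for, if any
        self.causality = None   # causality of its single bond

def assign_causality(elements):
    phases = [('required', None),
              ('preferred', None),
              ('arbitrary', 'effort-in')]   # R elements default to effort-in
    for policy, default in phases:
        for e in elements:
            if e.policy == policy and e.causality is None:
                e.causality = e.wanted if e.wanted else default
                # A full implementation would now propagate the
                # consequences of this assignment to neighbours.
    return dict((e.name, e.causality) for e in elements)

parts = [OnePort('force', 'required', 'effort-out'),
         OnePort('spring', 'preferred', 'effort-in'),
         OnePort('damper', 'arbitrary')]
print(assign_causality(parts))
```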

4.3.4 Extracting equations

All bond graph element objects have an equations method that, when invoked, returns a list of constitutive equations for that element. This method fails if the attached bonds have not yet been assigned a causality and a number. Bonds are assigned a causality either manually or automatically, as described in the previous section. Bonds are assigned a number, in the order they were added to the graph, by invoking the numberBonds method of the bond graph object. Bond numbers are used to name the effort and flow variables of each bond uniquely. Bond graph objects also have an equations method. It returns an equation set object containing all the constitutive equations of all the elements in the graph. It does this by first invoking assignCausality and numberBonds on the graph object itself, then invoking the equations method on each element and collecting the results. Bond graph element objects will, in general, return different constitutive equations depending on the causality assignment of the attached bonds. Elements with a required causality, such as Se, Sf, and CC, will raise an exception and abort the collection of equations if their equations method detects other than the required causality, since the constitutive equations of these elements are not defined in that case.
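The two-level collection scheme is easy to sketch. The element classes and the equation strings below are illustrative stand-ins (the real library returns symbolic equation objects, not strings), but the structure, each element contributing its own constitutive equations named by bond number, matches the description above:

```python
class Capacitor(object):
    def __init__(self, number):
        self.number = number          # the number of the attached bond
    def equations(self):
        n = self.number               # bond numbers name each effort/flow variable
        return ['f%d = C * d(e%d)/dt' % (n, n)]

class Resistor(object):
    def __init__(self, number):
        self.number = number
    def equations(self):
        n = self.number
        return ['e%d = R * f%d' % (n, n)]

def graph_equations(elements):
    # The graph-level method simply collects every element's contribution.
    equations = []
    for element in elements:
        equations.extend(element.equations())
    return equations

print(graph_equations([Capacitor(1), Resistor(2)]))
```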

Equations, the algebraic expressions which they contain, and the equation sets into which they are collected are all implemented by the symbolic algebra library described in Section 4.4.

4.3.5 Model reductions

There are certain structural manipulations of a bond graph model that do not change its behaviour in simulation. These may be applied to reduce the complexity of a model, just as the rules of algebra are applied to reduce complex expressions to simpler ones. Five such manipulations are automated by the bond graph library, and all bond graph objects built by the library have a reduce method that applies them in the order given below. It is advisable to invoke the reduce method before extracting equations for simulation, since it is simple (computationally inexpensive) to apply them. Although detecting where these reductions are applicable requires an examination of every junction in the graph and all the junction's neighbours, that is outweighed by the algebraic manipulations it obviates. These bond graph reductions produce a model with fewer constitutive equations and often, if there are loops, with fewer algebraic loops. The latter is a particularly large gain, since resolving algebraic loops (the implementation of the procedure for which is described in Section 4.4.3) involves extensive manipulations of sets of equations that may be very large. The implemented bond graph reductions are:

1. Remove junctions with less than 3 bonds.

2. Merge adjacent like junctions.

3. Merge resistors bonded to a common junction.

4. Merge capacitors bonded to a common effort junction.

5. Merge inertias bonded to a common flow junction.

These five model reduction steps are illustrated in figures 4.1 to 4.5. In the first manipulation, any 0 or 1 junction having only one bond is removed from the model, along with the attached bond. Any junction having only two bonds is likewise removed, along with both its bonds, and a single new bond put in their place. For this and the second manipulation, the equivalence of the model before and after follows directly from the constitutive equations of the 0 and 1 junction elements.

In the third manipulation, some number of R elements, all bonded to the same junction, are removed and a single new R element is put in their place. For elements with the linear constitutive equation given by equation 4.1, there are two cases, depending on the junction type. Handling of non-linear elements is not automated by the library. If the attached junction is a 1 element, the resistance parameter of the new R element is simply the sum of the parameters of the replaced elements. If the attached junction is a 0 element, the resistance parameter of the new R element is r_eq, where

r_eq = 1 / (1/r_1 + 1/r_2 + ... + 1/r_n)    (4.1)


These are the familiar rules for electrical resistances in series and parallel. The fourth and fifth manipulations handle the case shown in figure 2.12, and an even simpler case than is shown in figure 2.13, respectively. The same electrical and mechanical analogies apply. Capacitors in parallel can be replaced by a single capacitor having the sum of the replaced capacitances. Rigidly joined inertias are effectively one mass.
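The resistor-merge rule reduces to a few lines of arithmetic. The function below is an illustrative sketch, not the library's implementation: resistances at a common 1 junction (series) add directly, while resistances at a common 0 junction (parallel) combine as the reciprocal of summed reciprocals, per equation 4.1:

```python
def merged_resistance(junction_type, resistances):
    if junction_type == '1':
        # Series: the new parameter is simply the sum.
        return sum(resistances)
    elif junction_type == '0':
        # Parallel: reciprocal of the sum of reciprocals (equation 4.1).
        return 1.0 / sum(1.0 / r for r in resistances)
    raise ValueError('expected a 0 or 1 junction')

print(merged_resistance('1', [10.0, 20.0, 30.0]))  # -> 60.0
print(merged_resistance('0', [10.0, 10.0]))        # -> 5.0
```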

To illustrate the application of the model reduction rules described above, figure 4.6 shows a randomly generated bond graph model that contains junctions with less than 3 bonds, adjacent like junctions, and multiple resistors bonded to a common junction. Figures 4.7, 4.8, and 4.9 show the application of the first 3 steps in the automated model reduction method. The 4th and 5th are not applicable to this model. Listing 4.5 gives the collected constitutive equations of the original model (figure 4.6), and figure 4.10 shows the algebraic dependencies between them. Listing 4.6 gives the collected constitutive equations of the reduced model (figure 4.9), and figure 4.11 shows the algebraic dependencies between them.

Figure 4.1: Model reductions step 1: remove trivial junctions


Figure 4.2: Model reductions step 2: merge adjacent like junctions


Figure 4.3: Model reductions step 3: merge resistors bonded to a common junction


Figure 4.4: Model reductions step 4: merge capacitors bonded to a common effort junction

Figure 4.5: Model reductions step 5: merge inertias bonded to a common flow junction


Figure 4.6: A randomly generated model containing trivial junctions, adjacent like junctions, and multiple resistors bonded to a common junction


Figure 4.7: The model from figure 4.6 with trivial junctions removed


Figure 4.8: The model from figure 4.6 with trivial junctions removed and adjacent like junctions merged


Figure 4.9: The model from figure 4.6 with trivial junctions removed, adjacent like junctions merged, and resistors bonded to a common junction merged


Figure 4.10: Algebraic dependencies within equations extracted from the model of figure 4.6 and listing 4.5


Figure 4.11: Algebraic dependencies within equations extracted from the reduced model of figure 4.9 and listing 4.6


Listing 4.5: Equations from the model in figure 4.6

e1 = (e2+(-1.0*e15))
e2 = e10
e3 = e8
e4 = i0
e5 = (e4+(-1.0*e6))
e6 = (f6*32.0)
e7 = e5
e8 = e7
e9 = e3
e10 = e9
e11 = e3
e12 = e3
e13 = e3
e14 = e10
e15 = (f15*45.0)
f1 = (p1-78.0)
f2 = f1
f3 = (f9+f11+f12+f13)
f4 = f5
f5 = f7
f6 = f5
f7 = f8
f8 = f3
f9 = f10
f10 = (f2+f14)
f11 = (e11/45.0)
f12 = (p12/0.0429619641442)
f13 = (e13/7.13788689084)
f14 = (e14/93.0)
f15 = f1
p1_dot = e1
p12_dot = e12


Listing 4.6: Equations from the reduced model in figure 4.9

e1 = (e7+(-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (e2+(-1.0*e3))
e7 = e6
e8 = e6
f1 = (p1-78.0)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4+f7+f8)
f7 = f1
f8 = (e8/3.0197788376)
p1_dot = e1
p4_dot = e4


4.4 An object-oriented symbolic algebra library

An object-oriented symbolic algebra library was written as part of this work. It is used by the bond graph library to produce equations in a modular way, depending on causality assignment, and by the symbolic regression code for similar reasons. It is capable of several simple and widely applicable algebraic manipulations, as well as some more specialized recipes, such as reducing equations produced by the bond graph library to state space form, even if they contain algebraic loops.

4.4.1 Basic data structures

The library implements three basic objects and specializations thereof: an algebraic expression, an equation, and an equation set.

Expression objects can be evaluated, given a table in which to look for the value of any variables they contain. Expression objects have methods that list any variables they contain, test if the expression is constant (contains no variables), list any nested sub-expressions (each variable, constant, and operator is an expression in itself), replace any one sub-expression with another, and perform other, more specialized algebraic manipulations. Base expression objects are never used themselves; leaf nodes from the object inheritance graph shown in figure 4.12 are used. That figure indicates that Operator is a type of Expression and Div (division) is a type of Operator. Operator has all the attributes of Expression, but may modify or add some, and so on for CommutativeOperator and all of the practical objects (leaf nodes in the graph). Each type of practical object accepts some number of sub-expressions, which may be of the same or any other type. Constant and Variable objects accept no sub-expressions. Cos and Sin operators accept one sub-expression each. Div, Greater, and Lesser accept two ordered sub-expressions. The commutative operators Add and Mul accept any number of sub-expressions, the order of which is immaterial.
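A skeletal version of this hierarchy illustrates the design. The class names follow figure 4.12, but the method bodies here are illustrative stand-ins, not the library's implementation:

```python
class Expression(object):
    def variables(self):
        return set()

class Constant(Expression):
    def __init__(self, value):
        self.value = value
    def evaluate(self, table):
        return self.value

class Variable(Expression):
    def __init__(self, name):
        self.name = name
    def variables(self):
        return set([self.name])
    def evaluate(self, table):
        return table[self.name]    # look the value up in the supplied table

class Operator(Expression):
    def __init__(self, *operands):
        self.operands = list(operands)
    def variables(self):
        vs = set()
        for op in self.operands:
            vs |= op.variables()
        return vs

class Add(Operator):               # commutative: any number of operands
    def evaluate(self, table):
        return sum(op.evaluate(table) for op in self.operands)

class Mul(Operator):               # commutative: any number of operands
    def evaluate(self, table):
        result = 1
        for op in self.operands:
            result *= op.evaluate(table)
        return result

# The expression 2*x + 3, evaluated with x = 5:
e = Add(Mul(Constant(2), Variable('x')), Constant(3))
print(e.evaluate({'x': 5}))    # -> 13
print(sorted(e.variables()))   # -> ['x']
```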

Equation objects are composed of two expression objects (a left-hand side and a right), and methods to list all variables, expressions, and sub-expressions in the equation, replace any expression appearing in the equation with another, as well as several more elaborate algebraic tests and manipulations described in the sections that follow.


Figure 4.12: Algebraic object inheritance diagram

Equation sets are collections of equations that contain no duplicates. Equation set objects support similar methods to equations and expressions (list all variables, replace one expression with another across all equations in the set, etc.) as well as a number of methods that are specific to collections of equations. Those include recursive backsubstitution of the equations from the set into any other equation (provided the original set is in a certain "declarative" form), detection of this form, detection of algebraic loops, and resolution of algebraic loops. Easily the most complex manipulation that is automated by the library, algebraic loops are discussed in section 4.4.3.

The Calculus.py module extends the basic algebra library (Algebra.py) with a Diff operator and a type of equation set called StateSpace. A StateSpace object is constructed from an ordinary equation set by a process described in Section 2.5.1. Diff takes a single sub-expression and is a placeholder for first order differentiation of that expression with respect to time (a variable called "t" is always assumed to exist in any state space equation set).


4.4.2 Basic reductions and manipulations

Expression objects support the following basic manipulations:

• Substitutions (search and replace)

• Get list of variables

• Aggregate constants

• Collapse constant expressions

• Convert constant divisors to multiplication

• Aggregate commutative operators

• Remove single operand commutative operators

• Expand distributive operators

Substitutions and getting a list of variables are self-explanatory. "Aggregate constants" replaces multiple Constant objects that appear as direct sub-expressions of a commutative operator with one equivalent Constant object. "Collapse constant expressions" detects expressions that contain no variables, evaluates them on the spot, and replaces the entire expression with one Constant object.

"Convert constant divisors to multiplication" detects Div objects that have a Constant object as their second sub-expression (the divisor) and replaces the Div with a Mul, and the Constant with a new Constant whose value is the reciprocal of the original. "Aggregate commutative operators" detects occurrences of a commutative operator within the direct sub-expressions of a like operator. It joins them into one like operator having all the sub-expressions of both.

CommutativeOperator objects accept any number of sub-expressions (the operands). "Remove single operand commutative operators" detects occurrences of a commutative operator with just one operand and replaces them with their one sub-expression. CommutativeOperator objects with only one sub-expression are null, or "do nothing", operations: a single operand Add is assumed to add zero to its operand, and a single operand Mul is assumed to multiply by one. These may arise after aggregating constants, if the operator had only constant sub-expressions. Removing them allows other manipulations to proceed.

"Expand distributive operators" detects a Mul object with at least one sub-expression that is an Add object. It selects the first such sub-expression it finds and replaces the Mul object with a new Add object. The operands to this new Add object are Mul objects, each of which has as operands one of the operands from the original Add object, as well as all of the operands of the original Mul object except for the original Add object.
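Written out on a simple tuple-based expression encoding (a hypothetical stand-in for the library's Expression objects), the rule is: Mul(a, Add(b, c), d) becomes Add(Mul(b, a, d), Mul(c, a, d)):

```python
def expand_distributive(op, operands):
    """Apply one step of the distributive expansion described above."""
    if op != 'Mul':
        return (op, operands)
    for i, sub in enumerate(operands):
        if isinstance(sub, tuple) and sub[0] == 'Add':
            others = operands[:i] + operands[i + 1:]
            # One new Mul per operand of the found Add, each keeping
            # every other operand of the original Mul.
            return ('Add', [('Mul', [term] + others) for term in sub[1]])
    return (op, operands)

expr = expand_distributive('Mul', ['a', ('Add', ['b', 'c']), 'd'])
print(expr)
# -> ('Add', [('Mul', ['b', 'a', 'd']), ('Mul', ['c', 'a', 'd'])])
```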

Expression objects also support the following more specialized manipulations, which make use of the above basic manipulations:

• Drive expression towards polynomial (iterative)

• Collect like terms (from a polynomial)

"Drive expression towards polynomial" is a sequence of basic manipulations that, when applied repeatedly, is a heuristic for putting expressions into polynomial form. Polynomial form is defined here to mean an Add object, the operands of which are all either Constant, Variable, or Mul objects. If they are Mul objects, then they have only two operands themselves, one of which is a Constant and the other of which is a Variable object. Not all expressions can be manipulated into this form, but the isPolynomial method on Expression objects detects when it has been achieved. "Drive expression towards polynomial" is primarily used while resolving algebraic loops in equation sets extracted from linear bond graph models, in which case the preconditions exist to make a polynomial form always achievable. The towardsPolynomial method on Expression objects applies the following sequence of operations each time it is invoked:

1. Aggregate constants

2. Convert constant divisors

3. Aggregate commutative operators

4. Expand distributive operators

Equation objects support the following manipulations, which make use of the above Expression object manipulations:

• Reduce equation to polynomial form

• Collect like terms (from a polynomial form equation)

• Solve polynomial form equation for any variable

ldquoReduce equation to polynomial formrdquo invokes towardsPolynomial on theleft and right-hand side expressions of the equation repeatedly until isPolynomialgives a true result for each ldquoCollect like termsrdquo examines the variables inthe operands of the Add objects on either side of a polynomial form equa-tion and collects them such that there is only one operand between thoseAdd objects that contains any one variable The result may not be in thepolynomial form defined above but can be driven there again by invoking theequationrsquos towardsPolynomial method Equation objects have a solveFor

method that given a variable appearing in the equation attempts to putthe equation in polynomial form then collects like terms isolates the termcontaining that variable on one side of the equation and divides through byany constant factor Of course this works only for equations that can be putin the polynomial form defined above The solveFor method attempts toput the equation in polynomial form by invoking the towardsPolynomial

repeatedly stopping either when the isPolynomial method gives a true re-sult or after a number of iterations proportional to the size of the equation(number of Expression objects it contains) If the iteration limit is reachedthen it is presumed not possible to put the equation in polynomial form andthe solveFor method fails

EquationSet objects support the following manipulations, which make use of the above Equation and Expression object manipulations:

• Detect and resolve algebraic loop(s)

• Recursive backsubstitution from declarative form equations

• Reduce equation set to state space form

These are described in detail in the next two sections.

4.4.3 Algebraic loop detection and resolution

Equation sets extracted from bond graph models sometimes contain algebraic loops, in which the dependencies of a variable include itself. Here, the dependencies of a variable are defined to be the variables on the right-hand side of its declarative form equation, plus the dependencies of all of those. Recall that equations extracted from a bond graph are all in declarative form. Algebraic loops, if not identified and resolved, will cause infinite regression in the process of reducing equations to state space form by back-substitution of variables declared by algebraic equations into the differential equations, as described in Section 2.5.1. Loops in the dependency graph which pass through a differential operator are not problematic to this process. These differential-algebraic loops encode the linkage among state variables and need not be resolved beforehand.

Algebraic loops do not represent any general physical attributes of the model, as do the causal linkages described in Section 2.3.1 (figures 2.12 and 2.13). They are simply computational inconveniences. Fortunately, for linear systems (bond graph models containing only the standard elements described in Section 2.2), it is always possible to resolve (break) algebraic loops, and the process has been automated as part of this work. Less fortunately, the process is not easily generalized to non-linear systems, which may contain complex trigonometric expressions and non-invertible functions.

The process, in brief, is to identify algebraic loops with a depth-first search of the dependency graphs. The search is started once from each node (variable) in the graph that appears as the left-hand side of an algebraic declarative equation. Once identified, a purely algebraic loop is resolved by:

1. Traversing the loop and collapsing the declarative equations for all loop variables into one equation, by back-substitution into the declarative equation for the first loop variable.

2. Expanding distributive operators, aggregating constants and commutative operators.

3. Collecting like terms.

4. Isolating the first loop variable.

The result of the first step is a new declarative equation for the first loop variable which contains only the variable itself and non-loop variables on the right-hand side. It is always possible, by applying the operations listed in the second step, to reduce the right-hand expression of the collapsed loop equation to a sum of products, where each product is between a variable and a constant. This is possible because the algebraic declarative equations extracted from bond graph models have one of only two forms:

A = B · C


and

D = E1 + E2 + · · · + En

where C is a constant and any of E1 . . . En may be prefixed by a negative sign. The third and fourth steps bring the product containing the first loop variable out of the sum and over to the other side of the equation, then factor out the variable and divide through by the constant part.

The result of this process is a new declarative equation for one loop variable in terms only of variables from outside the loop. This equation is used to replace the original declarative equation in the set, breaking the loop. For sets of equations that contain multiple (and possibly overlapping) algebraic loops, the process can be applied repeatedly to break one loop at a time, substituting the result back into the set before searching again for loops. It is important that this is done sequentially, since attempting to solve all loops at once, before substituting the collapsed and refactored equations back into the set, often results in an equal number of different loops in the now changed equation set.
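The depth-first loop search can be sketched as follows. This is an illustration only, not the thesis code: `deps` maps each variable to the variables on the right-hand side of its algebraic declarative equation, with edges that pass through a differential operator assumed to have been excluded already.

```python
def find_algebraic_loop(deps, start):
    """Depth-first search for a cycle through `start` in a dependency
    graph.  Returns the loop as a list of variables, or None if
    `start` is not part of an algebraic loop."""
    stack = [(start, [start])]
    while stack:
        var, path = stack.pop()
        for dep in deps.get(var, ()):
            if dep == start:
                return path          # came back around: a loop
            if dep not in path:      # do not re-walk visited nodes
                stack.append((dep, path + [dep]))
    return None

# Dependencies of the loop variables from the reduced model:
deps = {
    'e6': ['e2', 'e3'], 'e3': ['f3'], 'f3': ['f6'],
    'f6': ['f4', 'f7', 'f8'], 'f8': ['e8'], 'e8': ['e6'],
}
loop = find_algebraic_loop(deps, 'e6')
```

Starting the search once from each algebraic left-hand side, as described above, finds every loop in the set.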

Listing 4.7 outlines the process for the loop in the reduced model shown in figure 4.9. The resulting set of equations is given in listing 4.8. The dependency graph for those equations is given in figure 4.13 and shows no remaining algebraic loops. The entire process is automated.

Listing 4.7: Resolving a loop in the reduced model

loop: ['e6', 'e8', 'f8', 'f6', 'f3', 'e3']
e8 = e6
f8 = (e8/30.197788376)
e3 = (f3*32.0)
f3 = f6
f6 = (f4+f7+f8)
e6 = (e2+(-1.0*e3))

collapsing:
e6 = (e2+(-1.0*((f4+f7+(e6/30.197788376))*32.0)))

expanding and aggregating:
iterations: 3 True True
e6 = (e2+(f4*-32.0)+(f7*-32.0)+(e6*-1.05968025213))

collecting:
(e6+(-1.0*(e6*-1.05968025213))) = (e2+(f4*-32.0)+(f7*-32.0))

expanding and aggregating:
iterations: 2 True True
(e6+(e6*1.05968025213)) = (e2+(f4*-32.0)+(f7*-32.0))

isolating:
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
done
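As a numeric sanity check (not part of the thesis code), the isolated equation from the listing can be substituted back through the original loop equations to confirm they close. The constants are those printed in listing 4.7; the values chosen for the non-loop variables e2, f4 and f7 are arbitrary test inputs.

```python
# Arbitrary test values for the non-loop variables:
e2, f4, f7 = 5.0, 0.1, -0.2

# Isolated form produced by the loop-breaking procedure:
e6 = (e2 * 1.0 + f4 * -32.0 + f7 * -32.0) / 2.05968025213

# Walk the original loop equations once and confirm they close:
e8 = e6
f8 = e8 / 30.197788376
f6 = f4 + f7 + f8
f3 = f6
e3 = f3 * 32.0
assert abs(e6 - (e2 + (-1.0 * e3))) < 1e-6
```

The divisor 2.05968025213 is 1 + 32.0/30.197788376, i.e. one plus the coefficient gathered on e6 during the expand-and-aggregate step.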

Listing 4.8: Equations from the reduced model with loops resolved

e1 = (e7+(-1.0*e5))
e2 = i0
e3 = (f3*32.0)
e4 = e6
e5 = (f5*45.0)
e6 = (((e2*1.0)+(f4*-32.0)+(f7*-32.0))/2.05968025213)
e7 = e6
e8 = e6
f1 = (p1-7.80)
f2 = f6
f3 = f6
f4 = (p4/0.0429619641442)
f5 = f1
f6 = (f4+f7+f8)
f7 = f1
f8 = (e8/30.197788376)
p1_dot = e1
p4_dot = e4

4.4.4 Reducing equations to state-space form

Equations generated by the bond graph modelling library described in Section 4.3 always appear in "declarative form", meaning that they have on the left-hand side either a single bare variable or the first time derivative of a variable. Moreover, no variable ever appears on the left-hand side of more


Figure 4.13: Algebraic dependencies within the reduced model with loops resolved


than one equation. These two properties are crucial to the procedure for reducing sets of equations to state space form. If either condition does not hold (for example, because the bond graph model has derivative causality for some elements) then the procedure is aborted. After checking that all equations are in declarative form, checking for algebraic loops, and resolving any that appear, the procedure has three steps:

1. Identify state, input and other variables.

2. Back-substitute until only state and input variables appear on right-hand sides.

3. Split the state and readout equation sets.

State variables are those that appear within a differential. These are the energy variables from each storage element in the bond graph model. The values of these variables at any point in time encode the state of the system, that is, the precise distribution of energy within it. Since these variables together entirely describe the state of the system, a vector formed by them positions the system within a state-space. The state-space form being sought is one in which a numerical integrator can be used to solve for the value of this state vector over time, that is, for the system's trajectory through state space.

Input variables are those that do not appear on the left-hand side of any declarative form equation. Since there is no explicit definition for them in the equation set, and there will not be one in the reduced set either, these variables represent inputs to the system. Other variables are those that appear on the left-hand side of a purely algebraic (non-differential) equation. These are quantities of the system that can later be solved for in terms of the state variables.

In the second step the right-hand sides of non-differential equations are substituted into the right-hand sides of all equations wherever the left-hand side variable of those non-differential equations appears. This continues recursively until all equations have only input and state variables on their right-hand sides. If the equation set contains algebraic loops then this process will never terminate, so it is very important to detect and resolve any algebraic loops before attempting reduction to state space form.

The result of this recursive back-substitution is a set of declarative form equations, some differential and some purely algebraic, all having only state and input variables on the right-hand side. The third step is to segregate the differential equations from the purely algebraic ones. The differential equations are in declarative form for the state vector and will be passed to a numerical integration routine to find the system trajectory in state space, given a sequence of input variable values over time. The remaining purely algebraic equations can be used to solve for any of their left-hand side variables, given the system trajectory and the input sequences.
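The recursive back-substitution step can be sketched as follows. This is an illustration only: equations are represented here as token lists rather than the Expression objects used by the thesis library, signs are folded into "-" tokens, and element parameters appear as symbolic tokens. Algebraic loops are assumed to have been resolved already.

```python
def reduce_to_state_space(equations):
    """Recursively substitute algebraic definitions until every
    right-hand side mentions only state and input variables.
    `equations` maps each left-hand side to a right-hand side given
    as a list of tokens; differential equations use an '_dot' suffix
    on the left-hand side.  Returns (state_eqs, readout_eqs)."""
    algebraic = {lhs: rhs for lhs, rhs in equations.items()
                 if not lhs.endswith('_dot')}

    def substitute(rhs):
        changed = True
        while changed:           # terminates because loops are broken
            changed, out = False, []
            for tok in rhs:
                if tok in algebraic:
                    out.extend(['('] + list(algebraic[tok]) + [')'])
                    changed = True
                else:
                    out.append(tok)
            rhs = out
        return rhs

    state_eqs = {lhs: substitute(rhs) for lhs, rhs in equations.items()
                 if lhs.endswith('_dot')}
    readout_eqs = {lhs: substitute(rhs)
                   for lhs, rhs in algebraic.items()}
    return state_eqs, readout_eqs

# A tiny equation set in the same shape as those extracted from a
# bond graph model (E is an input; C, R, I are parameter tokens):
eqs = {
    'p_dot': ['e'],
    'q_dot': ['f'],
    'e': ['E', '-', 'x', '-', 'v'],
    'x': ['q', '/', 'C'],
    'v': ['f', '*', 'R'],
    'f': ['p', '/', 'I'],
}
state, readout = reduce_to_state_space(eqs)
```

After reduction, both state equations mention only the states p and q, the input E and the parameter tokens, and the readout equations can be evaluated from a computed trajectory.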

4.4.5 Simulation

Once a state space equation set has been constructed it can be solved over any range of time for which a sequence of input values is available. (Autonomous systems with no inputs can of course be solved for arbitrary time ranges.) The differential equations from a state space equation set are solved by numerical integration using the SciPy [37] provided interface to the LSODA [32] integrator. The integration routine solves for the system trajectory in state space by integrating both sides of the state variable declarative form differential equations. This is the fundamental reason why integral causality is preferred to differential causality in bond graph models: numerical differentiation with a centred difference implies knowledge of the future, whereas numerical integration is a simple summing up over the past.

Inputs, if any, are read from a file, then interpolated on the fly to allow the integration routine to evaluate the input at any point within the available range of simulation time. Zeroth-order (constant) and first-order (linear) interpolators are implemented. The zeroth-order hold function was used for all simulation results presented in chapter 5.
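A minimal sketch of such an interpolator (hypothetical; the thesis implementation is not shown) could look like this:

```python
import bisect

def make_interpolator(times, values, order=0):
    """Return a function u(t) interpolating sampled input data.
    order=0 gives a zeroth-order (constant) hold, order=1 gives
    linear interpolation.  `times` must be sorted ascending."""
    def u(t):
        # Index of the last sample at or before time t.
        i = bisect.bisect_right(times, t) - 1
        i = max(0, min(i, len(times) - 1))
        if order == 0 or i == len(times) - 1:
            return values[i]
        # Linear interpolation between samples i and i+1.
        frac = (t - times[i]) / (times[i + 1] - times[i])
        return values[i] + frac * (values[i + 1] - values[i])
    return u

ts, vs = [0.0, 1.0, 2.0], [0.0, 10.0, 0.0]
zoh = make_interpolator(ts, vs, order=0)
lin = make_interpolator(ts, vs, order=1)
```

Either interpolant lets the integrator evaluate the input at the arbitrary intermediate times it chooses, independent of the sampling times in the data file.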

Once the system trajectory is in hand, it and the interpolated inputs are used to solve the remaining algebraic equations across the same range of simulated time for any interesting non-state variables. This is a straightforward application of the algebraic expression solving capabilities described in previous sections.

4.5 A typed genetic programming system

A strongly typed genetic programming system requires extra attention to the design of the crossover and mutation operators. These operators must ensure that the type system is adhered to, so that the result of any operation is still a valid program in the language.


The design described here isolates the crossover and mutation operators from any details of the problem being solved by the genetic algorithm. To solve a different class of problem (e.g. graph-based modelling instead of symbolic regression) a new set of program node types is implemented; that is, the grammar for a different genetic programming language is defined. Each new node type derives from a base class implementing the crossover and mutation mechanisms in a generic way. To interoperate with those mechanisms, each node type must have the following attributes:

type: a unique type identifier

min children: an integer, the minimum number of children it will accept

max children: an integer, the maximum number of children it will accept

child types: a list of types, at least max children long

Each node must also provide a method (function) that may be invoked to compose a partial solution to the problem from the results of invoking the same method on its children (the nodes below it in the program tree). The "composed partial solution" returned by the root node of a genetic program is taken to be a solution to the whole problem (e.g. a set of differential-algebraic equations or a complete bond graph).

Optionally, a kernel of a solution may be passed in to the root node when it is invoked. The kernel is a prototype solution to the problem incorporating whatever a priori knowledge the experimenter has. It might be a set of equations that form a low order approximation of the system behaviour, or an idealized graph model, or a graph model that is highly detailed in parts that have been identified by other methods and sketched in very simply in other parts. The invoked method of each node may pass all or part of the kernel on to its children when calling them, use all or part of the kernel when composing its result, or even ignore the kernel entirely (removing part of the original kernel from the ultimate result). These details are specific to the design of the node types within a particular genetic programming language grammar, and the overall genetic programming framework is not affected by these choices. For example, the genetic programming type system for symbolic regression presented in section 4.5.3 does not use a kernel, while the system for bond graphs presented in section 4.5.4 does use a kernel in the form of a bond graph model.
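The node interface described above can be sketched as a small class hierarchy. The names and details here are hypothetical (the thesis code is not shown); the sketch only illustrates the four required attributes and the recursive compose method, with an optional kernel argument threaded through.

```python
import random

class Node:
    """Base class for typed genetic-program nodes.  Subclasses set
    the four class attributes from the text and implement compose(),
    which builds a partial solution from its children's results."""
    type = None          # unique type identifier
    min_children = 0     # fewest children accepted
    max_children = 0     # most children accepted
    child_types = ()     # one required type per child slot

    def __init__(self, children=()):
        assert self.min_children <= len(children) <= self.max_children
        for child, required in zip(children, self.child_types):
            assert child.type == required
        self.children = list(children)

    def compose(self, kernel=None):
        raise NotImplementedError

class Constant(Node):
    """Terminal node: a random constant value."""
    type = 'Expression'
    def __init__(self):
        super().__init__()
        self.value = random.uniform(-1.0, 1.0)
    def compose(self, kernel=None):
        return self.value

class Add(Node):
    """Function node: sums the partial results of its two children."""
    type = 'Expression'
    min_children = max_children = 2
    child_types = ('Expression', 'Expression')
    def compose(self, kernel=None):
        a, b = (c.compose(kernel) for c in self.children)
        return a + b
```

Calling compose on the root of a tree built from such nodes yields the whole solution; here it is just a number, but in the bond graph grammar it would be a complete model built from the kernel.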


4.5.1 Strongly typed mutation

The implemented mutation operator for strongly typed tree structured representations proceeds in three steps:

1. Choose a node at random from the parent.

2. Generate a new random tree whose root has the same type as the chosen node.

3. Copy the parent, but replace the subtree rooted at the chosen node with the new random tree.

In the first step a choice is made uniformly among all nodes; it is not weighted by the position of the node within the tree or by the type of the node. In the second step it is always possible to generate a tree rooted with a node of the same type as the chosen node, since all nodes in the parent (including the chosen node) come from the same function and terminal set of which the replacement tree will be built. In the third step the parent tree is copied, with replacement of the chosen subtree.

An attempt is also made to create the new tree at approximately the same size (number of nodes) and depth as the subtree to be replaced. The tree is constructed breadth first. Each time a new node is added, the required type is first checked and a list of functions and terminals conforming to that type is assembled. A function or terminal is chosen from the set at random, with functions preferred over terminals by a configurable ratio. If the tree exceeds either the size or the depth of the tree to be replaced then terminals are used wherever possible (i.e. a function will only be used if no terminal conforms to the required type). For the type systems (genetic programming language grammars) described in sections 4.5.3 and 4.5.4 this most often results in not more than one more layer of leaf nodes being added to the tree. This works out so well because the tree is constructed breadth first, and in these type systems there are very few function types that accept only other function types as children; those few are closely tied to the one type used as the root of a full genetic program, so their children all tend to be filled in before the tree hits a size or depth limit.
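The breadth-first growth procedure can be sketched as follows. The node classes and parameter names are hypothetical stand-ins for those of the thesis system; the sketch shows the typed choice between functions and terminals and the switch to terminals once a size or depth limit is hit.

```python
import random
from collections import deque

class N:
    """Minimal typed node: a `type`, required `child_types`, and a
    `children` list filled in during growth."""
    child_types = ()
    def __init__(self):
        self.children = []

class Const(N):
    type = 'Expr'                      # terminal: no child slots

class Add(N):
    type = 'Expr'
    child_types = ('Expr', 'Expr')     # function: two typed slots

def grow_tree(node_classes, root_type, max_size=20, max_depth=6,
              function_bias=0.8, rng=random):
    """Breadth-first growth of a random typed tree.  Functions are
    preferred over terminals by `function_bias` until the size or
    depth limit is reached, after which terminals are used wherever
    one conforms to the required type."""
    def pick(required, terminals_only):
        funcs = [c for c in node_classes
                 if c.type == required and c.child_types]
        terms = [c for c in node_classes
                 if c.type == required and not c.child_types]
        if terminals_only and terms:
            return rng.choice(terms)
        if funcs and (not terms or rng.random() < function_bias):
            return rng.choice(funcs)
        return rng.choice(terms or funcs)

    root = pick(root_type, False)()
    size, queue = 1, deque([(root, 1)])
    while queue:
        node, depth = queue.popleft()
        at_limit = size >= max_size or depth >= max_depth
        for required in node.child_types:
            child = pick(required, at_limit)()
            node.children.append(child)
            size += 1
            queue.append((child, depth + 1))
    return root

tree = grow_tree([Const, Add], 'Expr', max_size=15, max_depth=4)
```

Because the queue is processed breadth first, once the limit is reached every remaining open slot is filled with a conforming terminal, which is how the growth stays close to the requested size and depth.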

The tree creation procedure described here is also used to create random individuals at the start of a genetic programming run. In that case the size and depth limits imposed on the initial population are arbitrary and left to the experimenter to choose for each run.

4.5.2 Strongly typed crossover

The implemented crossover operator for strongly typed tree structured representations proceeds in six steps:

1. Find common node types by intersecting the node type sets of the parents.

2. List all nodes from the first parent conforming to one of the common types.

3. Choose one at random from the list.

4. List all nodes from the second parent conforming to the same type as the chosen node.

5. Choose one at random from the list.

6. Copy the first parent, but replace the subtree rooted at the first chosen node with a copy of that rooted at the second.

If the set of common types is empty, crossover cannot proceed and the genetic algorithm selects a new pair of parents.
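The six steps can be sketched as follows. The `Node` class here is a hypothetical stand-in; the sketch records a path to the chosen site so the replacement can be made in a deep copy of the first parent without mutating either original.

```python
import copy
import random

class Node:
    """Minimal typed program node (hypothetical stand-in)."""
    def __init__(self, type, children=()):
        self.type = type
        self.children = list(children)

def typed_crossover(parent_a, parent_b, rng=random):
    """Strongly typed subtree crossover.  Returns a new tree, or
    None when the parents share no node types (the genetic algorithm
    then selects a new pair of parents)."""
    def all_nodes(root):
        out, stack = [], [root]
        while stack:
            n = stack.pop()
            out.append(n)
            stack.extend(n.children)
        return out

    common = ({n.type for n in all_nodes(parent_a)} &
              {n.type for n in all_nodes(parent_b)})
    if not common:
        return None
    site = rng.choice([n for n in all_nodes(parent_a)
                       if n.type in common])
    donor = rng.choice([n for n in all_nodes(parent_b)
                        if n.type == site.type])

    def path_to(root, target):
        if root is target:
            return []
        for i, c in enumerate(root.children):
            sub = path_to(c, target)
            if sub is not None:
                return [i] + sub
        return None

    child = copy.deepcopy(parent_a)
    path = path_to(parent_a, site)
    if not path:                       # crossover at the root
        return copy.deepcopy(donor)
    node = child
    for i in path[:-1]:
        node = node.children[i]
    node.children[path[-1]] = copy.deepcopy(donor)
    return child

a = Node('Expression', [Node('Expression'), Node('bond')])
b = Node('Expression', [Node('Expression'), Node('Expression')])
offspring = typed_crossover(a, b)
```

Because the site and donor are drawn only from the common types, the grafted subtree always conforms to the type of the slot it fills, so the offspring is a valid program.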

4.5.3 A grammar for symbolic regression on dynamic systems

This section describes some challenges encountered while designing a genetic programming grammar (type system) for symbolic regression targeted at modelling dynamic systems, and presents the finished design.

Symbolic regression refers to the use of genetic algorithms to evolve a symbolic expression that fits some given sampled data. In contrast to linear regression and other types of curve fitting, the structure of the expression is not specified beforehand. The grammar for this conventional form of symbolic regression consists of random constant values, a node representing the independent variable, and standard algebraic operators (with the possible replacement of division by an operator protected against division by zero).


This grammar is typeless, or "mono-typed". The result of symbolic regression is a tree structured algebraic expression: a scalar function of one variable. The genetic programming grammar for bond graphs given in section 4.5.4 uses this conventional form of symbolic regression, with the omission of any independent variable nodes, to evolve constant valued expressions for any needed parameters of graph elements.

Symbolic regression as a technique for identifying dynamic systems presents additional challenges. The system under test may be multi-input and multi-output and, being dynamic, requires that the regressed symbolic model have some representation of state. The chosen (and perhaps obvious) representation is a set of equations in state space form, such as those extracted and reduced from a bond graph model with all integral causality (see section 4.4.4).

Recall that a state space equation set consists of two subsets. The state equations are first order differential equations; the differential operands on the left-hand side of these equations make up the state vector. The readout equations are a set of declarative form, purely algebraic equations that define a number of output variables in terms of the state and input variables. For a black-box system identification problem it is desired to specify the number of inputs and the number of outputs, but nothing further. The number of state variables (the length of the state vector) should not need to be pre-specified. It is simple enough to draw up a tree-structured representation of a state space equation set, but the desire to not pre-specify the number of states requires that the representation contain some facility by which the genetic algorithm can vary the number and interconnection of the state equations in the course of applying the usual genetic operators (crossover and mutation).

In addition, the operation that adds a new state equation should disrupt as little as possible any existing structure in a partially successful program. Simply making the new state available for later integration into the model is enough. Given a tree structured representation in which a "state space equation set" sits above a "state equation set" and a "readout equation set", each of which sit above a number of equations, there are several ways in which new state equations might be added to the "state equation set":

1. A new "blank" state equation could be added, where "blank" means, for example, that the right-hand side is constant.

2. A new random state equation could be added, where the right-hand side is a random expression in terms of input and state variables.


3. An existing state equation could be copied, leaving the copies untied and free to diverge in future generations.

These all leave open the question of how the new state variable is linked (eventually) into the existing set of state equations. Also, if a state equation is removed (because a crossover or mutation operator deletes that part of the program tree), there is the question of what to do with references to the deleted state variable within the other state equations and within the readout equations.

The implemented solution assigns to each state variable an index on some interval (a real number between 0 and 1, for example). References to state variables made from the right-hand side of state and readout equations do not link directly to any program object representing the state variable. Rather, they link to an index on the interval. The indices assigned to state variables and the indices assigned to references to state variables need not align exactly. When it comes time to evaluate the program, references to a state on the right-hand side of any equation are resolved to state variables defined on the left-hand side of state equations by starting from the reference index and searching in one direction, modulo the length of the interval, until the first state variable index is encountered.

When a new random state equation is introduced (e.g. through mutation) it is given a random index on the interval. When a new state equation is introduced by copying another (e.g. through crossover) it gets exactly the same index as the original. When the program is evaluated, immediately prior to resolving state variable references, duplicate state variable indices are resolved as follows: starting from one end of the interval and proceeding in the opposite direction to the variable resolution search, any duplicate index is replaced by a number halfway between its original index and the next state equation index (with wraparound at the interval boundaries). Overall this design has some desirable properties, including:
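The reference-resolution half of the scheme can be sketched as follows (an illustration with hypothetical names, not the thesis code). Each reference resolves to the state equation whose index is first encountered searching upward, modulo the interval length, from the reference's own index.

```python
def resolve_state_references(state_indices, reference_indices):
    """Resolve state-variable references by circular search on the
    unit interval.  `state_indices` maps each state equation to its
    index in [0, 1); `reference_indices` maps each reference to its
    index.  A reference resolves to the state whose index is nearest
    searching upward (modulo 1.0) from the reference index."""
    resolved = {}
    for ref, r in reference_indices.items():
        nearest = min(state_indices,
                      key=lambda s: (state_indices[s] - r) % 1.0)
        resolved[ref] = nearest
    return resolved

# Two state equations and three references whose indices do not
# align; reference 'c' wraps around the end of the interval.
states = {'x1': 0.10, 'x2': 0.60}
refs = {'a': 0.05, 'b': 0.30, 'c': 0.95}
resolved = resolve_state_references(states, refs)
```

Because resolution is by circular search rather than direct linkage, every reference resolves to some state no matter how many state equations the program currently has, which is what gives the scheme the properties listed below.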

• Random expressions can be generated that refer to any number of state variables, and all references will resolve to an actual state variable no matter how many states the program describes at this or any later point in the genetic programming run.

• Adding an identical copy of a state equation has no effect until the original or the copy is modified, so crossover that simply adds a state is not destructive to any advantages the program already has.


• When one of those equations is modified, the change has no effect on parts of the system that were not already dependent on the original state equation (but half, on average, of the references to the original state variable become references to the new state variable).

The second and third points are important, since other designs considered before arriving at this one, such as numbering the state equations and using very large numbers modulo the final number of equations to reference the state variables, would reorganize the entire system every time a new state equation was introduced. That is, every change to the number of states would be a major structural change across the entire system, even if the new state is not referenced anywhere in the readout or other state equations. This scheme allows both small incremental changes (such as the introduction of a new state equation by copying and later modifying, in some small way, an old one) and large changes (such as part or all of the state equation set being swapped with an entirely different set), both through the standard crossover operation.

The type system is illustrated in table 4.1. The first column gives a name to every possible node. The second column lists the type to which each node conforms; this determines which positions within a valid program that node may legally occupy. The third column lists the types of the new positions (if any) that the node opens up within a program. These are the types that nodes appearing below this one in a program tree must conform to.

Every program tree is rooted at a DynamicSystem node. The program is evaluated by calling on the root node (a DynamicSystem instance) to produce a result. It will return a state space equation set using the symbolic algebra library described earlier in this chapter (i.e. the same implementation used by the bond graph modelling library). The DynamicSystem node does this by first calling on its children to produce a set of state and readout equations, then resolving variable references using the index method described above, and finally combining all the equations into one state space equation set object. StateEquationSet and OutputEquationSet nodes produce their result by calling on their first child to produce an equation, and on their second child, if present, to produce a set of equations, then returning the aggregate of both. StateEquation and OutputEquation nodes call on their own children for the parts to build an equation compatible with the simulation code, and so on.


Node               Type               Child types
DynamicSystem      DynamicSystem      StateEquationSet, OutputEquationSet
StateEquationSet   StateEquationSet   StateEquation, StateEquationSet
StateEquation      StateEquation      StateVariableLHS, Expression
OutputEquationSet  OutputEquationSet  OutputEquation, OutputEquationSet
OutputEquation     OutputEquation     OutputVariable, Expression
StateVariableLHS   StateVariableLHS
StateVariableRHS   Expression
OutputVariable     OutputVariable
InputVariable      Expression
Constant           Expression
Add                Expression         Expression, Expression
Mul                Expression         Expression, Expression
Div                Expression         Expression, Expression
Cos                Expression         Expression
Sin                Expression         Expression
Greater            Expression         Expression, Expression
Lesser             Expression         Expression, Expression

Table 4.1: Grammar for symbolic regression on dynamic systems


4.5.4 A grammar for genetic programming bond graphs

This section presents a genetic programming grammar (type system) that can be used to generate bond graph models, including any constant parameters of the elements. Used for system identification, this performs combined structural and parametric identification. Genetic programs from this language are each rooted at a Kernel node. This node embeds an ordinary bond graph model, some elements of which have been marked as replaceable by the program. The grammar as given only allows bonds and constant parameters of elements to be replaced, but could be extended to allow other elements, or even connected subgraphs, to be replaced.

If a constant parameter of some element (for example the capacitance of a C element) is marked as replaceable in the kernel, that creates an opening for a new node conforming to the Expression type below the kernel in the program. The Expression types are all borrowed from the symbolic regression grammar presented in the previous section. Importantly, all of the variable nodes have been omitted, so the expressions generated here are always constant valued.

If a bond is marked as replaceable in the kernel, that creates an opening for a new node conforming to the bond type below the kernel in the program. Two nodes in table 4.2 conform to that type. When the program is evaluated, these nodes replace the given bond with two bonds joined by a 0- or a 1-junction. Consequently they create openings for two new nodes conforming to the bond type, one conforming to either the effort-in or the effort-out type, and an unlimited number of nodes conforming to the complementary type.

The effort-in and effort-out types apply to nodes that will insert some number of elements joined to the rest of the graph by one bond, having preferred causality such that the bond has causality effort-in or effort-out from the rest of the graph. The nodes included in table 4.2 each insert just one element plus the attaching bond. These considerations ensure that if sequential causality assignment (SCAP) applied to the kernel model would produce an integral causality model, then SCAP will produce an integral causality model from the output of any genetic program in the grammar as well. If an integral causality model is desired (because, for example, only a differential equation solver is available, not a differential-algebraic solver, and integral causality models produce state-space models) then this is a great efficiency advantage. The grammar will not produce models incompatible with the solver, constraining the genetic algorithm to not waste time generating models that will only be discarded.

The child types column for the Kernel node is left blank, since the list of openings below the kernel is specific to the kernel and to which elements have been marked replaceable. The child types column for the Constant node is left blank because it is a terminal node and presents no openings below itself in a program tree.

Node                        Type         Child types
Kernel                      kernel
InsertCommonEffortJunction  bond         bond, bond, effort-in, effort-out
InsertCommonFlowJunction    bond         bond, bond, effort-in, effort-out
AttachInertia               effort-out   Expression, bond
AttachCapacitor             effort-in    Expression, bond
AttachResistorEffortIn      effort-in    Expression, bond
AttachResistorEffortOut     effort-out   Expression, bond
Constant                    Expression
Add                         Expression   Expression, Expression
Mul                         Expression   Expression, Expression
Div                         Expression   Expression, Expression

Table 4.2: Grammar for genetic programming bond graphs


Chapter 5

Results and discussion

5.1 Bond graph modelling and simulation

Included below are two simple linear models, using only standard linear bond graph elements, and four non-linear models using modulated, complex and compound elements. The models are each named after a mechanical analog. Simulations are performed using step and noise inputs, or, for the autonomous models, starting from a non-equilibrium state.

5.1.1 Simple spring-mass-damper system

Figure 5.1 shows a simple idealized linear spring-mass-damper system in schematic form. The system consists of a mass m attached to a spring c and a dashpot r in parallel. An equivalent bond graph model for this system is shown in figure 5.2. The element parameters are:

C = 1.96

I = 0.002

R = 0.02

Equations 5.1 to 5.10 are the collected constitutive equations from each node in the system. The e, f, p and q variables measure effort, flow, momentum and displacement on the like numbered bonds. The applied effort from the source element is marked E.


ṗ3 = e3    (5.1)
q̇1 = f1    (5.2)
e1 = (q1/1.96)    (5.3)
e2 = (f2 * 0.02)    (5.4)
e3 = (e4 + (-1.0 * e1) + (-1.0 * e2))    (5.5)
e4 = E    (5.6)
f1 = f3    (5.7)
f2 = f3    (5.8)
f3 = (p3/0.002)    (5.9)
f4 = f3    (5.10)

Equations 5.11 and 5.12 are the differential state equations, and equations 5.13 to 5.20 are the readout equations for the remaining non-state variables. Both are produced by the automated procedure described in Section 4.4.4. There are two state variables, p3 and q1. These are the energy variables of the two storage elements (C and I). Note that the variable E does not appear on the left-hand side of any state or readout equation: it is an input to the system.

ṗ3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.11)
q̇1 = (p3/0.002)    (5.12)
e1 = (q1/1.96)    (5.13)
e2 = ((p3/0.002) * 0.02)    (5.14)
e3 = (E + (-1.0 * (q1/1.96)) + (-1.0 * ((p3/0.002) * 0.02)))    (5.15)
e4 = E    (5.16)
f1 = (p3/0.002)    (5.17)
f2 = (p3/0.002)    (5.18)
f3 = (p3/0.002)    (5.19)
f4 = (p3/0.002)    (5.20)


The state variable values are found by numerical integration of the state equations. The value of E at any point in simulated time is interpolated from a prepared data file as needed. The remaining non-state and non-input variables are evaluated across simulated time by substituting state variable values into the right-hand sides of the readout equations as needed.
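Equations 5.11 and 5.12 can be checked independently with a simple fixed-step fourth-order Runge-Kutta integrator (a sketch only; the thesis uses SciPy's LSODA). Setting the derivatives to zero gives the expected steady state under a constant unit input: p3 = 0 and q1 = 1.96 * E = 1.96.

```python
def f(state, E):
    """State equations 5.11 and 5.12 of the spring-mass-damper."""
    p3, q1 = state
    p3_dot = E - q1 / 1.96 - (p3 / 0.002) * 0.02
    q1_dot = p3 / 0.002
    return (p3_dot, q1_dot)

def rk4_step(state, E, dt):
    # Classical fourth-order Runge-Kutta, with the input E held
    # constant over the step (a zeroth-order hold).
    k1 = f(state, E)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)], E)
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)], E)
    k4 = f([s + dt * k for s, k in zip(state, k3)], E)
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Unit step in E, starting from rest:
state, dt = [0.0, 0.0], 0.001
for _ in range(5000):                 # simulate to t = 5
    state = rk4_step(state, 1.0, dt)
p3, q1 = state
```

By t = 5 the transient (which decays like exp(-5t) for these parameter values) is negligible, and q1 has settled at the positive offset seen in the step response.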

Figure 5.3 shows the model responding to a unit step in the applied force E. The displacement and flow variables of the 1–C bond are plotted. Since the junction is a common flow ("1") junction, all 4 bonds have the same displacement and flow value (equations 5.7, 5.8, 5.10). f1 and q1, shown in figure 5.3, are the speed and position of the spring, the mass and the damper. Predictably, the speed rises immediately as the force is applied. It crosses zero again at each of the position extrema. The position changes smoothly, with inflection points at the speed extrema. As the work done by the applied force is dissipated through the damper, the system reaches a new steady state with zero speed and a positive position offset, where the spring and the applied force reach an equilibrium.

Figure 5.4 shows the model responding to unit magnitude uniform random noise in the applied force E. This input signal and model response were used for the experiments described in Section 5.3. A broad spectrum signal such as uniform random noise exercises more of the dynamics of the system, which is important for identification.


Figure 5.1: Schematic diagram of a simple spring-mass-damper system

Figure 5.2: Bond graph model of the system from figure 5.1


Figure 5.3: Response of the spring-mass-damper model from figure 5.2 to a unit step input

Figure 5.4: Response of the spring-mass-damper model from figure 5.2 to a unit magnitude uniform random noise input


5.1.2 Multiple spring-mass-damper system

Figure 5.5 shows, in schematic form, a series of point masses connected by idealized linear spring–dashpot shock absorbers. An equivalent bond graph model for this system is given in figure 5.6. The element parameters are:

C1 = C2 = C3 = 1.96

I1 = I2 = I3 = 0.002

R1 = R2 = R3 = 0.02

There are 38 constitutive equations in the model. When reduced to state space form, the differential state equations are as given in equations 5.21 to 5.26.

ṗ2 = (E + (-1.0 * ((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96))))    (5.21)

ṗ8 = (((((p2/0.002) + (-1.0 * (p8/0.002))) * 0.02) + (q6/1.96)) + (-1.0 * ((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96))))    (5.22)

ṗ14 = (((((p8/0.002) + (-1.0 * (p14/0.002))) * 0.02) + (q12/1.96)) + (-1.0 * ((p14/0.002) * 0.02)) + (-1.0 * (q16/1.96)))    (5.23)

q̇6 = ((p2/0.002) + (-1.0 * (p8/0.002)))    (5.24)

q̇12 = ((p8/0.002) + (-1.0 * (p14/0.002)))    (5.25)

q̇16 = (p14/0.002)    (5.26)

Figure 5.7 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to a unit step in the applied force E. Variables q6, q12 and q16 are plotted, showing the displacement of the 3 springs from their unsprung lengths. When the step input is applied, q6 rises first, as it is most directly coupled to the input, followed by q12, then q16. The system is damped, but not critically, and so oscillates before settling at a new steady state. Since the capacitances are identical, all springs are displaced by an equal amount at steady state.

Figure 5.8 shows the multiple spring-mass-damper system modelled in figure 5.6 responding to unit magnitude uniform random noise in the applied force E. The spring (capacitor) displacements are shown again. Here the


system acts as a sort of mechanical filter. The input signal E is broad spectrum, but the output signal at subsequent stages (q6, q12, q16) is increasingly smooth.

Figure 5.5: Schematic diagram of a multiple spring-mass-damper system.

[Diagram: Se:E driving a chain of 1-junctions bearing I:m1, I:m2, I:m3, interleaved with 0-junctions bearing C:c1, R:r1, C:c2, R:r2, C:c3, R:r3; bonds numbered 1 through 16.]

Figure 5.6: Bond graph model of the system from figure 5.5.


[Plot: q6, q12, q16 and E against t from 0 to 5.]

Figure 5.7: Response of the multiple spring-mass-damper model from figure 5.6 to a unit step in the applied force E.

[Plot: q6, q12, q16 and E against t from 0 to 10.]

Figure 5.8: Response of the multiple spring-mass-damper model from figure 5.6 to unit magnitude uniform random noise in the applied force E.


5.1.3 Elastic collision with a horizontal surface

Figure 5.9 shows an elastic ball released from some height above a rigid horizontal surface. Figure 5.10 is a bond graph model of the system using the CC element defined in section 2.4.3. The included R element represents viscous drag on the ball from the environment.

Parametric details of the model are as follows. The inertia I (mass of the ball) is 0.002, the capacitance C (inverse spring constant of the ball) is 0.25, the capacitance switching threshold r (radius of the ball) is 1.0, the resistance R (drag coefficient) is 0.001, and the effort source is a constant -9.81 times the inertia (force of gravity on the ball).
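The switching constitutive law of the CC contact element, as it is used here, can be summarized in a few lines. This is a hedged sketch based on the description above (zero contact force while the ball centre is above the threshold r, a linear spring of capacitance C during compression), not the thesis library's actual implementation, and the sign convention is an assumption.

```python
def cc_effort(q, C=0.25, r=1.0):
    """Effort (force) output of the switched C element.

    q is the height of the ball's centre above the surface; contact
    occurs when q falls below the switching threshold r (the radius).
    """
    if q < r:
        return (r - q) / C  # upward restoring force during contact
    return 0.0
```

For example, with the parameters above, a centre height of 0.5 gives a compression of 0.5 and a contact force of 2.0, while any height above 1.0 gives zero force.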

Figure 5.11 shows the motion of the centre of mass (capacitor displacement) of the bouncing ball modelled in figure 5.10, plotted against time. The ball is released from a position above the surface, falls, and bounces repeatedly. Each bounce is shorter in time and displacement than the last as energy is dissipated through the sink element R. The low point of each bounce is not as low as the last, since the ball strikes the surface ever more gently and is not compressed as much.

Figure 5.9: An elastic ball bouncing off a rigid horizontal surface.


[Diagram: Se:(E), I:(I) and R:(R) joined at a 1-junction (bonds 1, 2, 3), bonded through bond 4 to the switched element CC:(C, r).]

Figure 5.10: Bond graph using a special switched C-element to model an elastic ball bouncing off a rigid horizontal surface.

[Plot: q4 against t from 0 to 10.]

Figure 5.11: Response of the bouncing ball model from figure 5.10, released above a rigid horizontal surface.


5.1.4 Suspended planar pendulum

Figure 2.19 shows a planar pendulum in schematic form. An equivalent bond graph model is shown in figure 5.12.

Note that the pin-joint is suspended by a parallel damper and spring in the horizontal and vertical directions. The model in figure 5.12 avoids differential causality by not rigidly coupling the joined end to the ground. The joint subgraph of the bond graph model represents a pin joint with some horizontal and vertical compliance. The spring constant of this compliance can be made arbitrarily high, but the resulting differential equations used in simulation will be accordingly stiff.

If not connected at the joint, the pendulum-body would enjoy 3 degrees of freedom: horizontal and vertical translation of the centre of mass, plus rotation about the centre. These are reflected in the pendulum-body subgraph of the bond graph model by 3 inertia elements, the flow variables associated with which represent the horizontal, vertical and angular momentum of the pendulum body. An ideal pin joint, perfectly rigid in the horizontal and vertical directions, would constrain the pendulum-body to have only one degree of freedom: the horizontal and vertical position (and speed) of the pendulum-body would be entirely dependent on its angular position (and speed). A literal model is easily drawn up in bond graph notation by bonding constant zero flow sources to the common flow junctions representing the horizontal and vertical flow of the fixed end of the pendulum-body. That model has 3 storage elements (horizontal, vertical and rotational inertias) but only 1 degree of freedom, so it will necessarily show derivative causality on 2 of the inertia elements. Exactly which two are dependent and which one is independent will be determined by the node ordering used in causality assignment.

Three common flow ("1") junctions track the horizontal, vertical and rotational speed of the pendulum body. Attached to these are activated bonds leading to integrating signal blocks ("INT"). Integrating the horizontal, vertical and rotational flow gives the corresponding position signals. The rotational position signal is fed into four other signal blocks (not shown), each of which performs a scaling and trigonometric function to produce the modulating signal for one of the four modulated transformer ("MTF") elements. Those unseen signal blocks perform the following operations, where θ is the output of signal block "INT1" (that is, rotational position, the integral of


rotational flow signal a3):

mxA = 1.0 * cos(θ)

myA = 1.0 * sin(θ)

mxB = -1.0 * cos(θ)

myB = -1.0 * sin(θ)

There are 65 other constitutive equations in the model. When reduced to state space form, the differential state equations are:

˙na1 = (p1/1.00)

˙na2 = (p2/1.00)

˙na3 = (p4/1.10)

˙p1 = (((-1.0 * (q21/0.01)) + (-1.0 * (((p1/1.00)
      + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p2 = (-9.81 + ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00)
      + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8))) + 0.0)

˙p4 = (((10.0 * cos(na3)) * ((-1.0 * (q21/0.01))
      + (-1.0 * (((p1/1.00) + ((10.0 * cos(na3)) * (p4/1.10))) * 0.8))))
      + ((10.0 * sin(na3)) * ((-1.0 * (q24/0.01)) + (-1.0 * (((p2/1.00)
      + ((10.0 * sin(na3)) * (p4/1.10))) * 0.8)))) + ((-10.0 * cos(na3)) * 0.0)
      + ((-10.0 * sin(na3)) * 0.0) + (-1.0 * ((p4/1.10) * 0.5)))

˙q21 = ((p1/1.00) + ((10.0 * cos(na3)) * (p4/1.10)))

˙q24 = ((p2/1.00) + ((10.0 * sin(na3)) * (p4/1.10)))

Figure 5.13 shows the planar pendulum modelled in figure 5.12 responding to an initial displacement. The vertical displacement of the centre of mass (int_a2) and the extension of the vertical suspending spring (q24) are plotted. The joint stiffness has been reduced to clearly illustrate the horizontal and vertical compliance in the pin joint. The upper extremum of the centre position cycle decreases over time as energy is dissipated from the system and the pendulum swings through lesser angles. The initial impact of the pendulum's downward momentum being arrested by the spring causes a low frequency vibration in the spring that can be seen in the variable lower extremum on the pendulum centre position and spring extension near the


start of the plot. The more persistent cycle shows the spring extending during the lower part of the pendulum swing and contracting to near zero at the high points.

[Diagram: bond graph with I:m and integrator blocks TINT1 (a1) and TINT2 (a2) on the translational 1-junctions; Se:mg, I:J, R:0.5 and INT1 (a3) on the rotational 1-junction; modulated transformers MTF:mxA, MTF:myA, MTF:mxB, MTF:myB linking through 0-junctions; Se:FxB and Se:FyB at the free end; and suspension 1-junctions with C:0.01 and R:0.8 in each of the horizontal and vertical directions. Bonds numbered 1 to 25.]

Figure 5.12: Bond graph model of the system from figure 2.19.


[Plot: int_a2 and q24 against t from 0 to 100.]

Figure 5.13: Response of the pendulum model of figure 5.12, released from an initial horizontal position.

5.1.5 Triple planar pendulum

Figure 5.17 shows a triple planar pendulum with damped-elastic joints, in schematic form on the left and as an equivalent bond graph model using compound elements on the right. There are 104 constitutive equations in the model. When reduced to state space form, there are 24 differential state equations, given below.

Figures 5.14 through 5.16 show the response of the triple planar pendulum modelled in figure 5.17 after being released from an initial position with all 3 links laid out horizontally. Figure 5.14 plots the vertical position of the centres of mass of the 3 pendulum bodies. Figure 5.15 plots the horizontal positions. Figure 5.16 plots the angular displacement of each link from horizontal.


˙Vphi_b0 = ((0.0 + (((xj0/0.0001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0
      * sin(phi_b0)))))/0.001)) * 10.0 * sin(phi_b0)) + ((-1.0 * ((yj0/0.0001)
      + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))
      /0.001))) * 10.0 * cos(phi_b0)) + (-1.0 * 0.0) + (((xj1/0.0001)
      + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0))) + (-1.0
      * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))/0.001))
      * 10.0 * sin(phi_b0)) + ((-1.0 * ((yj1/0.0001) + (((VyC_b0 + (10.0
      * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0
      * Vphi_b1 * cos(phi_b1)))))/0.001))) * 10.0 * cos(phi_b0)))/50.0)

˙Vphi_b1 = ((0.0 + (((xj1/0.0001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0
      * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1
      * sin(phi_b1)))))/0.001)) * 10.0 * sin(phi_b1)) + ((-1.0
      * ((yj1/0.0001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))
      /0.001))) * 10.0 * cos(phi_b1)) + (-1.0 * 0.0) + (((xj2/0.0001)
      + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1))) + (-1.0
      * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001)) * 10.0
      * sin(phi_b1)) + ((-1.0 * ((yj2/0.0001) + (((VyC_b1 + (10.0
      * Vphi_b1 * cos(phi_b1))) + (-1.0 * (VyC_b2 + (-1.0 * 10.0
      * Vphi_b2 * cos(phi_b2)))))/0.001))) * 10.0 * cos(phi_b1)))/50.0)

˙Vphi_b2 = ((0.0 + (((xj2/0.0001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1
      * sin(phi_b1))) + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2
      * sin(phi_b2)))))/0.001)) * 10.0 * sin(phi_b2)) + ((-1.0
      * ((yj2/0.0001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)))
      * 10.0 * cos(phi_b2)) + (-1.0 * 0.0) + (0.0 * 10.0 * sin(phi_b2))
      + (-1.0 * 0.0 * 10.0 * cos(phi_b2)))/50.0)

˙VxC_b0 = ((((xj0/0.0001) + ((0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0
      * sin(phi_b0)))))/0.001)) + (-1.0 * ((xj1/0.0001)
      + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
      + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))
      /0.001))))/10.0)

˙VxC_b1 = ((((xj1/0.0001) + (((VxC_b0 + (-1.0 * 10.0 * Vphi_b0
      * sin(phi_b0))) + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1
      * sin(phi_b1)))))/0.001)) + (-1.0 * ((xj2/0.0001)
      + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))
      /0.001))))/10.0)

˙VxC_b2 = ((((xj2/0.0001) + (((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))/0.001))
      + (-1.0 * 0.0))/10.0)

˙VyC_b0 = ((((yj0/0.0001) + ((0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0
      * cos(phi_b0)))))/0.001)) + (-1.0 * ((yj1/0.0001) + (((VyC_b0
      + (10.0 * Vphi_b0 * cos(phi_b0))) + (-1.0 * (VyC_b1 + (-1.0 * 10.0
      * Vphi_b1 * cos(phi_b1)))))/0.001))) + (-1.0 * 10.0 * 9.81))/10.0)

˙VyC_b1 = ((((yj1/0.0001) + (((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))/0.001))
      + (-1.0 * ((yj2/0.0001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001)))
      + (-1.0 * 10.0 * 9.81))/10.0)

˙VyC_b2 = ((((yj2/0.0001) + (((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))/0.001))
      + (-1.0 * 0.0) + (-1.0 * 10.0 * 9.81))/10.0)

˙phi_b0 = Vphi_b0
˙phi_b1 = Vphi_b1
˙phi_b2 = Vphi_b2

˙xC_b0 = VxC_b0
˙xC_b1 = VxC_b1
˙xC_b2 = VxC_b2

˙xj0 = (0.0 + (-1.0 * (VxC_b0 + (10.0 * Vphi_b0 * sin(phi_b0)))))
˙xj1 = ((VxC_b0 + (-1.0 * 10.0 * Vphi_b0 * sin(phi_b0)))
      + (-1.0 * (VxC_b1 + (10.0 * Vphi_b1 * sin(phi_b1)))))
˙xj2 = ((VxC_b1 + (-1.0 * 10.0 * Vphi_b1 * sin(phi_b1)))
      + (-1.0 * (VxC_b2 + (10.0 * Vphi_b2 * sin(phi_b2)))))

˙yC_b0 = VyC_b0
˙yC_b1 = VyC_b1
˙yC_b2 = VyC_b2

˙yj0 = (0.0 + (-1.0 * (VyC_b0 + (-1.0 * 10.0 * Vphi_b0 * cos(phi_b0)))))
˙yj1 = ((VyC_b0 + (10.0 * Vphi_b0 * cos(phi_b0)))
      + (-1.0 * (VyC_b1 + (-1.0 * 10.0 * Vphi_b1 * cos(phi_b1)))))
˙yj2 = ((VyC_b1 + (10.0 * Vphi_b1 * cos(phi_b1)))
      + (-1.0 * (VyC_b2 + (-1.0 * 10.0 * Vphi_b2 * cos(phi_b2)))))

[Plot: yC_b0, yC_b1 and yC_b2 against t from 0 to 30.]

Figure 5.14: Response of the triple pendulum model of figure 5.17, released from an initial horizontal arrangement – vertical displacement of pendulum body centres.


[Plot: xC_b0, xC_b1 and xC_b2 against t from 0 to 30.]

Figure 5.15: Response of the triple pendulum model of figure 5.17, released from an initial horizontal arrangement – horizontal displacement of pendulum body centres.

[Plot: phi_b0, phi_b1 and phi_b2 against t from 0 to 30.]

Figure 5.16: Response of the triple pendulum model of figure 5.17, released from an initial horizontal position – angular position of pendulum bodies.


[Diagram: schematic of Link 1, Joint 1, Link 2, Joint 2, Link 3, Joint 3, and the corresponding compound-element bond graph with paired effort–flow (x) and (y) ports between elements.]

Figure 5.17: Schematic diagram and compound element bond graph model of a triple planar pendulum.


5.1.6 Linked elastic collisions

Figure 5.18 shows a rigid bar with damped-elastic bumpers at each end, in schematic form. The system consists of: a rigid link element L, as used above in 5.1.4, representing the bar; two switched capacitor elements, CCa and CCb, representing elastic collision between the tips and a rigid horizontal surface; and unswitched vertical damping at each end and on rotation (not shown in the schematic), representing drag on the bar as it tumbles. An equivalent bond graph model for this system is shown in figure 5.19.

Figure 5.20 shows the bar-with-bumpers modelled in figure 5.19 released above a rigid horizontal surface. int_a2 is the vertical displacement of the centre of mass above the horizontal surface, int_a3 is the angular displacement of the bar from vertical, and q21 and q24 are the vertical displacements of the two end points above the surface. The bar is released level to the horizontal surface, so int_a2, q21 and q24 (height of the centre, left tip and right tip) are coincident, and int_a3 (angular displacement) is steady at π/2, before the bar tips strike the ground at height 1, equal to the unsprung length of the contact compliances. The contact compliance on one side has twice the capacitance (half the stiffness) of the other, so it compresses further on impact, then bounces higher as the bar rotates in the air.

Figure 5.18: Schematic diagram of a rigid bar with contact compliance and viscous drag at each end.


[Diagram: bond graph as in figure 5.12, with Se:FxA=0, Se:FyA=0, Se:FxB=0 and Se:FyB=0 at the tips, and tip 1-junctions carrying switched contact elements CC:0.01 and CC:0.005 with R:0.8 dampers. Bonds numbered 1 to 25.]

Figure 5.19: Bond graph model of the system from figure 5.18.

[Plot: int_a2, int_a3, q21 and q24 against t from 0 to 10.]

Figure 5.20: Response of the bar-with-bumpers model from figure 5.19, released above a rigid horizontal surface.


5.2 Symbolic regression of a static function

As an early proof of the genetic programming software, a static function approximation problem was solved by symbolic regression, and some observations of the system performance were made. The genetic programming system at that time was mono-typed, with a special division operator to protect against divide-by-zero errors. Four aspects of the genetic programming system performance were measured:

1. Population fitness (summarized as minimum, mean and maximum program fitness each generation)

2. Population diversity (the number of structurally unique programs in each generation)

3. Program age (the number of generations that each program had been present in the population)

4. Program size (summarized as minimum, mean and maximum number of nodes in a program)

The particular symbolic regression problem chosen consisted of 100 samples of the function x^2 + 2 at integer values of x between -50 and +50. The fitness evaluation consisted of finding the total absolute error between the candidate and the target samples, and normalizing that value onto the interval (0, 1], where a perfect candidate with zero error was awarded a fitness of 1. Exceptionally, where the candidate was found to perform division by zero or to return the special value NaN ("not a number"), a fitness value of zero was awarded. Over 50 genetic programming runs were performed, all with the nominal crossover probability set at 0.8, the mutation probability set at 0.2 and the number of generations limited to 500, but with a variety of population sizes.
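The fitness rule just described can be sketched as follows. This is an illustrative reconstruction, not the thesis code; in particular, the mapping 1/(1 + error), which sends zero error to 1 and large errors toward 0, is an assumed choice of normalization onto (0, 1].

```python
import math

TARGET = [(x, x * x + 2) for x in range(-50, 50)]  # 100 integer samples

def fitness(candidate):
    """Score a candidate function against samples of x**2 + 2."""
    total_error = 0.0
    for x, y_target in TARGET:
        try:
            y = candidate(x)
        except ZeroDivisionError:
            return 0.0  # division by zero is awarded zero fitness
        if isinstance(y, float) and math.isnan(y):
            return 0.0  # NaN results likewise score zero
        total_error += abs(y - y_target)
    return 1.0 / (1.0 + total_error)  # zero error -> fitness of 1
```

An exact candidate such as lambda x: x * x + 2 scores 1.0, while any candidate that divides by zero somewhere on the sample scores 0.0.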

5.2.1 Population dynamics

Plots of the four measured aspects of genetic programming run performance are shown for two particularly interesting runs (run "A" and run "B") in figures 5.21 to 5.28. For both runs the population size was fixed at 500 individuals.


The fitness plot for run "A" shows a very typical slow initial progress and low initial fitness. Given the enormous number of programs that could be randomly generated, even within the size limits placed on members of the initial random population, it would be very rare indeed for a sizable number, or even one individual, to have a fitness value that is not very low. The initial best, mean and worst fitness values in both runs are all extremely low. The other characteristic of run "A" fitness that is very typical is the long initial stretch of slow progress; runs "A" and "B" similarly take tens of generations for the best-of-generation fitness to exceed 0.1.

The mean fitness trend in run "A" can be seen, on close inspection, to make its first break from the "floor" near generation 20. At this same generation, the beginning of a sharp decline in population diversity can be seen. From this one can conclude that around generation 20 a new individual appeared with significantly greater than average fitness and quickly came to dominate the population. In run "A" the mean program size begins to decrease at generation 20 and reaches a minimum around generation 30, corresponding to the population diversity minimum for this run. From this one can infer that the newly dominant individual was of less than average size when introduced. By generation 30 there were fewer than 40 unique program structures in a population of 500. As there came to be more and more individuals of the dominant (and smaller than average) type, the mean program size necessarily decreased as well. When the population diversity later rose, many of these newly introduced program structures must have been larger than the recently dominant one, since the mean program size also increased.

Figure 5.24 ("Program age in run A") has been marked to indicate the generation in which the oldest-ever individual for that run was created, near generation 25. The oldest (maximum age), most fit (maximum fitness) and largest (maximum size) individuals are of course not necessarily, or even likely, the same program.

Run "B" experiences an even more severe loss of diversity, falling to just 20 unique program structures in its 100th generation. The sharp decrease in population diversity between generations 50 and 100 corresponds exactly to the sharp increase in mean program fitness over the same interval. At the end of that time the mean program fitness has risen to almost match that of the most fit program, where it stays for over 300 generations. The sudden rise in mean fitness and decline in diversity is a dramatic example of one extremely successful design overwhelming all others. The entirely flat fitness trend for the ensuing 300 generations is an equally dramatic example of the


adverse effects of low population diversity on genetic programming search progress. In effect, the inherent parallelism of the method has been reduced. Although this is not the same as reducing the population size, since multiple concurrent crossovers with identical parents are still likely to produce varied offspring due to the random choice of crossover points, it has a similar effect. During this 300 generation "slump" the mean program age rises steadily, with an almost perfectly linear slope of just less than unity, indicating that there is some, but very little, population turnover. The algorithm is exploiting one program structure and small variations in its neighbourhood that differ only by leaf node values. There is very little exploration of new program structures happening.

It was often observed that crises in one of the measured aspects of genetic programming were accompanied by sudden changes in other aspects. In general, not much can be inferred about one aspect by observing the others, with the possible exceptions of inferring the arrival of an individual with dramatically higher than average fitness from sudden decreases in population diversity or changes in mean program size. In both cases it is the dominance of the new program structure in the population that is being observed, and this inference would not be possible if the genetic programming system took active measures to ensure population diversity.

Note also that although the peak program size is sometimes recorded in excess of the declared ceiling of 100 nodes, this is simply an artifact of the order in which fitness evaluation and statistics logging were performed each generation. Individuals of excessive size were recorded at the point of fitness evaluation, but were barred from participating in crossover, mutation or reproduction due to their zero fitness score, and so could not enter the population of the following generation.

5.2.2 Algebraic reductions

The non-parsimonious tendencies of typical genetic programming output are well known and often commented on. A successful result, however, is always functionally equivalent in some testable ways to the target or goal; specifically, it excels at the particular fitness test in use. Sometimes there are structural equivalences that can be exploited to produce a simplified or reduced representation which is nonetheless entirely equivalent to the original in a functional sense. In symbolic regression, the exploitable structural equivalences are simply the rules of algebra. Even a very small set of alge-


Figure 5.21: Fitness in run A – progress slow initially and while size limiting in effect.

Figure 5.22: Diversity in run A – early crisis is reflected in size and age trends.


Figure 5.23: Program size in run A – mean size approaches ceiling (100 nodes).

Figure 5.24: Program age in run A – bold line extrapolates birth of oldest program.


Figure 5.25: Fitness of run B – progress stalled by population convergence.

Figure 5.26: Diversity of run B – early loss of diversity causing stalled progress.


Figure 5.27: Program size in run B – early domination by a smaller than average schema.

Figure 5.28: Program age in run B – low population turnover mid-run between crises.


braic reduction rules is capable of producing reduced results which are in many cases vastly more legible. The algebraic reduction rules implemented at the time these experiments were performed are listed below in s-expression form (LISP prefix notation):

(- x x) replaced by (0)

(- x 0) replaced by (x)

(+ x 0) replaced by (x)

(+ 0 x) replaced by (x)

(/ x x) replaced by (1)

(/ 0 x) replaced by (0) for all x

All of these reductions operate on a single function node and its children. Only two properties of the children are checked: first, is either child a constant value node equal to zero or one; and second, are the children equal to each other.

One other type of simplification was performed as well. Any subtree containing only function nodes and constant value terminal nodes was replaced by a single constant value node. The value of this new node is found by evaluating the subtree at an arbitrary value of the independent variable. This "trick" alone is responsible for many reductions that are more dramatic than those produced by all the other rules listed above.
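In s-expression terms, the rules above plus the constant-subtree simplification amount to a small bottom-up rewriter. The sketch below (with nested Python tuples standing in for program trees) is an illustrative reconstruction, not the thesis symbolic algebra library; folding constants bottom-up is equivalent, for variable-free subtrees, to evaluating the subtree at an arbitrary value of the independent variable.

```python
def is_const(node):
    """A terminal that is a number (not a variable name or a subtree)."""
    return isinstance(node, (int, float))

def reduce_expr(node):
    """Recursively apply the algebraic reduction rules to (op, a, b)."""
    if not isinstance(node, tuple):
        return node
    op, a, b = node[0], reduce_expr(node[1]), reduce_expr(node[2])
    if op == '-' and a == b:
        return 0                      # (- x x) -> (0)
    if op == '-' and b == 0:
        return a                      # (- x 0) -> (x)
    if op == '+' and 0 in (a, b):
        return b if a == 0 else a     # (+ x 0), (+ 0 x) -> (x)
    if op == '/' and a == b:
        return 1                      # (/ x x) -> (1)
    if op == '/' and a == 0:
        return 0                      # (/ 0 x) -> (0)
    if is_const(a) and is_const(b) and not (op == '/' and b == 0):
        return {'+': a + b, '-': a - b,
                '*': a * b, '/': a / b}[op]  # constant folding
    return (op, a, b)
```

For example, ('-', ('+', 'x', 0), 'x') reduces to 0, and a variable-free subtree such as ('*', 2, 3) collapses to a single constant.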

It is not at all clear, by inspecting the raw expression in figure 5.29 (a best approximation that successfully terminated one run after 41 generations), that it approximates x^2 + 2. The raw expression is simply too large and contains misleading operations (trigonometry and division). However, the reduced expression in figure 5.30 is very obviously a good approximation to the target expression. All of these gains in clarity are had by recognizing that the entire right branch of the root Add node is a constant expression containing no variables; it evaluates to very nearly 2.


Figure 5.29: A best approximation of x^2 + 2.0 after 41 generations.


Figure 5.30: The reduced expression.


5.3 Exploring genetic neighbourhoods by perturbation of an ideal

The symbolic regression for dynamic systems genetic programming grammar presented in section 4.5.3 is here evaluated by sampling the neighbourhood of a manually constructed "ideal" program. The ideal is constructed so as to have exactly the same differential state update equations, and one of the output equations, of the model described in Section 5.1.1. The neighbourhood of this ideal is sampled by repeatedly applying the mutation operator to fresh copies of the ideal. This generates a large number of new genetic programs that are "near" to the ideal (they differ by only the subtree rooted at the mutation point). All of the generated programs are fitness tested against the response of the ideal to a uniform random noise input, as shown in figure 5.4.

To quantify nearness, a metric for pairwise distance between tree structures is defined. In this metric, distance is equal to the size (number of nodes) of the subtree rooted at the first node that does not match in a breadth-first traversal of both trees. Since the corresponding subtrees in the original and the mutant differ, the size of the larger subtree is used. This metric is symmetric (the distance between two trees is independent of the order they are given in). Under this metric, identical trees have a distance of 0, trees that differ by only one leaf node have a distance of 1, and so on. This is a form of "edit distance": it is the number of nodes that would have to be added, deleted or replaced to transform one program into the other.
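Under this definition the computation is a parallel breadth-first traversal that stops at the first mismatch. The sketch below is illustrative (a minimal Node type is assumed; the thesis implementation may differ in details such as how node equality is decided):

```python
from collections import deque

class Node:
    def __init__(self, label, *children):
        self.label, self.children = label, list(children)

def subtree_size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(subtree_size(c) for c in node.children)

def tree_distance(a, b):
    """Size of the larger subtree rooted at the first mismatching node
    in a parallel breadth-first traversal; 0 for identical trees."""
    queue = deque([(a, b)])
    while queue:
        x, y = queue.popleft()
        if x.label != y.label or len(x.children) != len(y.children):
            return max(subtree_size(x), subtree_size(y))
        queue.extend(zip(x.children, y.children))
    return 0
```

Two trees differing in a single leaf are at distance 1, as in the definition above, and the result is the same whichever tree is given first.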

Figure 5.32 shows the "hand crafted" ideal symbolic regression tree. Table 5.1 shows the mean fitness of programs by distance, and the total number of programs found at that distance within the sample. Figure 5.31 plots mean fitness versus tree-distance from the ideal. Distances at which fewer than 10 programs were found are omitted.

The point of these results is that the representation in question produces a fitness landscape that rises on approaching the ideal. If edit distance, as defined above, is taken as an estimate of the amount of work the genetic algorithm will have to do to reach the ideal starting from the mutant, then these results suggest the given representation will be well-behaved in genetic programming experiments. If instead the fitness landscape had fallen off approaching the ideal, the representation would be called "deceptive": the heuristic by which genetic algorithms operate ("good" solutions have good parts, and "goodness" is measured by the fitness test) would be misled in


Distance   Mean fitness   Count
 1.0       0.000469       2212
 3.0       0.000282        503
 5.0       0.000346         66
 7.0       0.000363         36
 9.0       0.000231         25
11.0       0.000164         16
13.0       0.000143         10
15.0       0.000071         10
17.0       0.000102          9
19.0       0.000221          1
23.0       0.0               2
25.0       0.0               2
27.0       0.0               1
29.0       0.000286          5
35.0       0.0               1
51.0       0.000715          1

Table 5.1: A sampling of the neighbourhood of a "hand crafted" dynamic system symbolic regression tree that outputs equations 5.1, 5.2 and 5.7.


that case. Another pathologically difficult problem for genetic programming would be one in which the fitness landscape is entirely flat and low at all points not exactly coincident with the ideal. In such cases fitness-guided search, such as genetic programming, can do no better than a random search.

[Plot: mean fitness (0 to 5e-4) against distance (0 to 18), with standard error bars.]

Figure 5.31: Mean fitness versus tree-distance from the ideal. Bars show standard error. Series truncated at first point with only one sample.

5.4 Discussion

5.4.1 On bond graphs

Bond graphs are a good choice of modelling language for several reasons. In short, bond graphs encode structure in physically meaningful and computationally useful ways. Bond graphs are a modular modelling language in which a subsystem, once modelled, can be reused elsewhere. Every element in the basic bond graph modelling language models a single physical phenomenon. These can be interconnected freely, but the language provides qualitative insight into the physical consequences of some modelling decisions. These insights come from the available model reductions (a sort of algebra for bond graphs) and from the augmented bond graph, in which a causal direction


[Diagram: program tree with root DynamicSystem branching to a StateEquationSet and an OutputEquation; the StateEquation subtrees combine StateVariableLHS, Add, Mul, Div, Constant, InputVariable and StateVariableRHS nodes, and the OutputEquation divides a StateVariableRHS by a Constant.]

Figure 5.32: The hand crafted "ideal" symbolic regression tree.


has been assigned to every bond. These qualitative insights may be helpful during an interactive or iterative grey-box system identification exercise.

Model reductions generate alternate, behaviourally equivalent but structurally simplified models. This may highlight areas where multiple parts in a model have been used to represent one physical part in the system. Model reductions are also computationally valuable when using bond graphs with genetic programming, since the reduced model will have fewer equations. The algebraic manipulations required between generating the bond graph model and solving the equations numerically are more complex than the bond graph reductions, so it is best to simplify as early as possible. A smaller set of equations will also be quicker to evaluate numerically (provided it is no more stiff than the original system), simply because the right-hand side of the state equations will be shorter and faster to evaluate at each point along the integration.

Causality assignment produces an augmented bond graph from which it is simple to tell whether a state-space formulation of the model equations will be possible. If not, then the model is over-specified and has fewer state variables than discrete energy storing elements. Some storage elements are dependent on others, and the reason (inertias with common flow, capacitors with common effort, etc.) can often be linked to physically intuitive notions about the system under test and the way it is abstracted into the model.

As a corollary to the fact that bond graph models encode structure in physically meaningful ways, bond graphs present a smaller search space for genetic algorithms (or any other search technique) to explore than do some other representations. There is always a physical interpretation for any bond graph in a choice of domain (electrical, solid body mechanical, hydraulic). The interpretation may not be realizable in size or scale, but it exists. Signal flow graphs and sets of differential equations, to give two examples, have no such guaranteed physical interpretation in a given domain. A signal-flow graph and a set of differential equations can each be extracted from any bond graph model (the former simply by splitting each bond into an effort signal and a flow signal), but the reverse is not true; in both transformations some information about the physical structure is lost.

Finally, bond graphs are an attractive modelling language because they scale smoothly from simple linear systems, where additional automated transformations are possible and for which a great body of analysis exists, to complex, nonlinear, and even non-analytic (numerical or computational only) models, without losing any of the above advantages. A model can start out small, simple, and linear, but grow large, complex, and non-linear over the course of some experiments, without ever having to stop and recode the model in a new language.

Bond graphs have distinct disadvantages, however, all of which come down to being the wrong sort of abstraction for some applications. As mentioned in the introduction to this thesis, there are no perfect models. All models are abstractions that trade off fidelity for simplicity. The art of modelling is to decide, or discover, which aspects of the system are important to the exercise and so must be included in a model, and which aspects are not important and can be dropped for simplicity. A modelling language that presents abstractions at just the right level of detail can help this task along. Bond graphs present abstractions (elements and bonds) with more physical detail than a signal-flow graph or a set of differential equations, but with less detail than, for example, a free-body diagram for solid-body mechanics applications.

A modelling language which is too abstract will be inconveniently distracting, both to human experimenters and to search algorithms such as genetic programming. Planar and 3-D solid-body mechanics applications are one area where bond graphs can be faulted in this regard.

Fundamental to the bond graph modelling paradigm is the requirement for power bond variables to be given in compatible units. For mechanics applications, they must also either be given with respect to one global frame of reference, or complicated subgraphs must be introduced for coordinate transformation purposes. Although libraries of such joint transformations have been published, the result of this technique is that parts of a model do not represent power transfer within the system, but rather happen to produce the correct equations to transform coordinates between adjacent parts of the model. These inconvenient work-arounds for limitations of the modelling language can be distracting. All models presented in this thesis use a single global frame of reference, which precludes, for example, modelling body-aligned compliance in the triple-pendulum pin joints.

The highly abstract representations presented by the bond graph modelling language can also be "distracting" to a search algorithm such as genetic programming, lowering its efficiency by allowing many models that are valid within the language but physically senseless with respect to a particular application. Consider the rigid link with connection points at either end, used in the single and triple pendulum models and in the linked elastic collisions model. If this subgraph were presented to the genetic programming system as part of a larger model, there is every chance that the algorithm would at some point generate models containing internal modifications that destroyed the symmetry of that subgraph in a way physically senseless in terms of rigid bodies. The rate at which such valid-but-unhelpful models are generated will be in direct proportion to the ratio of the size of the rigid-body sub-model to the whole model within the genetic program.

Encapsulating the rigid-link sub-model, as was done for the triple pendulum model, alleviates these problems at the cost of effectively inventing a new and very specialized modelling language. In order to be reusable, the rigid-link sub-model includes three state variables. This leads directly to the need for "suspended" pin joints in the pendulum models, to avoid differential causality and the need for a differential-algebraic solver. A modelling language wholly dedicated to rigid-body mechanical applications would have the context on hand to be able to insert a single degree-of-freedom model where appropriate. It would also be much less abstract, and therefore also much less widely applicable (for example, to electrical or hydraulic applications), than are bond graphs.

5.4.2 On genetic programming

Genetic programming has several desirable properties as a search algorithm to be used for system identification. Overall, genetic programming requires very little problem-specific information in order to begin, but it also provides two mechanisms by which any additional information about the problem can be taken advantage of. The basic requirements for the application of genetic programming to system identification are a generative grammar for models and a fitness test. Additional problem-specific insight can be provided either within the fitness test (to reject undesirable models) or in the form of a more restrictive grammar (to reduce the size of the search space, which is preferable). Some modelling languages make it easier to specify restrictive genetic grammars than do others, because of the way each encodes physical structure.
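The two basic requirements named above, a generative grammar and a fitness test, can be made concrete in a small sketch. This is an illustrative toy (typeless symbolic regression of a static function, with made-up names), not the implementation described in this thesis:

```python
import random

# A trivial generative "grammar": random expression trees over +, *,
# the variable x, and numeric constants. Tuples are operator nodes.
def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(['x', round(random.uniform(-2, 2), 2)])
    return (random.choice(['+', '*']),
            random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, a, b = tree
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == '+' else va * vb

# A fitness test: negated squared error against a sampled target
# function, so that higher fitness is better (maximum is 0.0).
def fitness(tree, target=lambda x: x * x + 1.0):
    xs = [i / 10.0 for i in range(-10, 11)]
    return -sum((evaluate(tree, x) - target(x)) ** 2 for x in xs)

population = [random_tree() for _ in range(50)]
best = max(population, key=fitness)
```

Everything beyond these two ingredients (selection, crossover, mutation) can be generic machinery, which is exactly why so little problem-specific information is needed to begin.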

The use of indirection through a genetic grammar means that the exact structure of the model need not be pre-specified. This allows genetic programming to perform simultaneous structure and parameter identification. It also allows the structure to grow or shrink in the course of identification. Many other modelling and identification schemes require that the model size be specified ahead of time (for example, the number of layers and nodes per layer must be chosen for a neural network).

In contrast to some other bio-mimetic search algorithms, such as swarm and ant colony algorithms, and the venerable simplex algorithm, genetic programming solutions need not exist within a metric space. Some concept of distance between genetic programs may be useful in evaluating a particular genetic representation or genetic programming implementation, but one is not required.

A grey-box system identification is easily implemented by seeding the initial genetic population with instances of a model encoding any prior knowledge, or by providing this model as input to each genetic program when it is evaluated by the fitness function. In fact, the algorithm can be interrupted and subjected to this kind of external "help" at any point. Genetic programming is extensible in many ways. It is easily hybridized with other search techniques, such as parametric hill climbing or simulated annealing, by introducing new operators that are applied either occasionally or before every fitness evaluation, and by varying the choice of operators over time. The latter choice (applying some new optimizing operator before each fitness evaluation) is a sort of search-within-a-search algorithm. A simulated annealing and genetic programming hybrid is made by varying the likelihood of choosing each genetic operator over the course of a run, according to some schedule.
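The annealing-style hybrid described above amounts to scheduling operator probabilities over the run. A minimal sketch, with an assumed linear cooling schedule (the shape of any real schedule would be a tuning choice, not something fixed by the technique):

```python
# Illustrative sketch, not the thesis code: shift probability mass away
# from the disruptive mutation operator and toward crossover and
# replication as the run progresses, analogous to cooling in simulated
# annealing.
def operator_weights(generation, max_generations):
    t = generation / float(max_generations)  # 0.0 -> 1.0 over the run
    mutation = 0.5 * (1.0 - t)               # cools from 0.5 to 0.0
    crossover = 0.4 + 0.3 * t                # warms from 0.4 to 0.7
    replication = 1.0 - mutation - crossover # remainder; sums to 1.0
    return {'mutation': mutation,
            'crossover': crossover,
            'replication': replication}
```

At each generation the genetic engine would draw its operator from this distribution instead of from fixed probabilities.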

Genetic programming is easily parallelizable for distribution across a local or wide-area network in one of several ways. One particular scheme (the island model) has been attributed in the literature with performance gains that are super-linear in the degree of parallelism. It is fortunate that these parallelization options are available, since genetic algorithms are notoriously computationally expensive.

Computational expense is chief among the detractions of genetic programming, but not the only one. Because they use so little problem-specific knowledge, genetic algorithms may be needlessly inefficient when such knowledge is available, unless it can be used to reduce the search space by refining the structural grammar. This has, in effect, been stated already in this section, but it is worth repeating while underlining that it may not always be a simple matter to encode knowledge of the system under test in a suitably restrictive genetic grammar. Encoding this knowledge in the fitness test is easier (simply assign arbitrarily low fitness to models that do not meet expectations) but much less efficient, since these rejected genetic programs are produced, stored in the population, and evaluated at some expense in processing time and memory.

Finally, some problems are deceptively hard, and genetic algorithms offer no help in identifying them. Genetic algorithms can be envisioned as exploring a "fitness landscape" and seeking a global maximum thereon. The fitness landscapes of deceptive problems provide no, or negative, feedback about proximity to the global optimum. Such problems may also use a genetic representation in which the crossover and mutation operators are often destructive, cutting through and breaking apart useful sub-models. The argument that bond graphs are too abstract for some rigid-body mechanics applications is an example. Since genetic algorithms, including tree-structured genetic programming, offer no special feedback or guidance when a hard problem is encountered, and since empirical studies of genetic programming are so expensive in computer time, it can be difficult to predict in which situations genetic programming will bear fruit.


Chapter 6

Summary and conclusions

Black-box identification of general non-linear dynamic systems, with any number of inputs and outputs, is a hard problem. It is the most general and encompassing class of problems in system identification. The project behind this thesis set out, over-ambitiously without question, to create a software tool that would address this most general class of identification problems in a very flexible way. The aim was to create a system that would allow an experimenter to insert as much or as little information as was available about the system under test, before or during an identification; to allow this information to be specified in a way that is natural to the task; and to have the algorithm make good use of it. Additionally, it was desired that the resulting model should be in a form that is "natural" to the experimenter. Questions of efficient identification were deliberately set aside, even from the beginning, with one exception: all structural identification problems are at heart search problems within a space defined by the modelling language. These spaces can be very large, infinite if the model size is not bounded, so the most important step that can be taken towards an efficient structural identification algorithm is to provide an easy mechanism by which an experimenter can limit the solution space by adding any information they may have about the system.

Bond graphs are a flexible and "natural" modelling language for many applications, and genetic programming is a general-purpose heuristic search algorithm that is flexible in useful ways. The work leading up to this thesis produced software tools for modelling with bond graphs, a strongly typed genetic programming implementation, and genetic programming grammars that define search spaces for linear and non-linear bond graph models and symbolic regression of differential equations in state-space form. The bond graph modelling implementation is demonstrated with several simulations of multi-body mechanical systems. The genetic programming implementation is checked by applying it to symbolic regression of a static function. The neighbourhood of a manually constructed "ideal" genetic program, modelling a particular dynamic system, is examined to show that the fitness landscape is shaped in a way that will drive the genetic algorithm towards the given ideal.

6.1 Extensions and future work

6.1.1 Additional experiments

It should be very clear that many of the original goals of this project (design of a flexible software system for black- and grey-box identification of non-linear dynamic systems) were not met. It is expected, for example, that bond graphs are a much more efficient representation for use with genetic programming in identifying dynamic system models than is direct symbolic regression of differential equations. This remains untested. A suitable experiment would involve many genetic programming runs with each representation, on one or more identification problems. Population size times average number of generations until a solution is found would be a suitable metric for comparison. This metric is the average number of fitness evaluations performed in the course of the search. Since fitness evaluation (involving simulation and comparison of simulation results to stored signals) is by far the most computationally intensive aspect of the search, this is a good and easily calculated measure of the total amount of computer time required.

It would be interesting to repeat the genetic neighbourhood explorations with the genetic programming grammar for bond graphs. For comparison, the ideal should be designed such that its constitutive equations, when extracted and reduced to state-space form, are identical to the equation set produced by the symbolic regression ideal.

There should be no great difference in run time for the fitness evaluation of bond graph models and symbolic regression programs that address the same identification problem. If some method were available to automatically create a genetic program of each type that produced identical behaviour (one is suggested in the next section), then this could also be tested with a much larger (and better randomized) sample size than could reasonably be created by hand.

Parametric identification with bond graphs, by symbolic regression of constant-valued expressions for element parameters, is expected to be faster than black-box identification of the entire structure and parameter set. It is not expected to be less computationally intensive than conventional direct methods (such as linear least squares) when they are available. Also, the performance of parametric identification by symbolic regression is independent of the representation chosen for the unchanging structure, be it a bond graph, a symbolic regression tree, or anything else.

Starting from the accomplished static function approximation results, a good course would be to attempt:

1. Parametric identification of a linear dynamic system by symbolic regression.

2. Structural and parametric identification of a linear dynamic system using bond graphs.

3. Parametric identification of a non-linear dynamic system using bond graphs.

4. Structural and parametric identification of a non-linear dynamic system using bond graphs.

At each stage, the simplest imaginable system should be attempted first: a spring-mass system before a spring-mass-damper system, before a multi-body spring-mass-damper system. For a simple non-linear system, the "bouncing ball" example would be interesting, or a non-linear (e.g. progressively stiffening) spring could be modelled. In the first case, the two parameters of the CC element need to be discovered by the algorithm. In the latter case, the spring curve, that is the effort-displacement relation for the non-linear C element, would need to be discovered (by symbolic regression, even though the remainder of the model is a bond graph).

If the bond graph library were to find any more use as a modelling tool in its own right, outside of the system identification tasks that were the goal of this project, then it would benefit from verification against one or more established modelling tools: either a bond graph modelling application such as 20sim, or domain-specific tools such as Spice and Adams.


6.1.2 Software improvements

To make the tools more widely interoperable, the "bg" file format for bond graph model storage should be replaced with a file format known to, and in use by, other software. GraphML, an XML schema for exchanging graphs of all types, may be a good choice.
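As a sketch of what such a replacement format might look like, the following generates a GraphML-style document for a trivial source-junction-inertia bond graph using only the standard library. The `type` data key is an assumption (GraphML leaves application data to user-declared keys), and the namespace declaration is omitted for brevity; a conforming file would declare `xmlns="http://graphml.graphdrawing.org/xmlns"` and matching `<key>` elements:

```python
import xml.etree.ElementTree as ET

def bond_graph_to_graphml(nodes, bonds):
    # nodes: (id, bond-graph element type) pairs; bonds: (source, target)
    # pairs. Bonds are directed edges, matching the half-arrow convention.
    root = ET.Element('graphml')
    graph = ET.SubElement(root, 'graph',
                          {'id': 'bg', 'edgedefault': 'directed'})
    for node_id, element_type in nodes:
        node = ET.SubElement(graph, 'node', {'id': node_id})
        data = ET.SubElement(node, 'data', {'key': 'type'})
        data.text = element_type
    for i, (src, dst) in enumerate(bonds):
        ET.SubElement(graph, 'edge',
                      {'id': 'b%d' % i, 'source': src, 'target': dst})
    return ET.tostring(root, encoding='unicode')

# A flow source driving an inertia through a 1-junction: Se --- 1 --- I
xml_text = bond_graph_to_graphml(
    nodes=[('n0', 'Se'), ('n1', '1'), ('n2', 'I')],
    bonds=[('n0', 'n1'), ('n1', 'n2')])
```

Any GraphML-aware tool could then at least display the topology, even without understanding the bond graph semantics carried in the data keys.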

To facilitate grey-box identification experiments, a routine could be written which produces a minimal program from any phenotypic expression. For example: given an equation set, return a symbolic regression program that would generate it; or, given a bond graph model, return a program that would generate it from the empty kernel. This would be a useful tool for bootstrapping experiments from existing models. Presently, such initial genetic programs must be formulated manually.

The software written in the course of producing this thesis contains incomplete implementations of schedulers and network protocols for both island-model distributed genetic programming and centralized genetic programming with distributed fitness evaluation. If large problems are tackled, one of these (or equivalent parallel processing capabilities) will be needed.

Differential-algebraic equation solvers (DAE solvers) are available. The numerical simulation code could be extended to use one when reduction to state-space form fails on mixed-causality bond graph models.

Several possible software optimizations were noted during development but not implemented. Fitness evaluations could be cached or memoized, such that duplicate genetic programs appearing in the population do not incur duplicate work by the fitness test (easily the most expensive part of the process in system identification applications). Memoization is a classic time/space computing trade-off: execution time is reduced at the cost of storing past results. To limit the total cost, not all cached results are kept, or they are not kept indefinitely. The implementation can be as simple as a finite table, sorted in least-recently-used order, mapping genetic programs (or some short unique key derived from them) to cached results.
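The finite, least-recently-used table described above might be sketched as follows; the key function and the eviction bound are placeholder choices for illustration:

```python
from collections import OrderedDict

class FitnessCache:
    """Memoize a fitness function with a bounded LRU table."""

    def __init__(self, fitness_function, max_entries=10000):
        self.fitness_function = fitness_function
        self.max_entries = max_entries
        self.table = OrderedDict()  # key -> cached fitness, LRU order

    def __call__(self, program):
        key = repr(program)  # any short unique key derived from the program
        if key in self.table:
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        value = self.fitness_function(program)  # the expensive part
        self.table[key] = value
        if len(self.table) > self.max_entries:
            self.table.popitem(last=False)  # evict least-recently-used
        return value
```

Wrapping the fitness test this way leaves the genetic engine unchanged: it simply calls the cache object wherever it previously called the fitness function.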

Most tree methods in the software are implemented recursively. Examples of trees include algebraic expressions, equations and equation sets, and genetic programs. Methods on these trees include evaluating an algebraic expression or a genetic program, listing all the variables contained in an algebraic expression, and so on. These methods are implemented such that the root node invokes the like-named method on the nodes below it, aggregating their results in some way, as those nodes do in turn to the nodes below them. The standard Python interpreter (called CPython when it is necessary to distinguish it from alternates) is implemented in C and uses the underlying stack to store context frames each time a new method is invoked. While the software is processing very large tree structures, the Python interpreter will use quite a bit of memory, to the detriment of other processes. This is exacerbated by the Python interpreter being not especially aggressive about returning previously allocated, but currently unused, memory to the operating system. It is always possible to rewrite a recursive algorithm in an iterative form, though the expression may not be as clear or concise. This could be done in many areas of the software if memory use becomes an issue during larger experiments. If the trees become so large as to overflow the stack, the software could also be run on top of Stackless Python [77], a Python interpreter that does not use the underlying C stack to store context frames.
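The recursive-to-iterative rewrite suggested above can be illustrated with one such tree method: listing the variables in an algebraic expression. The tuple-based tree shape (tuples for operator nodes, strings for variables, numbers for constants) is an assumption made for this sketch, not the thesis data structure:

```python
def variables_recursive(tree):
    # Natural recursive form: each node asks its children, then
    # aggregates. Depth is limited by the interpreter's stack.
    if isinstance(tree, str):
        return {tree}
    if isinstance(tree, tuple):
        found = set()
        for child in tree[1:]:  # tree[0] is the operator symbol
            found |= variables_recursive(child)
        return found
    return set()  # numeric constant

def variables_iterative(tree):
    # Same result, but pending nodes live on an explicit Python list
    # instead of the C stack, so arbitrarily deep trees are safe.
    found, pending = set(), [tree]
    while pending:
        node = pending.pop()
        if isinstance(node, str):
            found.add(node)
        elif isinstance(node, tuple):
            pending.extend(node[1:])
    return found
```

The iterative form survives tree depths that would raise `RecursionError` in the recursive form, at some cost in clarity.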


Bibliography

[1] Netlib repository at UTK and ORNL. http://www.netlib.org.

[2] Dynasim AB. Dymola: dynamic modeling laboratory. http://www.dynasim.se/dymola.htm, 2006.

[3] Paul-Michael Agapow. Information criteria. http://www.agapow.net/science/maths/info_criteria.html, 2003.

[4] R. R. Allen. Multiport models for the kinematic and dynamic analysis of gear power transmissions. Transactions of the ASME, 101:258–267, April 1979.

[5] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.

[6] J. S. Bendat. Nonlinear Systems Techniques and Applications. Wiley Interscience, New York, NY, 2002.

[7] P. Bendotti and B. Bodenheimer. Linear parameter-varying versus linear time-invariant control design for a pressurized water reactor. International Journal of Robust and Nonlinear Control, 9(13):969–995, 1999.

[8] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs – I: Algebraic theory. Journal of the Franklin Institute, (326):329–250, 1989.

[9] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs – II: Duality. Journal of the Franklin Institute, (326):691–708, 1989.

[10] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs – III: Matroid theory. Journal of the Franklin Institute, (327):87–108, 1990.

[11] S. H. Birkett and P. H. Roe. The mathematical foundations of bond graphs – IV: Matrix representation and causality. Journal of the Franklin Institute, (327):109–128, 1990.

[12] B. Bodenheimer, P. Bendotti, and M. Kantner. Linear parameter-varying control of a ducted fan engine. International Journal of Robust and Nonlinear Control, 6(9-10):1023–1044, 1996.

[13] S. Boyd and L. O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, pages 1150–1171, 1985.

[14] Elizabeth Bradley, Matthew Easley, and Reinhard Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133(1-2):139–188, 2001.

[15] P. J. Costa Branco and J. A. Dente. A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets and Systems, pages 343–354, 2000.

[16] F. E. Cellier, M. Otter, and H. Elmqvist. Bond graph modeling of variable structure systems. Proc. ICBGM'95, 2nd SCS Intl. Conf. on Bond Graph Modeling and Simulation, pages 49–55, 1995.

[17] Ali Cinar. Nonlinear time series models for multivariable dynamic processes. InCINC'94 – The first International Chemometrics InterNet Conference, session on chemometrics in dynamic systems, 1994.

[18] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, USA, July 24-26, 1985. Carnegie Mellon University.

[19] Vjekoslav Damic and John Montgomery. Mechatronics by Bond Graphs: An Object-Oriented Approach to Modelling and Simulation. Springer-Verlag, Berlin, Heidelberg, 2003.

[20] B. Danielson, J. Foster, and D. Frincke. Gabsys: Using genetic algorithms to breed a combustion engine. IEEE International Conference on Evolutionary Computation, 1998.

[21] Darxus. Springgraph. http://www.chaosreigns.com/code/springgraph, 2002.

[22] K. De Jong. On using genetic algorithms to search program spaces. In Proceedings of the Second International Conference on Genetic Algorithms, pages 210–216, Cambridge, MA, July 28-31, 1987.

[23] S. Dzeroski and L. Todorovski. Discovering dynamics. Proc. Tenth International Conference on Machine Learning, pages 97–103, 1993.

[24] S. Dzeroski and L. Todorovski. Discovering dynamics: from inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4:89–108, 1993.

[25] John Ellson et al. Graphviz – graph visualization software. http://www.graphviz.org.

[26] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, New York, USA, 1966.

[27] Richard Forsyth. Beagle – a Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[28] Peter Gawthrop and Lorcan Smith. Metamodelling: Bond Graphs and Dynamic Systems. Prentice-Hall Inc., 1996.

[29] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[30] E. D. Goodman, K. Seo, Z. Fan, J. Hu, and R. C. Rosenberg. Automated design of mechatronic systems: Novel search methods and modular primitives to enable real world applications. 2003 NSF Design, Service and Manufacturing Grantees and Research Conference, 1994.

[31] Erik Goodman. GALOPPS 3.2.4 – the "Genetic Algorithm Optimized for Portability and Parallelism System". http://garage.cse.msu.edu/software/galopps, August 25, 2002.

[32] Alan C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers, pages 55–64. North-Holland, Amsterdam, August 1983.

[33] S. Hofmann, T. Treichl, and D. Schroder. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. 7th International Workshop on Advanced Motion Control, pages 285–290, 2002.

[34] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.

[35] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.

[36] John Hunter. Pylab (matplotlib): a Python 2D plotting library. http://matplotlib.sourceforge.net, 2003.

[37] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org, 2001.

[38] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjoberg, and Q. Zhang. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[39] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: A Unified Approach. John Wiley & Sons, second edition, 1990.

[40] Dean C. Karnopp, Donald L. Margolis, and Ronald C. Rosenberg. System Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, third edition, 2000.

[41] Maarten Keijzer. Scientific Discovery Using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 25, 2002.

[42] G. J. Klir. Architecture of Systems Problem Solving. Plenum Press, 1985.

[43] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

[44] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Massachusetts, 1994.

[45] John R. Koza, Forrest H. Bennett III, David Andre, and Martin A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann Publishers Inc., 1999.

[46] John R. Koza, Martin A. Keane, Matthew J. Streeter, William Mydlowec, Jessen Yu, and Guido Lanza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[47] V. Leite, R. Araujo, and D. Freitas. A new online identification methodology for flux and parameters estimation of vector controlled induction motors. IEEE International Electric Machines and Drives Conference, 1:449–455, 2003.

[48] Fei Chun Ma and Sok Han Tong. Real time parameters identification of ship dynamic using the extended Kalman filter and the second order filter. Proceedings of 2003 IEEE Conference on Control Applications, pages 1245–1250, 2003.

[49] Aahz Maruch. Typing: Strong vs. weak, static vs. dynamic. http://www.artima.com/weblogs/viewpost.jsp?thread=7590, July 2003.

[50] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 1960.

[51] Pieter J. Mosterman. Hybrid Dynamic Systems: A hybrid bond graph modeling paradigm and its application in diagnosis. PhD thesis, Vanderbilt University, May 1997.

[52] Michael Negnevitsky. Artificial Intelligence. Addison Wesley, 2002.

[53] M. Nikolaou and D. Mantha. Efficient nonlinear modeling using wavelets and related compression techniques. NSF Workshop on Nonlinear Model Predictive Control, 1998.

[54] Michael O'Neill and Conor Ryan. Evolving multi-line compilable C programs. In Proceedings of the Second European Workshop on Genetic Programming, pages 83–92, London, UK, 1999. Springer-Verlag.

[55] George F. Oster and David M. Auslander. The memristor: A new bond graph element. Transactions of the ASME, Journal of Dynamic Systems, Measurement and Control, 94(3):249–252, 1972.

[56] Chris Paredis. Composable simulation and design: project overview. Technical report, CompSim Project Group, Carnegie Mellon University, 2001.

[57] Henry M. Paynter. Analysis and Design of Engineering Systems. The MIT Press, Cambridge, Massachusetts, 1961.

[58] Timothy Perkis. Stack-based genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence, 1994.

[59] Charles L. Phillips and H. Troy Nagle. Digital Control System Analysis and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1995.

[60] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In International Conference on Genetic Programming, GP'97, pages 278–285, Stanford, 1997. Morgan Kaufmann.

[61] Cristian Pop, Amir Khajepour, Jan P. Huissoon, and Aftab E. Patla. Bondgraph modeling and model evaluation of human locomotion using experimental data. March 2000.

[62] PSF. Python copyright notice and license. http://www.python.org/doc/Copyright.html, 2005.

[63] Bill Punch and Douglas Zongker. lil-gp genetic programming system. http://garage.cse.msu.edu/software/lil-gp, September 1998.

[64] A. Rogers and A. Prugel-Bennett. Theoretical Aspects of Evolutionary Computing, chapter Modelling the Dynamics of a Steady State Genetic Algorithm, pages 57–68. Springer, 1999.

[65] R. C. Rosenberg. The gyrobondgraph: A canonical form for multiport systems models. Proceedings ASME Winter Annual Conference, 1976.

[66] R. C. Rosenberg and D. C. Karnopp. A definition of the bond graph language. Journal of Dynamic Systems, Measurement and Control, pages 179–182, September 1972.

[67] Ronald C. Rosenberg and Dean Karnopp. Introduction to Physical System Dynamics. McGraw-Hill Publishing Company, New York, New York, 1983.

[68] Conor Ryan, J. J. Collins, and Michael O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In Wolfgang Banzhaf, Riccardo Poli, Marc Schoenauer, and Terence C. Fogarty, editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391, pages 83–95, Paris, France, 1998. Springer-Verlag.

[69] Kazumi Saito, Pat Langley, Trond Grenager, and Christopher Potter. Computational revision of quantitative scientific models. Proceedings of the 3rd International Conference on Discovery Science (DS2001), LNAI 2226:336–349, 2001.

[70] I. Scott and B. Mulgrew. Orthonormal function neural network for nonlinear system modeling. IEEE International Conference on Neural Networks, 4:1847–1852, 1996.

[71] K. Seo, Z. Fan, J. Hu, E. D. Goodman, and R. C. Rosenberg. Toward an automated design method for multi-domain dynamic systems using bond graphs and genetic programming. Mechatronics, 13(8-9):851–885, 2003.

[72] Pete Shinners et al. Pygame: Python game development. http://pygame.org, 2000.

[73] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modelling in system identification: mathematical foundations. Automatica, 31, 1995.

[74] Reinhard Stolle and Elizabeth Bradley. Opportunistic modeling. Proceedings IJCAI Workshop Engineering Problems for Qualitative Reasoning, 1997.

[75] Reinhard Stolle and Elizabeth Bradley. Multimodal reasoning for automatic model construction. Proceedings Fifteenth National Conference on Artificial Intelligence, 1998.

[76] Matthew J. Streeter, Martin A. Keane, and John R. Koza. Routine duplication of post-2000 patented inventions by means of genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea G. B. Tettamanzi, editors, Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 26–36, Kinsale, Ireland, 2002. Springer-Verlag.

[77] Christian Tismer. Stackless Python: a Python implementation that does not use the C stack. http://stackless.com, 1999.

[78] L. Todorovski. Declarative bias in equation discovery. Master's thesis, Faculty of Computer and Information Science, Ljubljana, Slovenia, 1998.

[79] L. Todorovski and S. Dzeroski. Declarative bias in equation discovery. Proc. Fourteenth International Conference on Machine Learning, pages 376–384, 1997.

[80] L. Todorovski and S. Dzeroski. Theory revision in equation discovery. Lecture Notes in Computer Science, 2001.

[81] T. Treichl, S. Hofmann, and D. Schroder. Identification of nonlinear dynamic MISO systems with orthonormal base function models. Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, 1, 2002.

[82] Guido van Rossum. Computer programming for everybody: A scouting expedition for the programmers of tomorrow. CNRI Proposal 90120-1a, Corporation for National Research Initiatives, July 1999.

[83] Guido van Rossum et al. Python programming language. http://python.org, 1990.

[84] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, pages 65–85, 1994.

[85] W. J. Wilson. System Identification (E&CE 683 course notes). University of Waterloo, Waterloo, Canada, September 2004.

[86] Ashraf A. Zeid and Chih-Hung Chung. Bond graph modeling of multibody systems: A library of three-dimensional joints. Journal of the Franklin Institute, (329):605–636, 1992.


Contents

• Introduction
  • System identification
  • Black box and grey box problems
  • Static and dynamic models
  • Parametric and non-parametric models
  • Linear and nonlinear models
  • Optimization, search, machine learning
  • Some model types and identification techniques
    • Locally linear models
    • Nonparametric modelling
      • Volterra function series models
      • Two special models in terms of Volterra series
      • Least squares identification of Volterra series models
      • Nonparametric modelling as function approximation
      • Neural networks
    • Parametric modelling
      • Differential and difference equations
      • Information criteria: heuristics for choosing model order
      • Fuzzy relational models
      • Graph-based models
  • Contributions of this thesis
• Bond graphs
  • Energy based lumped parameter models
  • Standard bond graph elements
    • Bonds
    • Storage elements
    • Source elements
    • Sink elements
    • Junctions
  • Augmented bond graphs
    • Integral and derivative causality
    • Sequential causality assignment procedure
  • Additional bond graph elements
    • Activated bonds and signal blocks
    • Modulated elements
    • Complex elements
    • Compound elements
  • Simulation
    • State space models
    • Mixed causality models
• Genetic programming
  • Models, programs, and machine learning
  • History of genetic programming
  • Genetic operators
    • The crossover operator
    • The mutation operator
    • The replication operator
    • Fitness proportional selection
  • Generational genetic algorithms
  • Building blocks and schemata
  • Tree structured programs
    • The crossover operator for tree structures
    • The mutation operator for tree structures
    • Other operators on tree structures
  • Symbolic regression
  • Closure or strong typing
  • Genetic programming as indirect search
  • Search tuning
    • Controlling program size
    • Maintaining population diversity
• Implementation
  • Program structure
  • Formats and representations
  • External tools
    • The Python programming language
    • The SciPy numerical libraries
    • Graph layout and visualization with Graphviz
  • A bond graph modelling library
    • Basic data structures
    • Adding elements and bonds, traversing a graph
    • Assigning causality
    • Extracting equations
    • Model reductions
  • An object-oriented symbolic algebra library
    • Basic data structures
    • Basic reductions and manipulations
    • Algebraic loop detection and resolution
    • Reducing equations to state-space form
    • Simulation
  • A typed genetic programming system
    • Strongly typed mutation
    • Strongly typed crossover
    • A grammar for symbolic regression on dynamic systems
    • A grammar for genetic programming bond graphs
• Results and discussion
  • Bond graph modelling and simulation
    • Simple spring-mass-damper system
    • Multiple spring-mass-damper system
    • Elastic collision with a horizontal surface
    • Suspended planar pendulum
    • Triple planar pendulum
    • Linked elastic collisions
  • Symbolic regression of a static function
    • Population dynamics
    • Algebraic reductions
  • Exploring genetic neighbourhoods by perturbation of an ideal
  • Discussion
    • On bond graphs
    • On genetic programming
• Summary and conclusions
  • Extensions and future work
    • Additional experiments
    • Software improvements